Database


Performs K-means clustering on the vector embeddings of a table's records to group similar records automatically. The endpoint can auto-detect the optimal number of clusters and generate AI labels for each cluster, which makes it useful for data analysis, content organization, and pattern discovery.

POST https://api.worqhat.com/db/cluster

What Does This Endpoint Do?

This endpoint uses K-means clustering to group your records into clusters based on the semantic similarity of their vector embeddings. It can determine the optimal number of clusters automatically and generate an AI-written description for each cluster.

When to Use Clustering

You'll find this endpoint useful when you need to:

  • Organize content automatically: Group similar products, articles, or documents
  • Discover data patterns: Find natural groupings in your data that weren't obvious before
  • Categorize content: Automatically sort large amounts of content into categories
  • Segment customers: Group customers by behavior or preferences
  • Explore data: Understand the structure and relationships in your data
  • Reduce complexity: Simplify large datasets by grouping similar items

How It Works

  1. You specify a table to cluster
  2. The system retrieves vector embeddings for all records
  3. It performs K-means clustering to group similar records
  4. Optimal cluster count is automatically determined (or you can specify it)
  5. AI-generated labels describe what each cluster represents
  6. Results include cluster information, sample records, and quality metrics
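
The steps above can be sketched in a few lines of JavaScript. This is a minimal illustration of K-means (Lloyd's algorithm) on small numeric vectors, not the service's implementation: this page does not say how centroids are initialized or how many iterations are allowed, so the first-k initialization and the `maxIter` cap are assumptions, and `kMeans` is a hypothetical helper rather than part of the SDK.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function sqDist(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// Minimal K-means (Lloyd's algorithm), for illustration only.
// ASSUMPTION: centroids start at the first k points; many libraries use k-means++ instead.
function kMeans(points, k, maxIter = 100) {
  if (k < 1 || k > points.length) throw new Error("k must be between 1 and points.length");
  let centroids = points.slice(0, k).map((p) => [...p]);
  const labels = new Array(points.length).fill(-1);
  let iterations = 0; // passes in which at least one label changed

  while (iterations < maxIter) {
    // Assignment step: each point joins its nearest centroid.
    let changed = false;
    for (let i = 0; i < points.length; i++) {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) best = c;
      }
      if (labels[i] !== best) {
        labels[i] = best;
        changed = true;
      }
    }
    if (!changed) break; // converged: assignments are stable
    iterations++;

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((sum, p) => sum + p[d], 0) / members.length);
    });
  }
  return { labels, centroids, iterations };
}

// Usage: two well-separated groups in 2-D.
const demo = kMeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2);
console.log(demo.labels); // first three points share one label, last three the other
```

The same function works for vectors of any length; real embeddings simply have many more dimensions.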

Code Examples

Example 1: Basic Auto-Clustering

This example clusters products and lets the service auto-detect the optimal number of clusters.

JavaScript

cluster-products.js

import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
});

async function clusterProducts() {
  try {
    // Call the cluster method
    const response = await client.db.cluster({
      table: "products",
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete!`);
    console.log(`Optimal clusters: ${response.optimal_k}`);
    console.log(`Silhouette score: ${response.silhouette_score}`);
    console.log(`Iterations: ${response.iterations}`);

    // Display each cluster
    response.clusters.forEach((cluster, index) => {
      console.log(`Cluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} products`);
      console.log(`Sample products:`, cluster.sample_records.map(r => r.name));
    });

    return response;
  } catch (error) {
    // Handle any errors
    console.error('Error clustering products:', error.message);
  }
}

// Call the function
clusterProducts();

Example 2: Custom Cluster Count

This example sets an exact number of clusters instead of using auto-detection.

JavaScript

cluster-custom-count.js

import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function clusterWithCustomCount() {
  try {
    // Cluster with a specific number of clusters
    const response = await client.db.cluster({
      table: "articles",
      num_clusters: 5, // Exact number of clusters (range 2-20)
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete with ${response.optimal_k} clusters`);
    console.log(`Silhouette score: ${response.silhouette_score}`);

    // Flag small clusters (size alone does not measure quality; use the silhouette score for that)
    response.clusters.forEach((cluster, index) => {
      console.log(`Cluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} articles`);
      console.log(`Size check: ${cluster.size > 10 ? 'OK' : 'Small'}`);
    });

    return response;
  } catch (error) {
    console.error('Error clustering articles:', error.message);
  }
}

clusterWithCustomCount();

Example 3: Customer Segmentation

This example shows how to use clustering for customer segmentation analysis.

JavaScript

customer-segmentation.js

import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function segmentCustomers() {
  try {
    // Segment customers, auto-detecting the cluster count between 3 and 8
    const response = await client.db.cluster({
      table: "customers",
      min_clusters: 3, // Minimum number of segments to consider
      max_clusters: 8, // Maximum number of segments to consider
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Map the silhouette score onto the bands in "Silhouette Score Interpretation"
    const score = response.silhouette_score;
    const quality = score >= 0.7 ? 'Excellent' : score >= 0.5 ? 'Good' : score >= 0.3 ? 'Fair' : 'Poor';

    console.log(`Customer segmentation complete!`);
    console.log(`Optimal segments: ${response.optimal_k}`);
    console.log(`Clustering quality: ${quality}`);

    // Total customers across all segments, computed once
    const totalCustomers = response.clusters.reduce((sum, c) => sum + c.size, 0);

    // Analyze each customer segment
    response.clusters.forEach((cluster, index) => {
      console.log(`Segment ${index + 1}: ${cluster.label}`);
      console.log(`Customer count: ${cluster.size}`);
      console.log(`Percentage: ${((cluster.size / totalCustomers) * 100).toFixed(1)}%`);
      // Show sample customers
      console.log(`Sample customers:`, cluster.sample_records.map(c => c.name));
    });

    return response;
  } catch (error) {
    console.error('Error segmenting customers:', error.message);
  }
}

segmentCustomers();

Request Body Explained

table (string, body, required)

Table to cluster. Example: "products"

num_clusters (number, body, optional)

Exact number of clusters to create. Range: 2-20. If omitted, the optimal number is auto-detected between min_clusters and max_clusters. Example: 5

min_clusters (number, body, optional)

Minimum number of clusters considered during auto-detection. Range: 2-10. Default: 2.

max_clusters (number, body, optional)

Maximum number of clusters considered during auto-detection. Range: 2-20. Default: 10.

generate_labels (boolean, body, optional)

Whether to generate AI labels for clusters. Default: true.
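
The ranges above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`validateClusterParams` is not part of the SDK) that encodes only the documented limits and defaults, plus the assumption that cluster counts are whole numbers. The server remains the authority, and its validation messages may differ.

```javascript
// Hypothetical client-side check of the documented request-body limits.
// Returns an array of problems; an empty array means the params look valid.
// ASSUMPTION: cluster counts must be integers (the docs only say "number").
function validateClusterParams(params) {
  const errors = [];
  const isInt = (v) => Number.isInteger(v);

  if (typeof params.table !== "string" || params.table.trim() === "") {
    errors.push("table is required and must be a non-empty string");
  }
  if (params.num_clusters !== undefined &&
      !(isInt(params.num_clusters) && params.num_clusters >= 2 && params.num_clusters <= 20)) {
    errors.push("num_clusters must be an integer from 2 to 20");
  }

  // Apply the documented defaults (min 2, max 10) before comparing them.
  const min = params.min_clusters ?? 2;
  const max = params.max_clusters ?? 10;
  if (!(isInt(min) && min >= 2 && min <= 10)) {
    errors.push("min_clusters must be an integer from 2 to 10");
  }
  if (!(isInt(max) && max >= 2 && max <= 20)) {
    errors.push("max_clusters must be an integer from 2 to 20");
  }
  if (isInt(min) && isInt(max) && min > max) {
    errors.push("min_clusters must be less than or equal to max_clusters");
  }

  if (params.generate_labels !== undefined && typeof params.generate_labels !== "boolean") {
    errors.push("generate_labels must be a boolean");
  }
  return errors;
}

console.log(validateClusterParams({ table: "articles", num_clusters: 5 })); // []
```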

Response Fields Explained

success (boolean, required)

true if clustering was successful, false otherwise.

clusters (array, required)

Array of clusters. Each cluster contains:

  • id: Cluster ID number
  • label: AI-generated description of the cluster
  • size: Number of records in the cluster
  • sample_records: 3-5 representative records from the cluster
  • centroid: Cluster center embedding (truncated to the first 10 dimensions)

optimal_k (number, required)

The optimal number of clusters determined for this run.

silhouette_score (number, required)

Clustering quality metric ranging from -1 to 1. Higher scores indicate better-separated clusters.

iterations (number, required)

Number of K-means iterations performed.

executionTime (number, required)

Clustering execution time in milliseconds.
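
The per-cluster percentage used in Example 3 can be computed once from these fields. The sketch below is a hypothetical helper, not an SDK function; it reads only the documented `id`, `label`, and `size` fields and assumes each record belongs to exactly one cluster, so the sizes add up to the number of clustered records.

```javascript
// Hypothetical helper: turn a cluster response into id/label/size/share rows,
// largest cluster first. `share` is the percentage of all clustered records.
function summarizeClusters(response) {
  const total = response.clusters.reduce((sum, c) => sum + c.size, 0);
  return response.clusters
    .map((c) => ({
      id: c.id,
      label: c.label,
      size: c.size,
      share: total === 0 ? 0 : Number(((c.size / total) * 100).toFixed(1)),
    }))
    .sort((a, b) => b.size - a.size);
}

// Usage with the two clusters shown in the example response below.
const rows = summarizeClusters({
  clusters: [
    { id: 0, label: "Cluster focused on: electronics, gadgets, technology", size: 45 },
    { id: 1, label: "Cluster focused on: clothing, fashion, apparel", size: 32 },
  ],
});
console.log(rows.map((r) => `${r.id}: ${r.share}%`)); // ["0: 58.4%", "1: 41.6%"]
```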

Example Response

response.json

{
  "success": true,
  "clusters": [
    {
      "id": 0,
      "label": "Cluster focused on: electronics, gadgets, technology",
      "size": 45,
      "sample_records": [
        {
          "id": "prod_123",
          "name": "Smart Phone",
          "category": "electronics",
          "price": 699.99
        },
        {
          "id": "prod_456",
          "name": "Laptop Computer",
          "category": "electronics",
          "price": 1299.99
        }
      ],
      "centroid": [0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234, 0.567, -0.890, 0.123]
    },
    {
      "id": 1,
      "label": "Cluster focused on: clothing, fashion, apparel",
      "size": 32,
      "sample_records": [
        {
          "id": "prod_789",
          "name": "Designer Shirt",
          "category": "clothing",
          "price": 89.99
        },
        {
          "id": "prod_012",
          "name": "Fashion Jeans",
          "category": "clothing",
          "price": 79.99
        }
      ],
      "centroid": [-0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234]
    }
  ],
  "optimal_k": 5,
  "silhouette_score": 0.75,
  "iterations": 12,
  "executionTime": 2341
}

Common Errors and How to Fix Them

| Error | Cause | Solution |
|---|---|---|
| "Table not found" | The specified table doesn't exist | Check the table name for typos |
| "No embeddings found" | The table doesn't have vector embeddings | Ensure the table has been processed for embeddings |
| "Insufficient data" | Not enough records for meaningful clustering | Ensure the table has at least 20 records |
| "Invalid cluster range" | The min/max cluster values are invalid | Ensure min_clusters ≤ max_clusters and both are ≥ 2 |
| "Clustering failed" | The algorithm couldn't converge | Try different cluster counts or check data quality |
| "Unauthorized" | Invalid or missing API key | Check that you're using a valid API key |

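When a call fails, the messages in the table above can be matched to their fixes. This is a hypothetical sketch: `hintForClusterError` is not an SDK function, and it assumes the thrown error's `message` contains one of the listed strings verbatim, which this page does not guarantee.

```javascript
// Hypothetical lookup from the documented error messages to suggested fixes.
// ASSUMPTION: error.message contains one of these strings; adjust if the SDK differs.
const CLUSTER_ERROR_HINTS = [
  ["Table not found", "Check the table name for typos."],
  ["No embeddings found", "Make sure the table has been processed for embeddings."],
  ["Insufficient data", "Add more records (at least 20) before clustering."],
  ["Invalid cluster range", "Use min_clusters <= max_clusters, with both >= 2."],
  ["Clustering failed", "Try different cluster counts or check data quality."],
  ["Unauthorized", "Check that WORQHAT_API_KEY holds a valid API key."],
];

function hintForClusterError(error) {
  // Accept either an Error object or a plain message string.
  const message = String(error?.message ?? error);
  const match = CLUSTER_ERROR_HINTS.find(([needle]) => message.includes(needle));
  return match ? match[1] : "Unrecognized error; inspect the full error message.";
}

// Usage inside the catch block of the examples above:
// catch (error) {
//   console.error("Error clustering:", error.message, "->", hintForClusterError(error));
// }
```
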
Tips for Better Clustering

  • Start with auto-detection: Let the system find the optimal number of clusters first
  • Review silhouette scores: Scores of 0.7 or higher indicate excellent clustering, and 0.5 or higher indicates good clustering
  • Examine cluster labels: AI-generated labels help understand what each cluster represents
  • Check cluster sizes: Balanced cluster sizes are generally better than very uneven ones
  • Use sample records: Review sample records to validate cluster quality
  • Iterate and refine: Try different parameters if initial results aren't satisfactory
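
One way to iterate is to run a few explicit cluster counts and keep the best-separated result. The sketch below is hypothetical: `pickBestRun` is not an SDK function, it takes the cluster call as a parameter so it can be exercised without network access, and it relies only on the documented `num_clusters`, `success`, and `silhouette_score` fields. Turning off `generate_labels` for the comparison runs is an assumption made to keep them lightweight.

```javascript
// Hypothetical helper: try several num_clusters values and keep the run
// with the highest silhouette score. `clusterFn` should behave like
// (params) => client.db.cluster(params).
async function pickBestRun(clusterFn, table, candidateCounts) {
  let best = null;
  for (const k of candidateCounts) {
    // Labels are not needed just to compare scores.
    const response = await clusterFn({ table, num_clusters: k, generate_labels: false });
    if (!response.success) continue; // skip failed runs
    if (best === null || response.silhouette_score > best.response.silhouette_score) {
      best = { k, response };
    }
  }
  return best; // null if every run failed
}

// Usage with the real client (each candidate is a separate request):
// const best = await pickBestRun((p) => client.db.cluster(p), "articles", [4, 5, 6]);
// console.log(`Best k: ${best.k}, silhouette: ${best.response.silhouette_score}`);
```

Comparing silhouette scores across cluster counts is a common heuristic, not a guarantee; the sample records and labels of the winning run are still worth reviewing.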

Clustering Quality Metrics

Silhouette Score Interpretation

  • 0.7-1.0: Excellent clustering, well-separated clusters
  • 0.5-0.7: Good clustering, reasonably well-separated
  • 0.3-0.5: Fair clustering, some overlap between clusters
  • 0.0-0.3: Poor clustering, significant overlap
  • Negative: Very poor clustering; many records are closer to a neighboring cluster than to their own
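
For each record, the silhouette compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). The overall score is the mean over all records, which is why it can range from -1 to 1. The sketch below computes it with Euclidean distance and maps it onto the bands above. The service's distance metric is not documented here (cosine distance is common for embeddings), and `silhouetteScore` and `describeSilhouette` are hypothetical helpers, so treat this as an illustration of the metric rather than a reproduction of the service's numbers.

```javascript
// Euclidean distance between two equal-length vectors.
// ASSUMPTION: the service's distance metric is not documented; cosine distance is also common.
function dist(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Mean silhouette score. labels[i] is the cluster of points[i].
// Per point: a = mean distance to the rest of its own cluster,
// b = lowest mean distance to any other cluster, s = (b - a) / max(a, b).
// Points alone in their cluster contribute 0 (a common convention).
function silhouetteScore(points, labels) {
  const clusterIds = [...new Set(labels)];
  if (clusterIds.length < 2) throw new Error("need at least 2 clusters");

  let total = 0;
  for (let i = 0; i < points.length; i++) {
    // Mean distance from point i to the members of cluster c, excluding i itself.
    const meanDistTo = (c) => {
      let sum = 0;
      let n = 0;
      for (let j = 0; j < points.length; j++) {
        if (j !== i && labels[j] === c) {
          sum += dist(points[i], points[j]);
          n++;
        }
      }
      return n === 0 ? 0 : sum / n;
    };
    const ownSize = labels.filter((l) => l === labels[i]).length;
    if (ownSize === 1) continue;
    const a = meanDistTo(labels[i]);
    const b = Math.min(...clusterIds.filter((c) => c !== labels[i]).map(meanDistTo));
    const denom = Math.max(a, b);
    total += denom === 0 ? 0 : (b - a) / denom;
  }
  return total / points.length;
}

// Map a score onto the interpretation bands listed above.
function describeSilhouette(score) {
  if (score >= 0.7) return "Excellent";
  if (score >= 0.5) return "Good";
  if (score >= 0.3) return "Fair";
  if (score >= 0) return "Poor";
  return "Very poor";
}

console.log(describeSilhouette(0.75)); // "Excellent"
```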

Cluster Size Guidelines

  • Balanced clusters: Similar sizes are generally better
  • Minimum size: Clusters with < 5 records may not be meaningful
  • Maximum size: Very large clusters (> 50% of data) may need further subdivision
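
The size guidelines can be turned into a quick check on a response's `clusters` array. The thresholds (fewer than 5 records, more than 50% of the data) come from the guidelines above; `checkClusterSizes` is a hypothetical helper, and its warnings are prompts for review rather than verdicts.

```javascript
// Hypothetical helper: flag clusters that fall outside the size guidelines.
// Thresholds default to the documented values and can be overridden.
function checkClusterSizes(clusters, { minSize = 5, maxShare = 0.5 } = {}) {
  const total = clusters.reduce((sum, c) => sum + c.size, 0);
  const warnings = [];
  for (const c of clusters) {
    if (c.size < minSize) {
      warnings.push(`Cluster ${c.id} has only ${c.size} records; it may not be meaningful.`);
    }
    if (total > 0 && c.size / total > maxShare) {
      const pct = Math.round((c.size / total) * 100);
      warnings.push(`Cluster ${c.id} holds ${pct}% of records; consider subdividing it.`);
    }
  }
  return warnings;
}

// Usage after a cluster call:
// checkClusterSizes(response.clusters).forEach((w) => console.warn(w));
```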

Use Cases

Content Organization

JavaScript

content-organization.js

// Automatically organize articles by topic
const articleClusters = await client.db.cluster({
  table: "articles",
  generate_labels: true
});

Customer Segmentation

JavaScript

customer-segmentation-use-case.js

// Segment customers by behavior patterns
const customerSegments = await client.db.cluster({
  table: "customers",
  min_clusters: 3,
  max_clusters: 8
});

Product Categorization

JavaScript

product-categorization.js

// Automatically categorize products
const productCategories = await client.db.cluster({
  table: "products",
  num_clusters: 6,
  generate_labels: true
});

Data Exploration

JavaScript

data-exploration.js

// Discover patterns in complex datasets
const dataPatterns = await client.db.cluster({
  table: "complex_data",
  min_clusters: 2,
  max_clusters: 15
});