WorqHat Documentation

Performs K-means clustering on vector embeddings to automatically group similar records. The endpoint can auto-detect the optimal number of clusters and generate AI-written cluster labels, which makes it useful for data analysis, content organization, and pattern discovery.

POST https://api.worqhat.com/db/cluster

What Does This Endpoint Do?

This endpoint runs K-means clustering on your records' vector embeddings, grouping records by semantic similarity. It can determine the optimal number of clusters automatically and generate an AI-written description for each cluster.

When to Use Clustering

You'll find this endpoint useful when you need to:

Organize content automatically: Group similar products, articles, or documents
Discover data patterns: Find natural groupings in your data that weren't obvious before
Categorize content: Automatically assign categories to large volumes of content
Segment customers: Group customers by behavior or preferences
Explore data: Understand the structure and relationships in your data
Reduce complexity: Simplify large datasets by grouping similar items

How It Works

You specify a table to cluster
The system retrieves vector embeddings for all records
It performs K-means clustering to group similar records
The optimal cluster count is determined automatically, or you can set it yourself with num_clusters
If generate_labels is enabled (the default), AI-generated labels describe what each cluster represents
Results include cluster information, sample records, and quality metrics
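The clustering step above can be illustrated with a minimal K-means (Lloyd's algorithm) sketch in plain JavaScript. This is not WorqHat's implementation: the initialization (the first k points), the squared Euclidean distance, and the stopping rule are assumptions chosen to keep the example short and deterministic. It shows the alternating assignment and update passes that K-means repeats until no record changes cluster.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Lloyd's algorithm: alternate assignment and update steps until no point
// changes cluster or maxIterations is reached.
// Assumption: initial centroids are simply the first k points; real systems
// usually use k-means++ seeding or several random restarts instead.
function kMeans(points, k, maxIterations = 100) {
  if (k < 1 || k > points.length) {
    throw new RangeError("k must be between 1 and the number of points");
  }
  let centroids = points.slice(0, k).map((p) => [...p]);
  const labels = new Array(points.length).fill(-1);
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    // Assignment step: attach each point to its nearest centroid.
    let changed = false;
    for (let i = 0; i < points.length; i++) {
      let best = 0;
      let bestDist = squaredDistance(points[i], centroids[0]);
      for (let c = 1; c < k; c++) {
        const d = squaredDistance(points[i], centroids[c]);
        if (d < bestDist) {
          best = c;
          bestDist = d;
        }
      }
      if (labels[i] !== best) {
        labels[i] = best;
        changed = true;
      }
    }
    if (!changed) break; // converged: assignments are stable
    // Update step: move each centroid to the mean of its member points.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { labels, centroids, iterations };
}
```

For example, `kMeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` puts the first two points in one cluster and the last two in the other, with centroids at `[0, 0.5]` and `[10, 10.5]`.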

Code Examples

Example 1: Basic Auto-Clustering

This example shows how to cluster products with automatic detection of the optimal cluster count.

cluster-products.js

```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
});

async function clusterProducts() {
  try {
    // Call the cluster method
    const response = await client.db.cluster({
      table: "products",
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete!`);
    console.log(`Optimal clusters: ${response.optimal_k}`);
    console.log(`Silhouette score: ${response.silhouette_score}`);
    console.log(`Iterations: ${response.iterations}`);

    // Display each cluster
    response.clusters.forEach((cluster, index) => {
      console.log(`Cluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} products`);
      console.log(`Sample products:`, cluster.sample_records.map(r => r.name));
    });

    return response;
  } catch (error) {
    // Handle any errors
    console.error('Error clustering products:', error.message);
  }
}

// Call the function
clusterProducts();
```

Example 2: Custom Cluster Count

This example shows how to request an exact number of clusters with num_clusters.

cluster-custom-count.js

```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function clusterWithCustomCount() {
  try {
    // Cluster with a specific number of clusters
    const response = await client.db.cluster({
      table: "articles",
      num_clusters: 5, // Specify the exact number of clusters
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete with ${response.clusters.length} clusters`);
    console.log(`Silhouette score: ${response.silhouette_score}`);

    // Flag small clusters (a simple size check, not a quality metric)
    response.clusters.forEach((cluster, index) => {
      console.log(`Cluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} articles`);
      console.log(`Size check: ${cluster.size > 10 ? 'OK' : 'Small'}`);
    });

    return response;
  } catch (error) {
    console.error('Error clustering articles:', error.message);
  }
}

clusterWithCustomCount();
```

Example 3: Customer Segmentation

This example shows how to use clustering for customer segmentation analysis.

customer-segmentation.js

```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function segmentCustomers() {
  try {
    // Segment customers with auto-detection
    const response = await client.db.cluster({
      table: "customers",
      min_clusters: 3, // Minimum clusters for customer segments
      max_clusters: 8, // Maximum clusters to consider
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Map the silhouette score onto the interpretation bands in this guide
    const score = response.silhouette_score;
    const quality =
      score >= 0.7 ? 'Excellent' :
      score >= 0.5 ? 'Good' :
      score >= 0.3 ? 'Fair' :
      score >= 0 ? 'Poor' : 'Very poor';

    // Handle the successful response
    console.log(`Customer segmentation complete!`);
    console.log(`Optimal segments: ${response.optimal_k}`);
    console.log(`Clustering quality: ${quality} (${score})`);

    // Total customers across all segments, computed once for the percentages
    const totalCustomers = response.clusters.reduce((sum, c) => sum + c.size, 0);

    // Analyze each customer segment
    response.clusters.forEach((cluster, index) => {
      console.log(`Segment ${index + 1}: ${cluster.label}`);
      console.log(`Customer count: ${cluster.size}`);
      console.log(`Percentage: ${((cluster.size / totalCustomers) * 100).toFixed(1)}%`);
      // Show sample customers
      console.log(`Sample customers:`, cluster.sample_records.map(c => c.name));
    });

    return response;
  } catch (error) {
    console.error('Error segmenting customers:', error.message);
  }
}

segmentCustomers();
```

Request Body Explained

table (string, body, required)

Table to cluster. Example: "products"

num_clusters (number, body, optional)

Specific number of clusters to create. Range: 2-20. If not provided, the optimal number is auto-detected. Example: 5

min_clusters (number, body, optional)

Minimum clusters for auto-detection. Range: 2-10, default: 2.

max_clusters (number, body, optional)

Maximum clusters for auto-detection. Range: 2-20, default: 10.

generate_labels (boolean, body, optional)

Whether to generate AI labels for clusters. Default: true.
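Because out-of-range values lead to an "Invalid cluster range" error, you may want to check parameters before sending a request. The sketch below is a hypothetical client-side helper (validateClusterParams is not part of the WorqHat SDK) that encodes the ranges and defaults listed above; treating cluster counts as integers is an assumption, since the fields are only documented as numbers.

```javascript
// Documented ranges for the optional cluster-count parameters.
const CLUSTER_PARAM_RANGES = {
  num_clusters: [2, 20],
  min_clusters: [2, 10],
  max_clusters: [2, 20],
};

// Hypothetical helper: returns a list of problems (empty if the params look valid).
function validateClusterParams(params) {
  const errors = [];
  if (typeof params.table !== "string" || params.table.trim() === "") {
    errors.push("table is required and must be a non-empty string");
  }
  for (const [name, [lo, hi]] of Object.entries(CLUSTER_PARAM_RANGES)) {
    const value = params[name];
    if (value === undefined) continue; // optional parameter omitted
    // Assumption: cluster counts must be whole numbers.
    if (!Number.isInteger(value) || value < lo || value > hi) {
      errors.push(`${name} must be an integer between ${lo} and ${hi}`);
    }
  }
  // Apply the documented defaults (min 2, max 10) so the comparison also
  // covers requests that set only one end of the range.
  const min = params.min_clusters ?? 2;
  const max = params.max_clusters ?? 10;
  if (min > max) {
    errors.push("min_clusters must be less than or equal to max_clusters");
  }
  if (params.generate_labels !== undefined && typeof params.generate_labels !== "boolean") {
    errors.push("generate_labels must be a boolean");
  }
  return errors;
}
```

For example, `validateClusterParams({ table: "customers", min_clusters: 3, max_clusters: 8 })` returns an empty array, while `validateClusterParams({ table: "articles", num_clusters: 25 })` reports that num_clusters is out of range.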

Response Fields Explained

success (boolean, required)

true if clustering was successful, false otherwise.

clusters (array, required)

An array of clusters. Each cluster contains:
- id: Cluster ID number
- label: AI-generated description of the cluster
- size: Number of records in the cluster
- sample_records: 3-5 representative records from the cluster
- centroid: The cluster's center embedding (first 10 dimensions only)

optimal_k (number, required)

The number of clusters determined to be optimal.

silhouette_score (number, required)

Clustering quality metric, ranging from -1 to 1. Higher scores indicate better-separated clusters.

iterations (number, required)

Number of K-means iterations performed.

executionTime (number, required)

Clustering execution time in milliseconds.
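The fields above are enough to build a simple report. The sketch below is a hypothetical helper (summarizeClusters is not part of the SDK) that ranks clusters by size and computes each cluster's share of the clustered records. It assumes the response shape documented here; the fallback text for a missing label is an assumption about responses requested with generate_labels: false.

```javascript
// Hypothetical helper: rank clusters by size and compute percentage shares.
function summarizeClusters(response) {
  if (!response || response.success !== true) {
    throw new Error("Clustering did not succeed");
  }
  // Total records across all returned clusters.
  const total = response.clusters.reduce((sum, c) => sum + c.size, 0);
  return response.clusters
    .map((c) => ({
      id: c.id,
      // Assumption: label may be missing when generate_labels is false.
      label: c.label ?? "(no label)",
      size: c.size,
      // Share of all clustered records, rounded to one decimal place.
      percent: total === 0 ? 0 : Number(((c.size / total) * 100).toFixed(1)),
    }))
    .sort((a, b) => b.size - a.size); // largest cluster first
}
```

For instance, `summarizeClusters(response).forEach((row) => console.log(row.label, row.percent + "%"))` prints one line per cluster, largest first.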

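Example Response

The example lists only two clusters even though optimal_k is 5, and shows two sample records per cluster; a complete response includes an entry for every cluster.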

response.json

```json
{
  "success": true,
  "clusters": [
    {
      "id": 0,
      "label": "Cluster focused on: electronics, gadgets, technology",
      "size": 45,
      "sample_records": [
        {
          "id": "prod_123",
          "name": "Smart Phone",
          "category": "electronics",
          "price": 699.99
        },
        {
          "id": "prod_456",
          "name": "Laptop Computer",
          "category": "electronics",
          "price": 1299.99
        }
      ],
      "centroid": [0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234, 0.567, -0.890, 0.123]
    },
    {
      "id": 1,
      "label": "Cluster focused on: clothing, fashion, apparel",
      "size": 32,
      "sample_records": [
        {
          "id": "prod_789",
          "name": "Designer Shirt",
          "category": "clothing",
          "price": 89.99
        },
        {
          "id": "prod_012",
          "name": "Fashion Jeans",
          "category": "clothing",
          "price": 79.99
        }
      ],
      "centroid": [-0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234]
    }
  ],
  "optimal_k": 5,
  "silhouette_score": 0.75,
  "iterations": 12,
  "executionTime": 2341
}
```

Common Errors and How to Fix Them

Error	Cause	Solution
"Table not found"	The specified table doesn't exist	Check your table name for typos
"No embeddings found"	The table doesn't have vector embeddings	Ensure your table has been processed for embeddings
"Insufficient data"	Not enough records for meaningful clustering	Ensure the table has at least 20 records
"Invalid cluster range"	Min/max cluster values are invalid	Ensure min_clusters ≤ max_clusters, both are ≥ 2, and each is within its documented range
"Clustering failed"	The algorithm couldn't converge	Try different cluster counts or check data quality
"Unauthorized"	Invalid or missing API key	Check that you're using a valid API key

Tips for Better Clustering

Start with auto-detection: Let the system find the optimal number of clusters first
Review silhouette scores: Scores of 0.5 or higher indicate good clustering, and 0.7 or higher indicates excellent clustering
Examine cluster labels: AI-generated labels help understand what each cluster represents
Check cluster sizes: Balanced cluster sizes are generally better than very uneven ones
Use sample records: Review sample records to validate cluster quality
Iterate and refine: Try different parameters if initial results aren't satisfactory

Clustering Quality Metrics

Silhouette Score Interpretation

0.7-1.0: Excellent clustering, well-separated clusters
0.5-0.7: Good clustering, reasonably well-separated
0.3-0.5: Fair clustering, some overlap between clusters
0.0-0.3: Poor clustering, significant overlap
Negative: Very poor clustering; many records are closer to a neighboring cluster than to their own
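To see what the score measures, here is a sketch of the standard silhouette coefficient in plain JavaScript, plus a helper that maps a score onto the bands above. It assumes Euclidean distance; the endpoint does not document which distance it uses, so a score you compute yourself may not match silhouette_score exactly. The band boundaries are treated as lower-inclusive (a score of exactly 0.7 counts as Excellent).

```javascript
// Euclidean distance between two equal-length vectors (assumed metric).
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Mean silhouette coefficient. For each point i:
//   a = mean distance to the other members of its own cluster
//   b = smallest mean distance to the members of any other cluster
//   s(i) = (b - a) / max(a, b)
// Points in single-member clusters score 0 by convention.
function silhouetteScore(points, labels) {
  const clusterIds = [...new Set(labels)];
  if (clusterIds.length < 2) throw new Error("Need at least two clusters");
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    const own = labels[i];
    // Mean distance from point i to the members of cluster `id`, excluding i.
    const meanDist = (id) => {
      let sum = 0;
      let count = 0;
      for (let j = 0; j < points.length; j++) {
        if (j !== i && labels[j] === id) {
          sum += euclidean(points[i], points[j]);
          count++;
        }
      }
      return count === 0 ? null : sum / count;
    };
    const a = meanDist(own);
    if (a === null) continue; // singleton cluster: s(i) = 0
    const b = Math.min(...clusterIds.filter((id) => id !== own).map(meanDist));
    const denom = Math.max(a, b);
    if (denom > 0) total += (b - a) / denom;
  }
  return total / points.length;
}

// Map a score onto the interpretation bands listed above.
function interpretSilhouette(score) {
  if (score >= 0.7) return "Excellent";
  if (score >= 0.5) return "Good";
  if (score >= 0.3) return "Fair";
  if (score >= 0) return "Poor";
  return "Very poor";
}
```

With two tight, well-separated groups such as `[[0, 0], [0, 1], [10, 10], [10, 11]]` labeled `[0, 0, 1, 1]`, the score is about 0.93 (Excellent); mixing the groups with labels `[0, 1, 0, 1]` drives it below zero.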

Cluster Size Guidelines

Balanced clusters: Similar sizes are generally better
Minimum size: Clusters with < 5 records may not be meaningful
Maximum size: Very large clusters (> 50% of data) may need further subdivision
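These size guidelines can be checked automatically. flagClusterSizes below is a hypothetical helper, not part of the SDK; its defaults (fewer than 5 records is too small, more than 50% of all records is too large) come from the list above and can be adjusted for your data.

```javascript
// Hypothetical helper: flag clusters outside the size guidelines.
// Each check runs independently, so one cluster can be flagged twice.
function flagClusterSizes(clusters, { minSize = 5, maxShare = 0.5 } = {}) {
  const total = clusters.reduce((sum, c) => sum + c.size, 0);
  const flags = [];
  for (const c of clusters) {
    if (c.size < minSize) {
      flags.push({ id: c.id, issue: "too small", size: c.size });
    }
    if (total > 0 && c.size / total > maxShare) {
      flags.push({ id: c.id, issue: "too large", size: c.size });
    }
  }
  return flags;
}
```

For example, `flagClusterSizes(response.clusters)` returns an empty array when every cluster has at least 5 records and none holds more than half of the data.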

Use Cases

Content Organization

content-organization.js

```javascript
// Automatically organize articles by topic
const articleClusters = await client.db.cluster({
  table: "articles",
  generate_labels: true
});
```

Customer Segmentation

customer-segmentation-use-case.js

```javascript
// Segment customers by behavior patterns
const customerSegments = await client.db.cluster({
  table: "customers",
  min_clusters: 3,
  max_clusters: 8
});
```

Product Categorization

product-categorization.js

```javascript
// Automatically categorize products
const productCategories = await client.db.cluster({
  table: "products",
  num_clusters: 6,
  generate_labels: true
});
```

Data Exploration

data-exploration.js

```javascript
// Discover patterns in complex datasets
const dataPatterns = await client.db.cluster({
  table: "complex_data",
  min_clusters: 2,
  max_clusters: 15
});
```