Performs K-means clustering on vector embeddings to automatically group similar records. This endpoint can auto-detect the number of clusters and generate AI-written cluster labels, which makes it well suited to data analysis, content organization, and pattern discovery.
POST https://api.worqhat.com/db/cluster

What Does This Endpoint Do?

This endpoint runs K-means clustering over the vector embeddings of your records and groups them by semantic similarity. It can determine the number of clusters automatically and generate an AI-written label for each cluster.

When to Use Clustering

You’ll find this endpoint useful when you need to:
  • Organize content automatically: Group similar products, articles, or documents
  • Discover data patterns: Find natural groupings in your data that weren’t obvious before
  • Content categorization: Automatically categorize large amounts of content
  • Customer segmentation: Group customers by behavior or preferences
  • Data exploration: Understand the structure and relationships in your data
  • Reduce complexity: Simplify large datasets by grouping similar items

How It Works

  1. You specify a table to cluster
  2. The system retrieves vector embeddings for all records
  3. It performs K-means clustering to group similar records
  4. Optimal cluster count is automatically determined (or you can specify it)
  5. AI-generated labels describe what each cluster represents
  6. Results include cluster information, sample records, and quality metrics
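Step 3 can be illustrated with a minimal K-means sketch (Lloyd's algorithm). This is not the service's implementation, which this page does not describe; the function names, the "first k points" initialization, and the toy 2-D vectors are assumptions chosen to keep the example small and deterministic.

```javascript
// Illustrative K-means (Lloyd's algorithm). Hypothetical helper names;
// not the endpoint's actual implementation.

function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to the given point.
function nearestCentroid(point, centroids) {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, i) => {
    const d = squaredDistance(point, c);
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  });
  return best;
}

function kMeans(points, k, maxIterations = 100) {
  // Assumption for the sketch: seed centroids with the first k points.
  let centroids = points.slice(0, k).map((p) => [...p]);
  let labels = new Array(points.length).fill(-1);
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    // Assignment step: attach every point to its nearest centroid.
    const newLabels = points.map((p) => nearestCentroid(p, centroids));
    const changed = newLabels.some((l, i) => l !== labels[i]);
    labels = newLabels;
    if (!changed) break; // converged: no point moved to another cluster

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep empty clusters in place
      return old.map(
        (_, dim) => members.reduce((s, m) => s + m[dim], 0) / members.length
      );
    });
  }
  return { labels, centroids, iterations };
}

// Toy "embeddings": two obvious groups near (0, 0) and (5, 5).
const embeddings = [
  [0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0],
];
const result = kMeans(embeddings, 2);
console.log(result.labels);     // [ 0, 1, 0, 1, 0, 1 ]
console.log(result.iterations); // 2
```

Real embeddings have hundreds of dimensions, but the assignment and update steps are the same. The iteration count returned here plays the same role as the iterations field in the response.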

Code Examples

Example 1: Basic Auto-Clustering

This example shows how to automatically cluster products with optimal cluster detection.
  • Node.js
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
});

async function clusterProducts() {
  try {
    // Call the cluster method
    const response = await client.db.cluster({
      table: "products",
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });
    
    // Handle the successful response
    console.log(`Clustering complete!`);
    console.log(`Optimal clusters: ${response.optimal_k}`);
    console.log(`Silhouette score: ${response.silhouette_score}`);
    console.log(`Iterations: ${response.iterations}`);
    
    // Display each cluster
    response.clusters.forEach((cluster, index) => {
      console.log(`\nCluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} products`);
      console.log(`Sample products:`, cluster.sample_records.map(r => r.name));
    });
    return response;
  } catch (error) {
    // Handle any errors
    console.error('Error clustering products:', error.message);
  }
}

// Call the function
clusterProducts();

Example 2: Custom Cluster Count

This example shows how to request a fixed number of clusters.
  • Node.js
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function clusterWithCustomCount() {
  try {
    // Cluster with specific number of clusters
    const response = await client.db.cluster({
      table: "articles",
      num_clusters: 5,        // Specify exact number of clusters
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });
    
    // Handle the successful response
    console.log(`Clustering complete with ${response.optimal_k} clusters`);
    console.log(`Silhouette score: ${response.silhouette_score}`);
    
    // Flag clusters that may be too small to be meaningful
    response.clusters.forEach((cluster, index) => {
      console.log(`\nCluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} articles`);
      console.log(`Size check: ${cluster.size >= 5 ? 'OK' : 'Possibly too small'}`);
    });
    return response;
  } catch (error) {
    console.error('Error clustering articles:', error.message);
  }
}

clusterWithCustomCount();

Example 3: Customer Segmentation

This example shows how to use clustering for customer segmentation analysis.
  • Node.js
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function segmentCustomers() {
  try {
    // Segment customers with auto-detection
    const response = await client.db.cluster({
      table: "customers",
      min_clusters: 3,       // Minimum clusters for customer segments
      max_clusters: 8,      // Maximum clusters to consider
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });
    
    // Handle the successful response
    console.log(`Customer segmentation complete!`);
    console.log(`Optimal segments: ${response.optimal_k}`);
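    // Map the score to the bands in "Silhouette Score Interpretation"
    const score = response.silhouette_score;
    console.log(`Clustering quality: ${score >= 0.7 ? 'Excellent' : score >= 0.5 ? 'Good' : 'Fair or poor'}`);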
    
    // Analyze each customer segment (compute the total once, outside the loop)
    const totalCustomers = response.clusters.reduce((sum, c) => sum + c.size, 0);
    response.clusters.forEach((cluster, index) => {
      console.log(`\nSegment ${index + 1}: ${cluster.label}`);
      console.log(`Customer count: ${cluster.size}`);
      console.log(`Percentage: ${((cluster.size / totalCustomers) * 100).toFixed(1)}%`);
      
      // Show sample customers
      console.log(`Sample customers:`, cluster.sample_records.map(c => c.name));
    });
    return response;
  } catch (error) {
    console.error('Error segmenting customers:', error.message);
  }
}

segmentCustomers();

Request Body Explained

table
string
required
Table to cluster. Example: “products”
num_clusters
number
Specific number of clusters to create. Range: 2-20. If not provided, the optimal number is auto-detected. Example: 5
min_clusters
number
Minimum clusters for auto-detection. Range: 2-10, default: 2.
max_clusters
number
Maximum clusters for auto-detection. Range: 2-20, default: 10.
generate_labels
boolean
Whether to generate AI labels for clusters. Default: true.
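The documented ranges and defaults can be checked on the client before sending a request. The sketch below is one way to do that; validateClusterRequest is a hypothetical helper, not part of the SDK, and treating the cluster counts as integers is an assumption (the reference above only says "number").

```javascript
// Hypothetical client-side check of the documented request body rules.
function validateClusterRequest(body) {
  const errors = [];
  const isIntIn = (v, lo, hi) => Number.isInteger(v) && v >= lo && v <= hi;

  if (typeof body.table !== "string" || body.table.trim() === "") {
    errors.push("table is required and must be a non-empty string");
  }
  if (body.num_clusters !== undefined && !isIntIn(body.num_clusters, 2, 20)) {
    errors.push("num_clusters must be an integer from 2 to 20");
  }

  const min = body.min_clusters ?? 2;  // documented default
  const max = body.max_clusters ?? 10; // documented default
  const minOk = isIntIn(min, 2, 10);
  const maxOk = isIntIn(max, 2, 20);
  if (!minOk) errors.push("min_clusters must be an integer from 2 to 10");
  if (!maxOk) errors.push("max_clusters must be an integer from 2 to 20");
  if (minOk && maxOk && min > max) {
    errors.push("min_clusters must not exceed max_clusters");
  }

  if (body.generate_labels !== undefined && typeof body.generate_labels !== "boolean") {
    errors.push("generate_labels must be a boolean");
  }
  return errors;
}

console.log(validateClusterRequest({ table: "products", generate_labels: true }));
// []
console.log(validateClusterRequest({ table: "customers", min_clusters: 8, max_clusters: 4 }));
// [ 'min_clusters must not exceed max_clusters' ]
```

An empty array means the body satisfies the documented constraints; the server remains the final authority on what it accepts.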

Response Fields Explained

success
boolean
true if clustering was successful, false otherwise.
clusters
array
Array of clusters, each containing:
  • id: Cluster ID number
  • label: AI-generated description of the cluster
  • size: Number of records in the cluster
  • sample_records: 3-5 representative records from the cluster
  • centroid: Cluster center embedding (first 10 dimensions)
optimal_k
number
Determined optimal number of clusters.
silhouette_score
number
Clustering quality metric (-1 to 1). Higher scores indicate better clustering; negative scores indicate very poor clustering.
iterations
number
Number of K-means iterations performed.
executionTime
number
Clustering execution time in milliseconds.

Example Response

{
  "success": true,
  "clusters": [
    {
      "id": 0,
      "label": "Cluster focused on: electronics, gadgets, technology",
      "size": 45,
      "sample_records": [
        {
          "id": "prod_123",
          "name": "Smart Phone",
          "category": "electronics",
          "price": 699.99
        },
        {
          "id": "prod_456",
          "name": "Laptop Computer",
          "category": "electronics",
          "price": 1299.99
        }
      ],
      "centroid": [0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234, 0.567, -0.890, 0.123]
    },
    {
      "id": 1,
      "label": "Cluster focused on: clothing, fashion, apparel",
      "size": 32,
      "sample_records": [
        {
          "id": "prod_789",
          "name": "Designer Shirt",
          "category": "clothing",
          "price": 89.99
        },
        {
          "id": "prod_012",
          "name": "Fashion Jeans",
          "category": "clothing",
          "price": 79.99
        }
      ],
      "centroid": [-0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234]
    }
  ],
  "optimal_k": 2,
  "silhouette_score": 0.75,
  "iterations": 12,
  "executionTime": 2341
}
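A response of this shape can be condensed into a per-cluster summary. The sketch below assumes only the fields shown in the example response; summarizeClusters is a hypothetical helper, and the share is computed from the clusters present in the array it is given.

```javascript
// Hypothetical helper: condense a cluster response into one row per cluster.
function summarizeClusters(response) {
  const total = response.clusters.reduce((sum, c) => sum + c.size, 0);
  return response.clusters.map((c) => ({
    id: c.id,
    label: c.label,
    size: c.size,
    // Share of all clustered records, as a percentage with one decimal.
    sharePercent: total === 0 ? 0 : Number(((c.size / total) * 100).toFixed(1)),
    sampleNames: c.sample_records.map((r) => r.name),
  }));
}

// Trimmed copy of the example response above.
const exampleResponse = {
  success: true,
  clusters: [
    {
      id: 0,
      label: "Cluster focused on: electronics, gadgets, technology",
      size: 45,
      sample_records: [{ name: "Smart Phone" }, { name: "Laptop Computer" }],
    },
    {
      id: 1,
      label: "Cluster focused on: clothing, fashion, apparel",
      size: 32,
      sample_records: [{ name: "Designer Shirt" }, { name: "Fashion Jeans" }],
    },
  ],
};

for (const row of summarizeClusters(exampleResponse)) {
  console.log(`${row.id}: ${row.size} records (${row.sharePercent}%) - ${row.sampleNames.join(", ")}`);
}
// 0: 45 records (58.4%) - Smart Phone, Laptop Computer
// 1: 32 records (41.6%) - Designer Shirt, Fashion Jeans
```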

Common Errors and How to Fix Them

Error | Cause | Solution
“Table not found” | The specified table doesn’t exist | Check your table name for typos
“No embeddings found” | The table doesn’t have vector embeddings | Ensure your table has been processed for embeddings
“Insufficient data” | Not enough records for meaningful clustering | Ensure you have at least 20 records in the table
“Invalid cluster range” | Min/max cluster values are invalid | Ensure min_clusters ≤ max_clusters and both are ≥ 2
“Clustering failed” | Algorithm couldn’t converge | Try different cluster counts or check data quality
“Unauthorized” | Invalid or missing API key | Check that you’re using a valid API key
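The table can be turned into a small lookup for use inside a catch block. This is a sketch under one loud assumption: that the thrown error's message contains the error text listed above. The page does not specify the exact error shape, and hintForClusterError is a hypothetical helper.

```javascript
// Documented error text paired with the documented fix.
const CLUSTER_ERROR_HINTS = [
  ["Table not found", "Check your table name for typos"],
  ["No embeddings found", "Ensure your table has been processed for embeddings"],
  ["Insufficient data", "Ensure you have at least 20 records in the table"],
  ["Invalid cluster range", "Ensure min_clusters <= max_clusters and both are >= 2"],
  ["Clustering failed", "Try different cluster counts or check data quality"],
  ["Unauthorized", "Check that you're using a valid API key"],
];

// Assumption: the error message includes one of the documented strings.
// Returns the matching hint, or null when nothing matches.
function hintForClusterError(message) {
  const text = String(message ?? "").toLowerCase();
  const match = CLUSTER_ERROR_HINTS.find(([needle]) =>
    text.includes(needle.toLowerCase())
  );
  return match ? match[1] : null;
}

console.log(hintForClusterError("No embeddings found"));
// Ensure your table has been processed for embeddings
console.log(hintForClusterError("Something else went wrong"));
// null
```

In the examples above, the catch block could log hintForClusterError(error.message) next to the raw message.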

Tips for Better Clustering

  • Start with auto-detection: Let the system find the optimal number of clusters first
  • Review silhouette scores: Higher scores (0.7+) indicate good clustering quality
  • Examine cluster labels: AI-generated labels help understand what each cluster represents
  • Check cluster sizes: Balanced cluster sizes are generally better than very uneven ones
  • Use sample records: Review sample records to validate cluster quality
  • Iterate and refine: Try different parameters if initial results aren’t satisfactory

Clustering Quality Metrics

Silhouette Score Interpretation

  • 0.7-1.0: Excellent clustering, well-separated clusters
  • 0.5-0.7: Good clustering, reasonably well-separated
  • 0.3-0.5: Fair clustering, some overlap between clusters
  • 0.0-0.3: Poor clustering, significant overlap
  • Negative: Very poor clustering; many records are likely closer to another cluster than to the one they were assigned to
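To make the bands concrete, the sketch below computes a mean silhouette coefficient for a tiny dataset and maps it to the labels above. It uses the standard definition s = (b - a) / max(a, b) with Euclidean distance; whether the service uses the same distance measure is an assumption, and both function names are hypothetical.

```javascript
function euclidean(p, q) {
  return Math.sqrt(p.reduce((s, v, i) => s + (v - q[i]) ** 2, 0));
}

// Mean silhouette coefficient; labels[i] is the cluster of points[i].
function silhouetteScore(points, labels) {
  const clusters = [...new Set(labels)];
  if (clusters.length < 2) throw new Error("need at least two clusters");

  const scores = points.map((p, i) => {
    // Mean distance from p to each cluster's members, excluding p itself.
    const meanDist = new Map();
    for (const c of clusters) {
      const others = points.filter((_, j) => labels[j] === c && j !== i);
      if (others.length > 0) {
        meanDist.set(c, others.reduce((s, q) => s + euclidean(p, q), 0) / others.length);
      }
    }
    if (!meanDist.has(labels[i])) return 0; // singleton cluster scores 0 by convention
    const a = meanDist.get(labels[i]);      // cohesion: own cluster
    const b = Math.min(                     // separation: nearest other cluster
      ...clusters.filter((c) => c !== labels[i]).map((c) => meanDist.get(c))
    );
    const denom = Math.max(a, b);
    return denom === 0 ? 0 : (b - a) / denom;
  });
  return scores.reduce((s, v) => s + v, 0) / scores.length;
}

// Bands from the interpretation list above.
function interpretSilhouette(score) {
  if (score >= 0.7) return "excellent";
  if (score >= 0.5) return "good";
  if (score >= 0.3) return "fair";
  if (score >= 0) return "poor";
  return "very poor";
}

const points = [[0, 0], [0, 1], [10, 0], [10, 1]];
const wellSeparated = silhouetteScore(points, [0, 0, 1, 1]);
const mixedUp = silhouetteScore(points, [0, 1, 0, 1]);
console.log(wellSeparated.toFixed(2), interpretSilhouette(wellSeparated)); // 0.90 excellent
console.log(interpretSilhouette(mixedUp)); // very poor
```

The second call assigns each point to the far-away group, which is why its score is negative.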

Cluster Size Guidelines

  • Balanced clusters: Similar sizes are generally better
  • Minimum size: Clusters with fewer than 5 records may not be meaningful
  • Maximum size: Very large clusters (more than 50% of the data) may need further subdivision
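These guidelines are easy to apply to the clusters array of a response. The sketch below flags clusters outside the two thresholds; checkClusterSizes is a hypothetical helper, and the thresholds are taken directly from the list above.

```javascript
// Hypothetical helper: flag clusters that fall outside the size guidelines.
function checkClusterSizes(clusters) {
  const total = clusters.reduce((sum, c) => sum + c.size, 0);
  return clusters.map((c) => {
    const warnings = [];
    if (c.size < 5) warnings.push("fewer than 5 records");
    if (total > 0 && c.size / total > 0.5) warnings.push("more than 50% of records");
    return { id: c.id, size: c.size, warnings };
  });
}

const report = checkClusterSizes([
  { id: 0, size: 60 },
  { id: 1, size: 37 },
  { id: 2, size: 3 },
]);
for (const row of report) {
  console.log(row.id, row.size, row.warnings.length ? row.warnings.join("; ") : "ok");
}
// 0 60 more than 50% of records
// 1 37 ok
// 2 3 fewer than 5 records
```

A warning is a prompt to inspect the cluster, not proof that it is wrong; a dominant cluster can be legitimate if the data really is that homogeneous.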

Use Cases

Content Organization

// Automatically organize articles by topic
const articleClusters = await client.db.cluster({
  table: "articles",
  generate_labels: true
});

Customer Segmentation

// Segment customers by behavior patterns
const customerSegments = await client.db.cluster({
  table: "customers",
  min_clusters: 3,
  max_clusters: 8
});

Product Categorization

// Automatically categorize products
const productCategories = await client.db.cluster({
  table: "products",
  num_clusters: 6,
  generate_labels: true
});

Data Exploration

// Discover patterns in complex datasets
const dataPatterns = await client.db.cluster({
  table: "complex_data",
  min_clusters: 2,
  max_clusters: 15
});