Performs K-means clustering on vector embeddings to automatically group similar records. The endpoint can auto-detect the optimal cluster count and generate AI-written cluster labels, which makes it useful for data analysis, content organization, and pattern discovery.
POST https://api.worqhat.com/db/cluster
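If you are not using the SDK, the same call can in principle be made over plain HTTP. The sketch below only builds the request and sends nothing. Two things in it are assumptions that this page does not confirm: the `Authorization: Bearer` header scheme, and a JSON body that mirrors the SDK parameters. `buildClusterRequest` is a hypothetical helper name.

```javascript
// Endpoint URL as documented on this page.
const CLUSTER_URL = "https://api.worqhat.com/db/cluster";

// Hypothetical helper: builds the arguments for fetch() without sending anything.
// ASSUMPTION: the API accepts a Bearer token and a JSON body that mirrors
// the SDK parameters (table, num_clusters, generate_labels, ...).
function buildClusterRequest(apiKey, params) {
  if (!apiKey) {
    throw new Error("Missing API key");
  }
  return {
    url: CLUSTER_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      },
      body: JSON.stringify(params),
    },
  };
}

// Usage (Node 18+ ships a global fetch):
// const { url, options } = buildClusterRequest(process.env.WORQHAT_API_KEY, { table: "products" });
// const httpResponse = await fetch(url, options);
```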
What Does This Endpoint Do?
This endpoint runs K-means clustering over your records' vector embeddings to group them by semantic similarity. It can also determine the optimal number of clusters for you and generate an AI-written description of each cluster.
When to Use Clustering
You'll find this endpoint useful when you need to:
- Organize content automatically: Group similar products, articles, or documents
- Discover data patterns: Find natural groupings in your data that weren't obvious before
- Categorize content: Automatically categorize large amounts of content
- Segment customers: Group customers by behavior or preferences
- Explore data: Understand the structure and relationships in your data
- Reduce complexity: Simplify large datasets by grouping similar items
How It Works
- You specify a table to cluster
- The system retrieves vector embeddings for all records
- It performs K-means clustering to group similar records
- Optimal cluster count is automatically determined (or you can specify it)
- AI-generated labels describe what each cluster represents
- Results include cluster information, sample records, and quality metrics
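To make the K-means step in the list above concrete, here is a minimal sketch of the algorithm on tiny 2-D vectors. It is an illustration only and not the service's implementation: auto-detection of the cluster count and label generation are left out, and the centroids start at the first `k` vectors so that the run is deterministic.

```javascript
// Squared Euclidean distance between two vectors of equal length.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Index of the centroid closest to `point`.
function nearestCentroid(point, centroids) {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
      best = c;
    }
  }
  return best;
}

// Plain K-means. Centroids start at the first k vectors (deterministic);
// production implementations usually use a randomized start such as k-means++.
function kMeans(vectors, k, maxIterations = 100) {
  let centroids = vectors.slice(0, k).map((v) => [...v]);
  let assignments = new Array(vectors.length).fill(-1);
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    // Assignment step: attach every vector to its nearest centroid.
    const next = vectors.map((v) => nearestCentroid(v, centroids));
    const changed = next.some((cluster, i) => cluster !== assignments[i]);
    assignments = next;
    if (!changed) break; // converged: no vector switched cluster

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { assignments, centroids, iterations };
}

// Two obvious groups of 2-D "embeddings".
const points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]];
const result = kMeans(points, 2);
console.log(result.assignments); // → [0, 1, 0, 0, 1, 1]
console.log(result.iterations);  // → 2
```

Real embeddings have hundreds of dimensions, but the assignment and update steps are the same.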
Code Examples
Example 1: Basic Auto-Clustering
This example clusters products and lets the endpoint detect the optimal number of clusters.
cluster-products.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
    });

    async function clusterProducts() {
      try {
        // Call the cluster method
        const response = await client.db.cluster({
          table: "products",
          generate_labels: true,
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Handle the successful response
        console.log(`Clustering complete!`);
        console.log(`Optimal clusters: ${response.optimal_k}`);
        console.log(`Silhouette score: ${response.silhouette_score}`);
        console.log(`Iterations: ${response.iterations}`);

        // Display each cluster
        response.clusters.forEach((cluster, index) => {
          console.log(`Cluster ${index + 1}: ${cluster.label}`);
          console.log(`Size: ${cluster.size} products`);
          console.log(`Sample products:`, cluster.sample_records.map(r => r.name));
        });

        return response;
      } catch (error) {
        // Handle any errors
        console.error('Error clustering products:', error.message);
      }
    }

    // Call the function
    clusterProducts();

Example 2: Custom Cluster Count
This example shows how to request an exact number of clusters.
cluster-custom-count.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY,
    });

    async function clusterWithCustomCount() {
      try {
        // Cluster with specific number of clusters
        const response = await client.db.cluster({
          table: "articles",
          num_clusters: 5, // Specify exact number of clusters
          generate_labels: true,
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Handle the successful response
        console.log(`Clustering complete with ${response.optimal_k} clusters`);
        console.log(`Silhouette score: ${response.silhouette_score}`);

        // Analyze cluster quality
        response.clusters.forEach((cluster, index) => {
          console.log(`Cluster ${index + 1}: ${cluster.label}`);
          console.log(`Size: ${cluster.size} articles`);
          console.log(`Quality: ${cluster.size > 10 ? 'Good' : 'Small'}`);
        });

        return response;
      } catch (error) {
        console.error('Error clustering articles:', error.message);
      }
    }

    clusterWithCustomCount();

Example 3: Customer Segmentation
This example shows how to use clustering for customer segmentation analysis.
customer-segmentation.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY,
    });

    async function segmentCustomers() {
      try {
        // Segment customers with auto-detection
        const response = await client.db.cluster({
          table: "customers",
          min_clusters: 3, // Minimum clusters for customer segments
          max_clusters: 8, // Maximum clusters to consider
          generate_labels: true,
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Map the silhouette score to the quality bands described under
        // "Silhouette Score Interpretation"
        const score = response.silhouette_score;
        const quality =
          score >= 0.7 ? 'Excellent' : score >= 0.5 ? 'Good' : score >= 0.3 ? 'Fair' : 'Poor';

        // Handle the successful response
        console.log(`Customer segmentation complete!`);
        console.log(`Optimal segments: ${response.optimal_k}`);
        console.log(`Clustering quality: ${quality}`);

        // Total customers across all segments, computed once
        const totalCustomers = response.clusters.reduce((sum, c) => sum + c.size, 0);

        // Analyze each customer segment
        response.clusters.forEach((cluster, index) => {
          console.log(`Segment ${index + 1}: ${cluster.label}`);
          console.log(`Customer count: ${cluster.size}`);
          console.log(`Percentage: ${((cluster.size / totalCustomers) * 100).toFixed(1)}%`);

          // Show sample customers
          console.log(`Sample customers:`, cluster.sample_records.map(c => c.name));
        });

        return response;
      } catch (error) {
        console.error('Error segmenting customers:', error.message);
      }
    }

    segmentCustomers();

Request Body Explained
- table: Table to cluster. Example: "products"
- num_clusters: Specific number of clusters to create. Range: 2-20. If not provided, the optimal number is auto-detected. Example: 5
- min_clusters: Minimum clusters for auto-detection. Range: 2-10, default: 2.
- max_clusters: Maximum clusters for auto-detection. Range: 2-20, default: 10.
- generate_labels: Whether to generate AI labels for clusters. Default: true.
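The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch: `validateClusterParams` is a hypothetical helper, not part of the SDK, and the requirement that the counts be integers is an assumption.

```javascript
// Checks cluster parameters against the ranges documented above.
// Returns a list of problems; an empty list means the request looks valid.
// ASSUMPTION: cluster counts must be integers.
function validateClusterParams(params) {
  const problems = [];
  const { table, num_clusters, min_clusters = 2, max_clusters = 10 } = params;
  const inRange = (value, low, high) =>
    Number.isInteger(value) && value >= low && value <= high;

  if (typeof table !== "string" || table.length === 0) {
    problems.push("table is required");
  }
  if (num_clusters !== undefined && !inRange(num_clusters, 2, 20)) {
    problems.push("num_clusters must be an integer from 2 to 20");
  }
  if (!inRange(min_clusters, 2, 10)) {
    problems.push("min_clusters must be an integer from 2 to 10");
  }
  if (!inRange(max_clusters, 2, 20)) {
    problems.push("max_clusters must be an integer from 2 to 20");
  }
  if (min_clusters > max_clusters) {
    problems.push("min_clusters must not exceed max_clusters");
  }
  return problems;
}

console.log(validateClusterParams({ table: "customers", min_clusters: 3, max_clusters: 8 }));
// → []
console.log(validateClusterParams({ table: "customers", min_clusters: 9, max_clusters: 4 }));
// → ["min_clusters must not exceed max_clusters"]
```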
Response Fields Explained
- success: true if clustering was successful, false otherwise.
- clusters: Array of clusters, each containing:
  - id: Cluster ID number
  - label: AI-generated description of the cluster
  - size: Number of records in the cluster
  - sample_records: 3-5 representative records from the cluster
  - centroid: Cluster center embedding (first 10 dimensions)
- optimal_k: Determined optimal number of clusters.
- silhouette_score: Clustering quality metric (-1 to 1). Higher scores indicate better clustering.
- iterations: Number of K-means iterations performed.
- executionTime: Clustering execution time in milliseconds.
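As a sketch of how these fields fit together, the helper below turns a response into one summary row per cluster, including each cluster's share of all clustered records. `summarizeClusters` is a hypothetical helper, and the sample object is a trimmed-down stand-in for a real response.

```javascript
// Turns a cluster response into one summary row per cluster.
function summarizeClusters(response) {
  const total = response.clusters.reduce((sum, cluster) => sum + cluster.size, 0);
  return response.clusters.map((cluster) => ({
    id: cluster.id,
    label: cluster.label,
    size: cluster.size,
    share: total === 0 ? 0 : cluster.size / total, // fraction of all clustered records
  }));
}

// Trimmed-down stand-in response using the documented field names.
const sample = {
  success: true,
  clusters: [
    { id: 0, label: "electronics", size: 30, sample_records: [] },
    { id: 1, label: "clothing", size: 10, sample_records: [] },
  ],
  optimal_k: 2,
};

for (const row of summarizeClusters(sample)) {
  console.log(`${row.id} ${row.label}: ${row.size} records (${(row.share * 100).toFixed(1)}%)`);
}
// → 0 electronics: 30 records (75.0%)
// → 1 clothing: 10 records (25.0%)
```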
Example Response
response.json
    {
      "success": true,
      "clusters": [
        {
          "id": 0,
          "label": "Cluster focused on: electronics, gadgets, technology",
          "size": 45,
          "sample_records": [
            { "id": "prod_123", "name": "Smart Phone", "category": "electronics", "price": 699.99 },
            { "id": "prod_456", "name": "Laptop Computer", "category": "electronics", "price": 1299.99 }
          ],
          "centroid": [0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234, 0.567, -0.890, 0.123]
        },
        {
          "id": 1,
          "label": "Cluster focused on: clothing, fashion, apparel",
          "size": 32,
          "sample_records": [
            { "id": "prod_789", "name": "Designer Shirt", "category": "clothing", "price": 89.99 },
            { "id": "prod_012", "name": "Fashion Jeans", "category": "clothing", "price": 79.99 }
          ],
          "centroid": [-0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234]
        }
      ],
      "optimal_k": 5,
      "silhouette_score": 0.75,
      "iterations": 12,
      "executionTime": 2341
    }

The clusters array above lists only two of the five clusters that optimal_k reports.

Common Errors and How to Fix Them
| Error | Cause | Solution |
|---|---|---|
| "Table not found" | The specified table doesn't exist | Check your table name for typos |
| "No embeddings found" | The table doesn't have vector embeddings | Ensure your table has been processed for embeddings |
| "Insufficient data" | Not enough records for meaningful clustering | Ensure you have at least 20 records in the table |
| "Invalid cluster range" | Min/max cluster values are invalid | Ensure min_clusters ≤ max_clusters and both are ≥ 2 |
| "Clustering failed" | Algorithm couldn't converge | Try different cluster counts or check data quality |
| "Unauthorized" | Invalid or missing API key | Check that you're using a valid API key |
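The table above can be turned into a small lookup that attaches the documented fix to a caught error. This is a sketch under one assumption: that the SDK exposes the API's message text on `error.message`, as the examples on this page suggest. `suggestFix` is a hypothetical helper.

```javascript
// Error messages and fixes copied from the table above.
const KNOWN_ERRORS = [
  ["Table not found", "Check your table name for typos."],
  ["No embeddings found", "Ensure your table has been processed for embeddings."],
  ["Insufficient data", "Ensure you have at least 20 records in the table."],
  ["Invalid cluster range", "Ensure min_clusters <= max_clusters and both are >= 2."],
  ["Clustering failed", "Try different cluster counts or check data quality."],
  ["Unauthorized", "Check that you're using a valid API key."],
];

// Returns the documented fix for an error message, or null if it is not recognized.
// ASSUMPTION: the message text contains one of the strings from the table.
function suggestFix(message) {
  const text = String(message).toLowerCase();
  const match = KNOWN_ERRORS.find(([needle]) => text.includes(needle.toLowerCase()));
  return match ? match[1] : null;
}

console.log(suggestFix("No embeddings found for table products"));
// → Ensure your table has been processed for embeddings.
console.log(suggestFix("Something else entirely"));
// → null
```

Inside a `catch` block this would be called as `suggestFix(error.message)`.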
Tips for Better Clustering
- Start with auto-detection: Let the system find the optimal number of clusters first
- Review silhouette scores: Scores of 0.7 or higher indicate well-separated clusters
- Examine cluster labels: AI-generated labels help understand what each cluster represents
- Check cluster sizes: Balanced cluster sizes are generally better than very uneven ones
- Use sample records: Review sample records to validate cluster quality
- Iterate and refine: Try different parameters if initial results aren't satisfactory
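One way to iterate is to try a few explicit cluster counts and keep the run with the highest silhouette score. The sketch below takes the request function as an argument so it can run offline; with the SDK you would pass something like `(params) => client.db.cluster(params)`. `findBestClusterCount` is a hypothetical helper, and the stand-in scores are made up for the demonstration.

```javascript
// Tries each cluster count in turn and keeps the run with the highest silhouette score.
// `clusterFn` is any async function that takes request parameters and resolves
// to a response object with a silhouette_score field.
async function findBestClusterCount(clusterFn, table, counts) {
  let best = null;
  for (const num_clusters of counts) {
    // Labels are switched off during the search
    // (assumption: only the final run needs them).
    const response = await clusterFn({ table, num_clusters, generate_labels: false });
    if (best === null || response.silhouette_score > best.response.silhouette_score) {
      best = { num_clusters, response };
    }
  }
  return best; // null when `counts` is empty
}

// Stand-in for the real API so the sketch runs offline; the scores are made up.
const fakeScores = { 3: 0.41, 4: 0.68, 5: 0.55 };
const fakeCluster = async ({ num_clusters }) => ({
  success: true,
  silhouette_score: fakeScores[num_clusters],
});

findBestClusterCount(fakeCluster, "articles", [3, 4, 5]).then((best) => {
  console.log(`Best count: ${best.num_clusters} (score ${best.response.silhouette_score})`);
  // → Best count: 4 (score 0.68)
});
```

Each count costs one request, so keep the list of candidates short.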
Clustering Quality Metrics
Silhouette Score Interpretation
- 0.7-1.0: Excellent clustering, well-separated clusters
- 0.5-0.7: Good clustering, reasonably well-separated
- 0.3-0.5: Fair clustering, some overlap between clusters
- 0.0-0.3: Poor clustering, significant overlap
- Negative: Very poor clustering; on average, records sit closer to a neighboring cluster than to their own
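The bands above translate directly into a lookup function. In this sketch, a score that falls exactly on a boundary (0.7, 0.5, 0.3) is placed in the higher band, which is a choice made here because the documented ranges overlap at their edges. `interpretSilhouette` is a hypothetical helper.

```javascript
// Maps a silhouette score to the quality bands listed above.
// Boundary values (0.7, 0.5, 0.3) go to the higher band.
function interpretSilhouette(score) {
  if (score >= 0.7) return "excellent";
  if (score >= 0.5) return "good";
  if (score >= 0.3) return "fair";
  if (score >= 0) return "poor";
  return "very poor";
}

console.log(interpretSilhouette(0.75)); // → excellent
console.log(interpretSilhouette(0.42)); // → fair
console.log(interpretSilhouette(-0.1)); // → very poor
```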
Cluster Size Guidelines
- Balanced clusters: Similar sizes are generally better
- Minimum size: Clusters with < 5 records may not be meaningful
- Maximum size: Very large clusters (> 50% of data) may need further subdivision
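These guidelines can be applied to the `clusters` array of a response. The sketch below flags clusters with fewer than 5 records and clusters that hold more than 50% of all records; `checkClusterSizes` is a hypothetical helper, and the thresholds are the ones listed above.

```javascript
// Returns a warning for every cluster that is very small or dominates the data.
function checkClusterSizes(clusters) {
  const total = clusters.reduce((sum, cluster) => sum + cluster.size, 0);
  const warnings = [];
  for (const cluster of clusters) {
    if (cluster.size < 5) {
      warnings.push(`Cluster ${cluster.id}: only ${cluster.size} records, may not be meaningful`);
    }
    if (total > 0 && cluster.size / total > 0.5) {
      warnings.push(`Cluster ${cluster.id}: holds more than 50% of records, consider subdividing`);
    }
  }
  return warnings;
}

const warnings = checkClusterSizes([
  { id: 0, size: 60 },
  { id: 1, size: 37 },
  { id: 2, size: 3 },
]);
warnings.forEach((warning) => console.log(warning));
// → Cluster 0: holds more than 50% of records, consider subdividing
// → Cluster 2: only 3 records, may not be meaningful
```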
Use Cases
Content Organization
content-organization.js
    // Automatically organize articles by topic
    const articleClusters = await client.db.cluster({
      table: "articles",
      generate_labels: true
    });

Customer Segmentation
customer-segmentation-use-case.js

    // Segment customers by behavior patterns
    const customerSegments = await client.db.cluster({
      table: "customers",
      min_clusters: 3,
      max_clusters: 8
    });

Product Categorization
product-categorization.js

    // Automatically categorize products
    const productCategories = await client.db.cluster({
      table: "products",
      num_clusters: 6,
      generate_labels: true
    });

Data Exploration
data-exploration.js

    // Discover patterns in complex datasets
    const dataPatterns = await client.db.cluster({
      table: "complex_data",
      min_clusters: 2,
      max_clusters: 15
    });