Performs K-means clustering on vector embeddings to automatically group similar records. This endpoint can auto-detect the optimal number of clusters and generate AI labels for each cluster. That makes it useful for data analysis, content organization, and pattern discovery.
POST https://api.worqhat.com/db/cluster
What Does This Endpoint Do?
This endpoint runs K-means on your records' vector embeddings, so records with similar meaning end up in the same cluster. It can determine the optimal number of clusters automatically and generate an AI-written label that describes each cluster.
When to Use Clustering
You’ll find this endpoint useful when you need to:
- Organize content automatically: Group similar products, articles, or documents
- Discover data patterns: Find natural groupings in your data that weren’t obvious before
- Content categorization: Automatically categorize large amounts of content
- Customer segmentation: Group customers by behavior or preferences
- Data exploration: Understand the structure and relationships in your data
- Reduce complexity: Simplify large datasets by grouping similar items
How It Works
- You specify a table to cluster
- The system retrieves vector embeddings for all records
- It performs K-means clustering to group similar records
- Optimal cluster count is automatically determined (or you can specify it)
- AI-generated labels describe what each cluster represents
- Results include cluster information, sample records, and quality metrics
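The steps above can be illustrated with a small, self-contained sketch of K-means with automatic selection of k. WorqHat does not document its internal implementation, so this is not the service's code. It is an illustration under stated assumptions: it uses Euclidean distance and a deterministic farthest-point initialization (a simplification of k-means++), and it picks the k in [min_clusters, max_clusters] with the highest silhouette score. The function names `kMeans`, `silhouette`, and `autoCluster` are hypothetical.

```javascript
// Illustrative sketch only: NOT WorqHat's implementation.

// Squared Euclidean distance between two equal-length vectors.
function sqDist(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Deterministic farthest-point initialization (a simplified k-means++).
function initCentroids(points, k) {
  const centroids = [points[0]];
  while (centroids.length < k) {
    let next = points[0];
    let nextDist = -1;
    for (const p of points) {
      const d = Math.min(...centroids.map((c) => sqDist(p, c)));
      if (d > nextDist) {
        nextDist = d;
        next = p;
      }
    }
    centroids.push(next);
  }
  return centroids.map((c) => [...c]);
}

// Lloyd's algorithm: assign each point to its nearest centroid, then move
// each centroid to the mean of its points, until assignments stop changing.
function kMeans(points, k, maxIter = 100) {
  let centroids = initCentroids(points, k);
  const labels = new Array(points.length).fill(-1);
  for (let iter = 1; iter <= maxIter; iter++) {
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
      }
      if (labels[i] !== best) {
        labels[i] = best;
        changed = true;
      }
    });
    if (!changed) return { labels, centroids, iterations: iter };
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { labels, centroids, iterations: maxIter };
}

// Mean silhouette score: s(i) = (b - a) / max(a, b), averaged over all points.
function silhouette(points, labels, k) {
  let total = 0;
  points.forEach((p, i) => {
    const sums = new Array(k).fill(0);
    const counts = new Array(k).fill(0);
    points.forEach((q, j) => {
      if (i === j) return;
      sums[labels[j]] += Math.sqrt(sqDist(p, q));
      counts[labels[j]]++;
    });
    const own = labels[i];
    if (counts[own] === 0) return; // singleton cluster: s(i) = 0 by convention
    const a = sums[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, sums[c] / counts[c]);
    }
    if (b === Infinity || Math.max(a, b) === 0) return;
    total += (b - a) / Math.max(a, b);
  });
  return total / points.length;
}

// Try every k in [minK, maxK] and keep the result with the best silhouette.
function autoCluster(points, minK = 2, maxK = 10) {
  let best = null;
  for (let k = minK; k <= Math.min(maxK, points.length - 1); k++) {
    const result = kMeans(points, k);
    const score = silhouette(points, result.labels, k);
    if (best === null || score > best.silhouette_score) {
      best = { optimal_k: k, silhouette_score: score, ...result };
    }
  }
  return best;
}
```

On three well-separated groups of 2-D points, `autoCluster(points, 2, 4)` selects k = 3. Real embeddings have hundreds of dimensions and are often compared with cosine distance, so treat this as a model of the steps rather than a predictor of the service's exact results.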
Code Examples
Example 1: Basic Auto-Clustering
This example clusters products and lets the service detect the optimal number of clusters automatically.
```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
});

async function clusterProducts() {
  try {
    // Call the cluster method
    const response = await client.db.cluster({
      table: "products",
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete!`);
    console.log(`Optimal clusters: ${response.optimal_k}`);
    console.log(`Silhouette score: ${response.silhouette_score}`);
    console.log(`Iterations: ${response.iterations}`);

    // Display each cluster
    response.clusters.forEach((cluster, index) => {
      console.log(`\nCluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} products`);
      console.log(`Sample products:`, cluster.sample_records.map(r => r.name));
    });
    return response;
  } catch (error) {
    // Handle any errors
    console.error('Error clustering products:', error.message);
  }
}

// Call the function
clusterProducts();
```
Example 2: Custom Cluster Count
This example requests an exact number of clusters instead of relying on auto-detection.
```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function clusterWithCustomCount() {
  try {
    // Cluster into an exact number of clusters
    const response = await client.db.cluster({
      table: "articles",
      num_clusters: 5, // Specify exact number of clusters
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    console.log(`Clustering complete with ${response.optimal_k} clusters`);
    console.log(`Silhouette score: ${response.silhouette_score}`);

    // Flag small clusters (fewer than 5 records may not be meaningful)
    response.clusters.forEach((cluster, index) => {
      console.log(`\nCluster ${index + 1}: ${cluster.label}`);
      console.log(`Size: ${cluster.size} articles`);
      console.log(`Size check: ${cluster.size >= 5 ? 'OK' : 'Small'}`);
    });
    return response;
  } catch (error) {
    console.error('Error clustering articles:', error.message);
  }
}

clusterWithCustomCount();
```
Example 3: Customer Segmentation
This example shows how to use clustering for customer segmentation analysis.
```javascript
import Worqhat from 'worqhat';

// Initialize the client with your API key
const client = new Worqhat({
  apiKey: process.env.WORQHAT_API_KEY,
});

async function segmentCustomers() {
  try {
    // Segment customers with auto-detection within a bounded range
    const response = await client.db.cluster({
      table: "customers",
      min_clusters: 3, // Minimum clusters for customer segments
      max_clusters: 8, // Maximum clusters to consider
      generate_labels: true,
      environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
    });

    // Handle the successful response
    const score = response.silhouette_score;
    console.log(`Customer segmentation complete!`);
    console.log(`Optimal segments: ${response.optimal_k}`);
    // Only scores of 0.5 or higher count as "Good" or better
    console.log(`Clustering quality: ${score >= 0.7 ? 'Excellent' : score >= 0.5 ? 'Good' : 'Needs review'}`);

    // Total number of clustered customers, computed once
    const totalCustomers = response.clusters.reduce((sum, c) => sum + c.size, 0);

    // Analyze each customer segment
    response.clusters.forEach((cluster, index) => {
      console.log(`\nSegment ${index + 1}: ${cluster.label}`);
      console.log(`Customer count: ${cluster.size}`);
      console.log(`Percentage: ${((cluster.size / totalCustomers) * 100).toFixed(1)}%`);
      // Show sample customers
      console.log(`Sample customers:`, cluster.sample_records.map(c => c.name));
    });
    return response;
  } catch (error) {
    console.error('Error segmenting customers:', error.message);
  }
}

segmentCustomers();
```
Request Body Explained
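- table (string): Table to cluster. Example: “products”
- num_clusters (integer): Exact number of clusters to create. Range: 2-20. If omitted, the optimal number is auto-detected. Example: 5
- min_clusters (integer): Minimum number of clusters considered during auto-detection. Range: 2-10, default: 2.
- max_clusters (integer): Maximum number of clusters considered during auto-detection. Range: 2-20, default: 10.
- generate_labels (boolean): Whether to generate AI labels for clusters. Default: true.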
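To catch range mistakes before a request is sent, you could validate the body on the client. The sketch below uses a hypothetical helper, `buildClusterRequest`, that is not part of the WorqHat SDK. It assumes the counts must be integers, treats `table` as required, and assumes that `min_clusters`/`max_clusters` are irrelevant when `num_clusters` is given (they are documented as auto-detection settings), so it leaves them out in that case.

```javascript
// Hypothetical helper (not part of the WorqHat SDK): builds a request body
// for client.db.cluster() and checks it against the documented ranges.
function buildClusterRequest({
  table,
  num_clusters,
  min_clusters = 2, // documented default
  max_clusters = 10, // documented default
  generate_labels = true, // documented default
} = {}) {
  if (typeof table !== "string" || table.length === 0) {
    throw new Error("table must be a non-empty string");
  }
  const isIntIn = (v, lo, hi) => Number.isInteger(v) && v >= lo && v <= hi;

  // Fixed cluster count: auto-detection bounds are assumed not to apply.
  if (num_clusters !== undefined) {
    if (!isIntIn(num_clusters, 2, 20)) {
      throw new Error("num_clusters must be an integer from 2 to 20");
    }
    return { table, num_clusters, generate_labels };
  }

  // Auto-detection: check each bound and their ordering.
  if (!isIntIn(min_clusters, 2, 10)) throw new Error("min_clusters must be an integer from 2 to 10");
  if (!isIntIn(max_clusters, 2, 20)) throw new Error("max_clusters must be an integer from 2 to 20");
  if (min_clusters > max_clusters) throw new Error("min_clusters must be <= max_clusters");
  return { table, min_clusters, max_clusters, generate_labels };
}
```

For example, `buildClusterRequest({ table: "customers", min_clusters: 3, max_clusters: 8 })` returns a body you could spread into the `client.db.cluster(...)` call, while swapping the two bounds throws before any network request is made.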
Response Fields Explained
- success (boolean): true if clustering was successful, false otherwise.
- clusters (array): Array of clusters, each containing:
  - id: Cluster ID number
  - label: AI-generated description of the cluster
  - size: Number of records in the cluster
  - sample_records: 3-5 representative records from the cluster
  - centroid: Cluster center embedding (first 10 dimensions)
- optimal_k (integer): Determined optimal number of clusters.
- silhouette_score (number): Clustering quality metric ranging from -1 to 1. Higher scores indicate better-separated clusters.
- iterations (integer): Number of K-means iterations performed.
- executionTime (number): Clustering execution time in milliseconds.
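The documentation does not state which distance measure the service uses. Under the standard silhouette definition, though, the score can range from -1 to 1, which is why the interpretation bands further down include negative scores. For record i, let a(i) be its mean distance to the other records in its own cluster and b(i) its mean distance to the records of the nearest other cluster:

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad \text{silhouette\_score} = \frac{1}{n}\sum_{i=1}^{n} s(i)
$$

Because a(i) and b(i) are non-negative, s(i) lies between -1 and 1. It approaches 1 when b(i) is much larger than a(i), and it is negative when a record is, on average, closer to another cluster than to its own. For example, a(i) = 1 and b(i) = 14 give s(i) = 13/14 ≈ 0.93.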
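Example Response
The response below is abbreviated: it shows two of the five clusters and two sample records per cluster.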
```json
{
  "success": true,
  "clusters": [
    {
      "id": 0,
      "label": "Cluster focused on: electronics, gadgets, technology",
      "size": 45,
      "sample_records": [
        {
          "id": "prod_123",
          "name": "Smart Phone",
          "category": "electronics",
          "price": 699.99
        },
        {
          "id": "prod_456",
          "name": "Laptop Computer",
          "category": "electronics",
          "price": 1299.99
        }
      ],
      "centroid": [0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234, 0.567, -0.890, 0.123]
    },
    {
      "id": 1,
      "label": "Cluster focused on: clothing, fashion, apparel",
      "size": 32,
      "sample_records": [
        {
          "id": "prod_789",
          "name": "Designer Shirt",
          "category": "clothing",
          "price": 89.99
        },
        {
          "id": "prod_012",
          "name": "Fashion Jeans",
          "category": "clothing",
          "price": 79.99
        }
      ],
      "centroid": [-0.234, 0.567, -0.890, 0.123, -0.456, 0.789, -0.321, 0.654, 0.987, -0.234]
    }
  ],
  "optimal_k": 5,
  "silhouette_score": 0.75,
  "iterations": 12,
  "executionTime": 2341
}
```
Common Errors and How to Fix Them
| Error | Cause | Solution |
|---|---|---|
| "Table not found" | The specified table doesn’t exist | Check your table name for typos |
| "No embeddings found" | The table doesn’t have vector embeddings | Ensure your table has been processed for embeddings |
| "Insufficient data" | Not enough records for meaningful clustering | Ensure the table has at least 20 records |
| "Invalid cluster range" | The min_clusters or max_clusters value is invalid | Ensure min_clusters ≤ max_clusters and both are ≥ 2 |
| "Clustering failed" | The algorithm couldn’t converge | Try different cluster counts or check data quality |
| "Unauthorized" | Invalid or missing API key | Check that you’re using a valid API key |
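To surface these fixes automatically from a `catch` block, you could look up the error message in a small table. The helper below, `suggestFix`, is hypothetical and not part of the SDK. It assumes that `error.message` contains the phrase shown in the Error column, which the documentation does not guarantee.

```javascript
// Hypothetical helper: maps the documented error phrases to their suggested
// fixes. Assumes the SDK's error message contains the listed phrase.
const CLUSTER_ERROR_FIXES = [
  ["Table not found", "Check your table name for typos."],
  ["No embeddings found", "Ensure your table has been processed for embeddings."],
  ["Insufficient data", "Ensure the table has at least 20 records."],
  ["Invalid cluster range", "Ensure min_clusters <= max_clusters and both are >= 2."],
  ["Clustering failed", "Try different cluster counts or check data quality."],
  ["Unauthorized", "Check that you are using a valid API key."],
];

function suggestFix(message) {
  // Case-insensitive substring match against each documented phrase.
  const text = String(message ?? "").toLowerCase();
  const match = CLUSTER_ERROR_FIXES.find(([phrase]) => text.includes(phrase.toLowerCase()));
  return match ? match[1] : "Unrecognized error; see the table of common errors.";
}
```

In the examples' `catch` blocks, `console.error(suggestFix(error.message))` would print the matching suggestion, or a generic fallback for messages not listed in the table.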
Tips for Better Clustering
- Start with auto-detection: Let the system find the optimal number of clusters first
- Review silhouette scores: Scores of 0.7 or higher indicate well-separated clusters
- Examine cluster labels: AI-generated labels help understand what each cluster represents
- Check cluster sizes: Balanced cluster sizes are generally better than very uneven ones
- Use sample records: Review sample records to validate cluster quality
- Iterate and refine: Try different parameters if initial results aren’t satisfactory
Clustering Quality Metrics
Silhouette Score Interpretation
- 0.7-1.0: Excellent clustering, well-separated clusters
- 0.5-0.7: Good clustering, reasonably well-separated
- 0.3-0.5: Fair clustering, some overlap between clusters
- 0.0-0.3: Poor clustering, significant overlap
- Negative: Very poor clustering; many records are closer to a neighboring cluster than to their own and may be misassigned
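If you want to report these bands in your own code, a small mapping function is enough. In the list above, adjacent bands share their boundary values (0.7 appears in both the first and second band), so this sketch assumes a boundary score belongs to the higher band. The function name `interpretSilhouette` is hypothetical.

```javascript
// Hypothetical helper: maps a silhouette score to the interpretation bands.
// Boundary values are assigned to the higher band (0.7 counts as "Excellent").
function interpretSilhouette(score) {
  if (!Number.isFinite(score) || score < -1 || score > 1) {
    throw new RangeError("silhouette score must be a number between -1 and 1");
  }
  if (score >= 0.7) return "Excellent";
  if (score >= 0.5) return "Good";
  if (score >= 0.3) return "Fair";
  if (score >= 0) return "Poor";
  return "Very poor";
}
```

For the example response, `interpretSilhouette(0.75)` returns `"Excellent"`.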
Cluster Size Guidelines
- Balanced clusters: Similar sizes are generally better
- Minimum size: Clusters with < 5 records may not be meaningful
- Maximum size: Very large clusters (> 50% of data) may need further subdivision
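You can apply these guidelines directly to `response.clusters`. The hypothetical helper below uses the thresholds stated above: fewer than 5 records, or more than 50% of the data. One caveat: it measures each cluster's share against the total size of the clusters it is given, which is assumed to cover all clustered records.

```javascript
// Hypothetical helper: flags clusters using the size guidelines.
// Shares are relative to the total size of the clusters passed in.
function checkClusterSizes(clusters, minSize = 5, maxShare = 0.5) {
  const total = clusters.reduce((sum, c) => sum + c.size, 0);
  return clusters.map((c) => {
    const share = total > 0 ? c.size / total : 0;
    const warnings = [];
    if (c.size < minSize) warnings.push("may be too small to be meaningful");
    if (share > maxShare) warnings.push("may need further subdivision");
    return { id: c.id, label: c.label, size: c.size, share, warnings };
  });
}
```

For clusters of sizes 45, 32, and 3, the first (56.25% of the data) is flagged for subdivision and the third is flagged as too small.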
Use Cases
Content Organization
```javascript
// Automatically organize articles by topic
const articleClusters = await client.db.cluster({
  table: "articles",
  generate_labels: true
});
```
Customer Segmentation
```javascript
// Segment customers by behavior patterns
const customerSegments = await client.db.cluster({
  table: "customers",
  min_clusters: 3,
  max_clusters: 8
});
```
Product Categorization
```javascript
// Automatically categorize products
const productCategories = await client.db.cluster({
  table: "products",
  num_clusters: 6,
  generate_labels: true
});
```
Data Exploration
```javascript
// Discover patterns in complex datasets
const dataPatterns = await client.db.cluster({
  table: "complex_data",
  min_clusters: 2,
  max_clusters: 15
});
```