Identifies anomalous or outlier records using K-nearest neighbors (KNN) analysis on vector embeddings. This endpoint is suited to fraud detection, data quality checks, and spotting unusual patterns in your data.
POST https://api.worqhat.com/db/detect-anomalies
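If you are not using the SDK, the request can be assembled by hand. The sketch below only builds the request and does not send it. The Bearer-token header and JSON body are assumptions (this page does not state the auth scheme), and `buildDetectAnomaliesRequest` is a hypothetical helper name.

```javascript
// Build (but do not send) a request for the detect-anomalies endpoint.
// ASSUMPTION: Bearer-token auth and a JSON body; neither is confirmed by this page.
function buildDetectAnomaliesRequest(apiKey, params) {
  return {
    url: "https://api.worqhat.com/db/detect-anomalies",
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`, // assumed header format
        "Content-Type": "application/json",
      },
      body: JSON.stringify(params),
    },
  };
}

const request = buildDetectAnomaliesRequest("YOUR_API_KEY", {
  table: "transactions",
  k: 10,
  threshold: 0.8,
  limit: 50,
});
console.log(request.options.method, request.url);
// To send it: const res = await fetch(request.url, request.options);
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.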
What Does This Endpoint Do?
This endpoint identifies records that differ significantly from the majority of your data. It compares each record's vector embedding with those of its nearest neighbors to find outliers that might indicate fraud, errors, or patterns that warrant investigation.
When to Use Anomaly Detection
This endpoint is useful for:
- Fraud detection: Identify suspicious transactions, accounts, or activities
- Data quality assurance: Find records that might contain errors or inconsistencies
- Security monitoring: Detect unusual patterns that could indicate security breaches
- Business intelligence: Discover outliers that might represent new opportunities or risks
- Compliance monitoring: Identify records that don't conform to expected patterns
- Research and analysis: Find interesting edge cases or unusual data points
How It Works
- You specify a table to analyze for anomalies
- The system retrieves vector embeddings for all records in the table
- It performs K-nearest neighbors analysis to calculate anomaly scores
- Records with high anomaly scores are identified as outliers
- Results include nearest neighbors and distance metrics for each anomaly
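The steps above can be illustrated locally with a small KNN scorer. This is a minimal sketch, not the service's implementation: it assumes Euclidean distance and scales each record's average neighbor distance by the largest average so that scores fall between 0 and 1. The function names and sample embeddings are made up for illustration.

```javascript
// Euclidean distance between two equal-length vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// For each embedding, average the distance to its k nearest neighbors,
// then divide by the largest average so scores fall in [0, 1].
// ASSUMPTION: this normalization is illustrative, not the service's formula.
function knnAnomalyScores(embeddings, k) {
  const avgDistances = embeddings.map((vec, i) => {
    const distances = embeddings
      .filter((_, j) => j !== i)
      .map((other) => euclidean(vec, other))
      .sort((x, y) => x - y)
      .slice(0, k);
    return distances.reduce((s, d) => s + d, 0) / distances.length;
  });
  const max = Math.max(...avgDistances);
  return avgDistances.map((avg) => ({
    avg_distance: avg,
    anomaly_score: max === 0 ? 0 : avg / max,
  }));
}

// Four clustered points and one far-away point (hypothetical 2-D embeddings).
const embeddings = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]];
const scored = knnAnomalyScores(embeddings, 2);
const outliers = scored
  .map((s, index) => ({ index, ...s }))
  .filter((s) => s.anomaly_score >= 0.8);
console.log(outliers);
```

With this sample data only the point at `[10, 10]` clears the 0.8 cutoff, because its two nearest neighbors are far away while every clustered point has neighbors at distance 1.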
Code Examples
Example 1: Basic Anomaly Detection
This example shows how to detect anomalies in a transactions table.
detect-anomalies.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY, // Always use environment variables for API keys
    });

    async function detectAnomalies() {
      try {
        // Call the detectAnomalies method
        const response = await client.db.detectAnomalies({
          table: "transactions",
          k: 10,
          threshold: 0.8,
          limit: 50,
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Handle the successful response
        console.log(`Analyzed ${response.total_records} records`);
        console.log(`Found ${response.anomaly_count} anomalies`);
        console.log('Anomalies:', response.anomalies);
        return response;
      } catch (error) {
        // Handle any errors
        console.error('Error detecting anomalies:', error.message);
      }
    }

    // Call the function
    detectAnomalies();

Example 2: Fraud Detection with Custom Parameters
This example shows how to tune parameters for fraud detection in financial transactions.
detect-fraud.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY,
    });

    async function detectFraud() {
      try {
        // Detect fraud with stricter parameters
        const response = await client.db.detectAnomalies({
          table: "financial_transactions",
          k: 15,          // More neighbors for better accuracy
          threshold: 0.9, // Higher threshold for stricter detection
          limit: 100,     // More results to investigate
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Handle the successful response
        console.log(`Fraud detection analysis complete`);
        console.log(`Total transactions analyzed: ${response.total_records}`);
        console.log(`Potential fraud cases: ${response.anomaly_count}`);

        // Process each anomaly
        response.anomalies.forEach((anomaly, index) => {
          console.log(`\nAnomaly ${index + 1}:`);
          console.log(`- Transaction ID: ${anomaly.record.transaction_id}`);
          console.log(`- Amount: $${anomaly.record.amount}`);
          console.log(`- Anomaly Score: ${anomaly.anomaly_score}`);
          console.log(`- Average Distance: ${anomaly.avg_distance}`);
        });
        return response;
      } catch (error) {
        console.error('Error detecting fraud:', error.message);
      }
    }

    detectFraud();

Example 3: Data Quality Check
This example shows how to use anomaly detection for data quality assurance.
data-quality.js
    import Worqhat from 'worqhat';

    // Initialize the client with your API key
    const client = new Worqhat({
      apiKey: process.env.WORQHAT_API_KEY,
    });

    async function checkDataQuality() {
      try {
        // Check data quality with moderate parameters
        const response = await client.db.detectAnomalies({
          table: "user_profiles",
          k: 8,           // Moderate number of neighbors
          threshold: 0.7, // Moderate threshold for data quality
          limit: 30,      // Reasonable number of results
          environment: process.env.WORQHAT_ENVIRONMENT || 'production' // Defaults to production
        });

        // Handle the successful response
        console.log(`Data quality check complete`);
        console.log(`Total profiles analyzed: ${response.total_records}`);
        console.log(`Potential data quality issues: ${response.anomaly_count}`);

        // Analyze anomalies for data quality issues
        response.anomalies.forEach((anomaly, index) => {
          console.log(`\nData Quality Issue ${index + 1}:`);
          console.log(`- User ID: ${anomaly.record.user_id}`);
          console.log(`- Profile: ${JSON.stringify(anomaly.record, null, 2)}`);
          console.log(`- Anomaly Score: ${anomaly.anomaly_score}`);
          console.log(`- Nearest Neighbors: ${anomaly.nearest_neighbors.length}`);
        });
        return response;
      } catch (error) {
        console.error('Error checking data quality:', error.message);
      }
    }

    checkDataQuality();

Request Body Explained
- table: Table to analyze for anomalies. Example: "transactions"
- k: Number of nearest neighbors to consider for anomaly detection. Range: 1-50, default: 10. Higher values provide more stable results but may miss subtle anomalies.
- threshold: Minimum anomaly score threshold (0-1). Only records above this threshold are considered anomalies. Default: 0.8. Higher values detect fewer, more extreme anomalies.
- limit: Maximum number of anomalies to return. Range: 1-100, default: 50.
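The documented ranges and defaults can be checked on the client before a request is sent. The helper below is a hypothetical sketch and not part of the worqhat SDK; it also assumes that `table` is required and that `k` and `limit` must be integers, which this page implies but does not state outright.

```javascript
// Hypothetical helper: apply the documented defaults and range checks
// before sending a request. Not part of the worqhat SDK.
function normalizeDetectParams({ table, k = 10, threshold = 0.8, limit = 50 } = {}) {
  // ASSUMPTION: table is required and must be a non-empty string.
  if (typeof table !== "string" || table.length === 0) {
    throw new Error("table is required");
  }
  if (!Number.isInteger(k) || k < 1 || k > 50) {
    throw new RangeError("k must be an integer from 1 to 50");
  }
  if (typeof threshold !== "number" || Number.isNaN(threshold) || threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new RangeError("limit must be an integer from 1 to 100");
  }
  return { table, k, threshold, limit };
}

// Omitted fields fall back to the documented defaults.
const params = normalizeDetectParams({ table: "transactions", k: 15 });
console.log(params);
```

Failing fast on an out-of-range value avoids a round trip to the API for a request that would be rejected anyway.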
Response Fields Explained
- success: true if anomaly detection was successful, false otherwise.
- anomalies: Array of detected anomalies, each containing:
  - record: The anomalous record data
  - anomaly_score: Anomaly score (0-1, higher = more unusual)
  - avg_distance: Average distance to the K nearest neighbors
  - nearest_neighbors: Array of nearest neighbor records with distances
- total_records: Total number of records analyzed.
- anomaly_count: Number of anomalies detected.
- parameters: Parameters used for the analysis (k, threshold).
- executionTime: Analysis execution time in milliseconds.
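Once a response arrives, it is often convenient to rank the anomalies and reduce each one to a few fields for review. The sketch below does that on made-up sample data. `summarizeAnomalies` is a hypothetical helper, and it assumes each record has an `id` column and each neighbor has a `distance` field, as in the example response.

```javascript
// Hypothetical helper: rank anomalies by score and flatten each one
// into a short summary for manual review.
function summarizeAnomalies(response, top = 5) {
  if (!response || response.success !== true) return [];
  return [...response.anomalies]
    .sort((a, b) => b.anomaly_score - a.anomaly_score) // highest score first
    .slice(0, top)
    .map((a) => ({
      id: a.record.id, // ASSUMPTION: records carry an `id` column
      score: a.anomaly_score,
      avgDistance: a.avg_distance,
      closestDistance: a.nearest_neighbors.length
        ? Math.min(...a.nearest_neighbors.map((n) => n.distance))
        : null,
    }));
}

// Made-up sample data in the documented response shape.
const sample = {
  success: true,
  anomalies: [
    {
      record: { id: "a" },
      anomaly_score: 0.81,
      avg_distance: 0.3,
      nearest_neighbors: [{ record: { id: "x" }, distance: 0.25 }],
    },
    {
      record: { id: "b" },
      anomaly_score: 0.95,
      avg_distance: 0.5,
      nearest_neighbors: [
        { record: { id: "y" }, distance: 0.4 },
        { record: { id: "z" }, distance: 0.6 },
      ],
    },
  ],
  total_records: 100,
  anomaly_count: 2,
};

const summary = summarizeAnomalies(sample);
console.log(summary);
```

Sorting a copy rather than the original array leaves the response object untouched for any later processing.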
Example Response
response.json
    {
      "success": true,
      "anomalies": [
        {
          "record": {
            "id": "txn_123",
            "amount": 50000.00,
            "currency": "USD",
            "merchant": "Unknown",
            "location": "Remote"
          },
          "anomaly_score": 0.923,
          "avg_distance": 0.456,
          "nearest_neighbors": [
            {
              "record": {
                "id": "txn_456",
                "amount": 25.50,
                "currency": "USD",
                "merchant": "Coffee Shop",
                "location": "Local"
              },
              "distance": 0.234
            },
            {
              "record": {
                "id": "txn_789",
                "amount": 15.75,
                "currency": "USD",
                "merchant": "Restaurant",
                "location": "Local"
              },
              "distance": 0.312
            }
          ]
        }
      ],
      "total_records": 1000,
      "anomaly_count": 15,
      "parameters": {
        "k": 10,
        "threshold": 0.8
      },
      "executionTime": 1247
    }

Common Errors and How to Fix Them
| Error | Cause | Solution |
|---|---|---|
| "Table not found" | The specified table doesn't exist | Check your table name for typos |
| "No embeddings found" | The table doesn't have vector embeddings | Ensure your table has been processed for embeddings |
| "Insufficient data" | Not enough records for meaningful analysis | Ensure the table has at least 20 records |
| "K value too large" | K parameter exceeds available data | Reduce K value or ensure you have more records |
| "Threshold too high" | No records meet the anomaly threshold | Lower the threshold value (try 0.6-0.7) |
| "Unauthorized" | Invalid or missing API key | Check that you're using a valid API key |
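The table above can be turned into a small lookup that attaches a suggested fix to a caught error. This is a sketch under an assumption: it expects the error message to contain the quoted text from the table, and the SDK's actual error shape may differ. `ERROR_HINTS` and `suggestFix` are hypothetical names.

```javascript
// Hypothetical lookup built from the error table.
// ASSUMPTION: the error message contains the quoted error text.
const ERROR_HINTS = [
  ["Table not found", "Check your table name for typos"],
  ["No embeddings found", "Ensure your table has been processed for embeddings"],
  ["Insufficient data", "Ensure the table has at least 20 records"],
  ["K value too large", "Reduce K value or ensure you have more records"],
  ["Threshold too high", "Lower the threshold value (try 0.6-0.7)"],
  ["Unauthorized", "Check that you're using a valid API key"],
];

// Accepts an Error object or a plain string and returns a hint.
function suggestFix(error) {
  const message = String((error && error.message) || error || "");
  const match = ERROR_HINTS.find(([text]) =>
    message.toLowerCase().includes(text.toLowerCase())
  );
  return match ? match[1] : "No known fix; inspect the full error";
}

console.log(suggestFix(new Error("Threshold too high")));
```

Inside a `catch` block, `console.error(error.message, '->', suggestFix(error))` would print the hint next to the original message.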
Tips for Better Anomaly Detection
- Start with moderate parameters: Begin with k=10, threshold=0.8 and adjust based on results
- Consider your use case:
  - Fraud detection: Higher threshold (0.9+), more neighbors (k=15+)
  - Data quality: Moderate threshold (0.7), moderate neighbors (k=8-10)
  - General analysis: Lower threshold (0.6), fewer neighbors (k=5-8)
- Review nearest neighbors: Examine what "normal" records look like to understand anomalies
- Monitor execution time: Larger datasets take longer to analyze
- Validate results: Manually review detected anomalies to ensure they make sense
- Iterate and refine: Adjust parameters based on the quality of detected anomalies
Parameter Tuning Guidelines
K (Number of Neighbors)
- Low K (5-8): More sensitive to local variations, may detect subtle anomalies
- Medium K (10-15): Balanced approach, good for most use cases
- High K (20+): More stable, focuses on major outliers, less sensitive to noise
Threshold (Anomaly Score)
- Low Threshold (0.6-0.7): Detects more anomalies, including minor outliers
- Medium Threshold (0.8): Balanced detection, good default
- High Threshold (0.9+): Only detects major anomalies, very strict
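The effect of the threshold can be seen by counting how many scores clear each cutoff. The sketch below runs on made-up scores. The page says records "above" the threshold are anomalies but does not say whether the comparison is strict, so a strict `>` is assumed here; `countByThreshold` is a hypothetical helper.

```javascript
// Count how many scores would be reported as anomalies at each threshold.
// ASSUMPTION: a strict "greater than" comparison.
function countByThreshold(scores, thresholds) {
  return thresholds.map((threshold) => ({
    threshold,
    anomalies: scores.filter((s) => s > threshold).length,
  }));
}

// Made-up anomaly scores for ten records.
const scores = [0.12, 0.25, 0.31, 0.44, 0.58, 0.65, 0.72, 0.83, 0.91, 0.97];
const sweep = countByThreshold(scores, [0.6, 0.8, 0.9]);
console.log(sweep);
```

For these sample scores the counts are 5, 3, and 2 at thresholds 0.6, 0.8, and 0.9, which shows how raising the threshold narrows results to the most extreme records.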
Use Cases
Fraud Detection
    // Detect suspicious financial transactions
    const fraudAnomalies = await client.db.detectAnomalies({
      table: "transactions",
      k: 15,
      threshold: 0.9,
      limit: 100
    });
Data Quality Assurance
    // Find records with potential data quality issues
    const qualityIssues = await client.db.detectAnomalies({
      table: "user_profiles",
      k: 8,
      threshold: 0.7,
      limit: 50
    });
Security Monitoring
    // Detect unusual user behavior patterns
    const securityAnomalies = await client.db.detectAnomalies({
      table: "user_activities",
      k: 12,
      threshold: 0.85,
      limit: 30
    });
Business Intelligence
    // Find interesting outliers in business data
    const businessAnomalies = await client.db.detectAnomalies({
      table: "sales_data",
      k: 10,
      threshold: 0.75,
      limit: 20
    });
