The Problem
I started exporting current attributes and discovered they were not clustered into groups at all — essentially a massive pile of raw data. The only known information was a column with attribute families.
I tried formulas and scripts for an initial clustering pass, planning to then filter by category and remove the unnecessary attributes. It did not work.
At a certain point, time started running out. I decided to take a risk and connected AI to Google Sheets via API. Yes, data scientists will judge me for this — my apologies.
The Idea
What if I take not one LLM but two, create two copies of the database, and have one LLM process the data from the beginning while the other processes it from the end? Then elegantly merge everything and write a script to check for discrepancies after AI processing.
Spoiler: it worked!
After preliminary tests, two candidates were selected:
- Claude Sonnet 4.5
  - Best for complex reasoning tasks
  - Higher cost but more reliable output
- DeepSeek V3.2-Exp (Thinking Mode)
  - Excellent throughput and rate limits
  - Cost-effective for bulk processing
The processing pipeline followed these steps:
- Export raw attribute data from the database
- Split dataset into two halves
  - First half: rows 1–2624 for Claude
  - Second half: rows 2625–5247 for DeepSeek
- Configure API parameters and rate limiting
- Run parallel processing overnight
- Merge results and validate discrepancies
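The last step, merging and validating, can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the `{ row, value }` result shape and the `mergeResults` name are assumptions, and "discrepancy" is interpreted here as a row that is missing from both outputs or that received two different values.

```javascript
// Hypothetical helper: merge per-model result lists and report problems.
// Each result is assumed to look like { row: <1-based row number>, value: <model output> }.
function mergeResults(expectedRows, ...resultSets) {
  const merged = new Map(); // row number -> first result seen for that row
  const conflicts = [];     // rows that received two different values

  for (const results of resultSets) {
    for (const result of results) {
      const seen = merged.get(result.row);
      if (seen === undefined) {
        merged.set(result.row, result);
      } else if (seen.value !== result.value) {
        conflicts.push(result.row);
      }
    }
  }

  // Rows that neither model returned.
  const missing = [];
  for (let row = 1; row <= expectedRows; row++) {
    if (!merged.has(row)) missing.push(row);
  }

  return { merged, conflicts, missing };
}
```

Calling `mergeResults(5247, claudeResults, deepseekResults)` would return the combined rows plus two lists, `conflicts` and `missing`, that can be handed over for manual verification.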
The Hybrid Approach
The beauty was that Claude processed attributes from row 1 upward, while DeepSeek worked backwards from row 5,247, meeting Claude halfway. This is what made it possible to finish the task under severe time pressure.
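To make the two directions concrete, here is a small sketch of how the batches could be generated. The row boundaries come from the pipeline list and the batch size of 30 from the final parameters; the `batchesFrom` generator itself is a hypothetical helper, not part of the original scripts.

```javascript
// Hypothetical helper: yield inclusive row ranges while walking from startRow
// toward stopRow. The direction is inferred from the argument order.
function* batchesFrom(startRow, stopRow, batchSize) {
  const step = startRow <= stopRow ? 1 : -1;
  let row = startRow;
  while (step === 1 ? row <= stopRow : row >= stopRow) {
    // Far edge of this batch, clamped so it never crosses stopRow.
    const far = step === 1
      ? Math.min(row + batchSize - 1, stopRow)
      : Math.max(row - batchSize + 1, stopRow);
    yield { from: Math.min(row, far), to: Math.max(row, far) };
    row = far + step;
  }
}

// Claude walks forward through the first half.
const claudeBatches = [...batchesFrom(1, 2624, 30)];
// DeepSeek walks backward through the second half.
const deepseekBatches = [...batchesFrom(5247, 2625, 30)];
```

With these inputs each model gets 88 batches. Claude starts at rows 1–30 and DeepSeek at rows 5218–5247, and the two sequences end on adjacent rows (2624 and 2625).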
Implementation
What Went Wrong
Clustering consumed enormous amounts of tokens, burning through my funds.
The initial data was too sparse for clustering. Besides my three-level category tree and attribute family, there was nothing else.
DeepSeek and Claude frequently hallucinated, not outputting exact category values. Setting temperature to 0.1 helped, but processing speed dropped significantly.
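One way to catch such inexact values is to compare every model answer against the list of allowed categories and reject anything that does not match. The sketch below assumes that matching after trimming, lowercasing, and collapsing whitespace is acceptable; `matchCategory` and the sample category names are hypothetical.

```javascript
// Normalize a label so that cosmetic differences do not count as mismatches.
function normalize(label) {
  return label.trim().toLowerCase().replace(/\s+/g, " ");
}

// Hypothetical helper: return the canonical category for a model answer,
// or null when the answer is not in the allowed list (a likely hallucination).
function matchCategory(rawAnswer, allowedCategories) {
  const index = new Map(allowedCategories.map((c) => [normalize(c), c]));
  const key = normalize(rawAnswer);
  return index.has(key) ? index.get(key) : null;
}

// Example with made-up category names:
const allowed = ["Home & Garden", "Electronics"];
const exact = matchCategory("  home &  garden ", allowed);
const invented = matchCategory("Gardening Tools", allowed);
```

Here `exact` resolves to the canonical `"Home & Garden"`, while `invented` is `null` and can be queued for a retry or for manual review.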
Increasing the batch size quickly hit Claude's rate limits (DeepSeek handled it better). The root cause was a misconfigured API delay.
The Pivot
I had to quickly rethink the approach. Decision: hand the attributes to the LLMs without clustering. The models would only analyze a few data columns and return a final score on a 5-point scale with minimal comments.
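A narrow output format like this is also easy to validate mechanically. The sketch below assumes the prompt asks for a JSON object with `score` and `comment` fields; the original prompt and response format are not shown here, so both the format and the `parseScore` helper are illustrative assumptions.

```javascript
// Hypothetical helper: parse one model reply and check the 5-point score.
// Assumed reply format: {"score": 4, "comment": "short note"}
function parseScore(responseText) {
  let data;
  try {
    data = JSON.parse(responseText);
  } catch (err) {
    return { ok: false, error: "reply is not valid JSON" };
  }
  if (data === null || typeof data !== "object") {
    return { ok: false, error: "reply is not a JSON object" };
  }
  const score = data.score;
  if (!Number.isInteger(score) || score < 1 || score > 5) {
    return { ok: false, error: "score must be an integer from 1 to 5" };
  }
  // The comment is optional; anything that is not a string becomes empty.
  const comment = typeof data.comment === "string" ? data.comment.trim() : "";
  return { ok: true, score, comment };
}
```

Replies that fail this check can be retried or flagged instead of being written to the sheet.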
Final Results
Here are the final parameters that made it work:
```javascript
// Processing parameters
const CONFIG = {
  BATCH_SIZE: 30,
  API_DELAY_CLAUDE: 800, // ms
  API_DELAY_DEEPSEEK: 500,
  MAX_RETRIES: 3,
  // Rate limit management
  RATE_LIMIT_PAUSE: 60000,
  ADAPTIVE_DELAYS: true,
};

// Model settings
const MODEL_CONFIG = {
  temperature: 0.1,
  max_tokens: 4000
};
```
This approach cut token consumption by a factor of four and kept requests under the rate limits.
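How these parameters interact can be shown with a small retry wrapper. This is a sketch under several assumptions: a synchronous runtime such as Google Apps Script, an API client that throws an error carrying `status === 429` (the HTTP "Too Many Requests" code) on rate limits, and `ADAPTIVE_DELAYS` meaning "double the wait after each failure". `sendWithRetries`, `callApi`, and `sleep` are hypothetical names.

```javascript
// Values copied from CONFIG above; API_DELAY stands in for the per-model delay.
const RETRY_CONFIG = {
  API_DELAY: 800,          // ms, the Claude value
  MAX_RETRIES: 3,
  RATE_LIMIT_PAUSE: 60000, // assumed to be ms as well
  ADAPTIVE_DELAYS: true,
};

// callApi(batch) sends one batch and throws on failure (assumption).
// sleep(ms) blocks for the given time and is injected so it can be swapped out.
function sendWithRetries(callApi, batch, sleep, config = RETRY_CONFIG) {
  let delay = config.API_DELAY;
  for (let attempt = 0; attempt <= config.MAX_RETRIES; attempt++) {
    try {
      const result = callApi(batch);
      sleep(config.API_DELAY); // space out consecutive successful calls
      return result;
    } catch (err) {
      if (attempt === config.MAX_RETRIES) throw err; // retries exhausted
      // Rate limit: take the long pause. Any other error: wait the current delay.
      sleep(err.status === 429 ? config.RATE_LIMIT_PAUSE : delay);
      // Assumed meaning of ADAPTIVE_DELAYS: back off further after each failure.
      if (config.ADAPTIVE_DELAYS) delay *= 2;
    }
  }
}
```

In Apps Script, `Utilities.sleep` could be passed as the `sleep` argument. A batch gets one initial attempt plus up to `MAX_RETRIES` retries before the error is surfaced.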
Processing Time
LLMs worked autonomously, even overnight:
| Metric | Value |
|---|---|
| Claude processing time | 17 hours |
| DeepSeek processing time | 11 hours |
| Data discrepancy rate | 3% |
| Manual verification | 4 hours |
Model Comparison
Here's how the two models compared across different criteria, with GPT-4o listed alongside them:
| Feature | Claude Sonnet 4.5 | DeepSeek V3.2 | GPT-4o |
|---|---|---|---|
| Batch processing | | | |
| Rate limit handling | | | |
| Accuracy | | | |
| Cost efficiency | | | |
| Speed | Medium | Fast | Medium |
| Context window | 200K | 128K | 128K |
Final Thoughts
Contrary to my expectations, both models managed to process the entire list. The prompt really pulled its weight here. The hybrid approach with two LLMs working from opposite ends proved to be a viable emergency measure when traditional methods fail.
Sometimes the unconventional solution is exactly what you need when time is against you.