As Karpathy puts it, there’s a new kind of coding, “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. Sometimes the LLMs can’t fix a bug, so you just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. You’re building a project or webapp, but it’s not really coding: you just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works.
I’ve been a productivity fanatic for a few years now. While I may not fully agree with Karpathy, I’ve become equally obsessed with optimizing how I build & have been exploring agents recently. That means constantly evaluating the AI-powered tools promising to boost my productivity. This isn’t a list of features; it’s a breakdown of my actual usage, backed by the metrics that matter to someone who’s built a few interesting things, including a Realtime Elasticsearch DSL Compiler, Java-based recommendation engines, etc.
The Core Tools: How I Stack Them Up
Instead of vague comparisons, here’s the cold, hard data I’ve collated:
Tool | Code Gen Speed (LoC/min) | Test Accuracy (%) | RAM Usage (IDLE/ACTIVE) | Java Framework Awareness | Context Retention (Tokens) |
---|---|---|---|---|---|
Qodo Gen | 42 | 89 | 1.2GB/2.8GB | 84% | 16k |
Cursor | 112 | 72 | 2.1GB/4.5GB | 81% | 128k |
GitHub Copilot | 98 | 81 | 0.9GB/1.8GB | - | - |
Claude | 36 | 84 | 0.4GB/1.2GB | 70% | 200k |
Personal Evaluation
Feature (Max Score) | Cursor | Qodo | Claude |
---|---|---|---|
Spring Boot 3.2.2 Code Completion (30) | 23 | 25 | 22 |
Spring Boot Awareness (25) | 21 | 20 | 21 |
Debugging & Error Resolution (20) | 15 | 16 | 13 |
Refactoring & Code Optimization (15) | 11 | 9 | 9 |
IntelliJ Integration (10) | 5 | 7 | 3 |
Test Generation (10) | 6 | 7 | 5 |
Total Score (100) | 81 | 84 | 73 |
Data sourced from GitClear’s 2025 AI Code Quality Report & my personal benchmarks/estimates. Don’t quote me on numbers!
These numbers dictate my workflow.
Speed matters for initial drafts, accuracy for production readiness, and RAM usage? Crucial on my M3 Pro.
Use Case Deconstruction: Matching Tools to Projects
Here’s how I leverage each tool, informed by specific project types:
1. Building Algorithmic Systems
Example: The Credit Card Recommendation Engine I built this year
- Qodo:
  - Strength: JUnit test generation. I can trust it to catch most edge cases in eligibility calculations (see the test sketch at the end of this section).
  - Downside: Medium-to-large codebases trip it up. Slow response times.
  - USP: Solid IntelliJ integration.
- Cursor:
  - Strength: Pattern suggestions for defining complex code flow. This significantly cuts down on boilerplate while maintaining readability. Quite affordable.
  - Downside: Multi-module Maven projects? It’s basically blind.
- Claude:
  - Strength: Very high code quality.
    - Claude 3.7 Sonnet feels noticeably better in Claude Code than in Cursor, for example.
    - General technical chat is also best with Claude, e.g. explaining algorithm complexities, discussing theorems, etc.
  - Downside: It’s not an IDE.
    - It’s a Node-based terminal application that works on the codebase, which makes it a bit harder to work with.
    - As of March 3, 2025, Claude Code is considerably more expensive than the regular Claude chat. It is currently in beta research preview and is billed based on API usage via the Anthropic Console account.
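To make the “edge cases in eligibility calculations” point concrete, here’s the flavor of JUnit 5 test I lean on the tools to generate and then verify by hand. `EligibilityCalculator` and its thresholds are made-up stand-ins for illustration, not the real engine’s API:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Hypothetical stand-in for the real engine's eligibility logic (invented thresholds).
class EligibilityCalculator {
    boolean isEligible(long annualIncome, int creditScore) {
        if (annualIncome < 0) {
            throw new IllegalArgumentException("income must be >= 0");
        }
        return annualIncome >= 50_000 && creditScore >= 700;
    }
}

class EligibilityCalculatorTest {

    private final EligibilityCalculator calculator = new EligibilityCalculator();

    // Boundary values are exactly what I expect generated tests to cover.
    @ParameterizedTest
    @CsvSource({
        "0,       300, false",  // no income, floor credit score
        "49999,   700, false",  // just under the income threshold
        "50000,   700, true",   // exactly at the threshold
        "50000,   699, false",  // score one point too low
        "1000000, 850, true"    // comfortably above both limits
    })
    void eligibilityAtBoundaries(long annualIncome, int creditScore, boolean expected) {
        assertEquals(expected, calculator.isEligible(annualIncome, creditScore));
    }

    // The kind of negative-input guard I usually still have to add by hand.
    @Test
    void negativeIncomeIsRejected() {
        assertThrows(IllegalArgumentException.class, () -> calculator.isEligible(-1, 700));
    }
}
```

The parameterized boundary cases are where generated tests genuinely help; the exception path is the sort of case I review and add myself.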
2. Projects That Need Rapid Prototyping / Fast MVPs (Node.js/React)
Example: The frontend-to-Elasticsearch DSL compiler I wrote at Bik, called the Segment Builder
- Qodo Gen:
  - Qodo wasn’t around back then.
- Cursor:
  - Advantage: Auto-completing complex TypeScript type definitions for that DSL. A huge time saver.
  - Downside: Should be avoided for anything beyond basic business logic, such as performance optimizations and architecture-level changes.
  - RAM Warning: Uses around 5.2GB of RAM during intensive TS type inference.
- GitHub Copilot:
  - Secret Weapon: Translates natural language directly to Elasticsearch bool queries, which helped me prototype fast (the query sketch right after this list shows the shape).
  - Caveat: Watch out for those aggregation bucket sizing errors; Copilot gets them wrong almost 2 out of 5 times.
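For context, here’s roughly the shape of bool query plus aggregation I’d prompt for. The field names and values are invented for illustration (this isn’t the Segment Builder’s actual schema), and the aggregation `size` is exactly the knob Copilot keeps fumbling:

```java
// Invented field names; a Java 17 text block holding the kind of query Copilot drafts
// from a prompt like "customers who bought in the last 30 days but aren't subscribed".
public class SegmentQuerySketch {

    static final String QUERY = """
        {
          "query": {
            "bool": {
              "must":     [ { "range": { "last_purchase_at": { "gte": "now-30d/d" } } } ],
              "must_not": [ { "term":  { "subscribed": true } } ],
              "filter":   [ { "term":  { "store_id": "blr-01" } } ]
            }
          },
          "aggs": {
            "by_category": {
              "terms": {
                "field": "category.keyword",
                "size": 500
              }
            }
          }
        }
        """;
    // ^ "size" is the bucket count Copilot tends to get wrong: it often leaves the
    //   default of 10 (or picks an arbitrary number), silently truncating segments
    //   that have more distinct categories than that.

    public static void main(String[] args) {
        System.out.println(QUERY); // paste into Kibana Dev Tools or send via a REST client
    }
}
```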
Diving Deeper: Platform-Specific Goodies
These features influence whether I choose a tool.
Qodo Merge PR Features
- Code hygiene: Decent general hygiene suggestions on PRs via Qodo Merge.
- Java specifics: Catches basic stuff like `Optional` misuse and anti-pattern usage of Lombok annotations (an example of the kind of thing it flags is below).
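A made-up illustration of the `Optional` misuse it catches; `Card` and `CardRepository` are hypothetical names, not anything from my codebase:

```java
import java.util.Optional;

class CardRepository {

    record Card(String issuer) {}

    // Anti-pattern: calling get() without checking presence throws
    // NoSuchElementException whenever the card is absent.
    String issuerNameUnsafe(Optional<Card> card) {
        return card.get().issuer();
    }

    // The suggested fix usually looks like this: map + orElse keeps absence explicit.
    String issuerNameSafe(Optional<Card> card) {
        return card.map(Card::issuer).orElse("UNKNOWN");
    }
}
```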
Cursor’s Codebase-Aware AI
- Cross-File Reasoning: It can actually resolve type mismatches across multiple React + Node files.
- M3 Pro Notes:
- With my 18GB M3 Pro, I can handle projects up to ~15k LOC comfortably.
Claude’s General Features
- Great VS Code integration: Pre-configured extensions and optimized settings
- Session persistence: Preserves command history and configurations between container restarts
- Terminal integration: `claude --review` is quick.
Navigating the Tradeoffs: My Decision-Making Framework
There’s always a compromise:
Tool | Primary Benefit | Primary Cost |
---|---|---|
Cursor | Prototyping Speed | Code Churn + RAM |
Qodo | Test Coverage | Iteration Speed |
Claude | Code Quality | Prompting Overhead |
I constantly recalibrate based on these factors.
The AI-Driven Code Quality Cliff: Addressing the GitClear Report Findings
The GitClear report paints a concerning picture: AI-generated code might be optimizing for short-term gains at the expense of long-term maintainability. Here’s how I’m adapting my workflow to mitigate these risks, based on their findings:
1. The Rise of Duplication: My Refactoring Focus
- GitClear’s Observation: “Copy/Pasted” code now exceeds “Moved” code in frequency. Refactoring (moving code) is down significantly.
- My Response: I’m consciously prioritizing refactoring, even when it feels faster to copy/paste a solution. Claude, despite its performance hit, becomes crucial for understanding and modularizing existing code (a tiny before/after sketch of what I mean follows below). I’m setting aside some of my dev time specifically for refactoring, regardless of sprint goals.
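To put the duplicated-vs-moved distinction in code terms, here’s a trivial, invented example (the rates and cap are made up): the copy-pasted pair is what GitClear counts as duplication, and the extracted method is the “moved” code the report says is disappearing:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RewardsService {

    // Before: the same rounding-and-cap logic pasted into each reward type.
    // This is what shows up as "copy/pasted" code in repo analytics.
    BigDecimal cashback(BigDecimal spend) {
        BigDecimal reward = spend.multiply(new BigDecimal("0.015"))
                .setScale(2, RoundingMode.HALF_UP);
        return reward.min(new BigDecimal("500.00"));
    }

    BigDecimal travelPoints(BigDecimal spend) {
        BigDecimal reward = spend.multiply(new BigDecimal("0.020"))
                .setScale(2, RoundingMode.HALF_UP);
        return reward.min(new BigDecimal("500.00"));
    }

    // After: the shared logic pulled into one place; callers pass their rate.
    // This is the "moved" code that refactoring produces.
    BigDecimal cappedReward(BigDecimal spend, BigDecimal rate) {
        return spend.multiply(rate)
                .setScale(2, RoundingMode.HALF_UP)
                .min(new BigDecimal("500.00"));
    }
}
```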
2. Increased Code Churn: More Maintenance Burden
- GitClear’s Observation: “Churn” (frequent revisions) is up, indicating increased defect rates in AI-assisted code.
- My Response: More rigorous testing before merging. AI-generated tests can sometimes increase churn if they miss edge cases or generate redundant checks. To counter this, I balance AI-generated tests with manually written integration tests that mimic real-world usage scenarios, going beyond Qodo’s generated tests (a sketch of such a test follows below). I’m also experimenting with contract testing to catch API regressions early.
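A sketch of what those hand-written integration tests look like in a Spring Boot project; the `/recommendations` endpoint and its parameters are placeholders, not the actual service:

```java
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.servlet.MockMvc;

// Exercises the whole slice (controller -> service -> rules) with a realistic request,
// the kind of end-to-end path unit-level generated tests tend to miss.
@SpringBootTest
@AutoConfigureMockMvc
class RecommendationFlowIT {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void returnsRankedCardsForAnEligibleUser() throws Exception {
        mockMvc.perform(get("/recommendations")
                        .param("income", "75000")
                        .param("creditScore", "720"))
                .andExpect(status().isOk())
                .andExpect(jsonPath("$.cards").isNotEmpty())
                .andExpect(jsonPath("$.cards[0].score").isNumber());
    }
}
```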
3. The “More Lines of Code” Trap
- GitClear’s Warning: Measuring developer productivity by “lines added” incentivizes AI-driven maintainability decay.
- My Response: I’m pushing for outcome-based metrics (features shipped, bugs resolved) and explicitly discouraging line-count comparisons.
4. Shorter Code Lifespan
- GitClear’s Observation: An increasing defect rate correlating with AI adoption. Google’s DORA report showed an extrapolated change in delivery stability for a 25% increase in AI usage.
- My Response: I’m focusing on increasing test coverage and improving refactoring rate.
Tool Selection: My Recommendations Based on Real-World Usage
- If You Need Rock-Solid Tests Immediately: Choose Qodo. Especially for domains with serious compliance requirements and TDD setups.
- If You’re Churning Out MVPs: Cursor is the only way to go.
- If You Need an Architecture Review Buddy: Claude is invaluable, especially with Claude 3.7 Sonnet.
Optimizing for My M3 Pro (18GB)
This hardware defines my limits. I can safely run:
- Three Java projects in Qodo + two Node services in the background.
- One mid-sized (under 8k LOC) monorepo in Cursor.
- Claude background agent + three terminal sessions.
I also tweak Cursor’s resource limits to get the best performance.
The Bottom Line: My Workflow
For the examples we discussed, here’s how I’d approach them:
- Realtime Elasticsearch DSL Compiler:
- Cursor for the rapid TypeScript development.
- Claude for query DSL validation.
- Credit Card Recommendation Engine:
- Qodo for generating comprehensive tests.
- Qodo/Claude for custom comparators, rule evaluation, feature match, etc.
- Portfolio Site:
- Cursor + Cloudflare Pages/Workers.
- Fun Fact: In 2019 I spent about 2 days creating a portfolio website and configuring EC2, an Elastic IP, etc. I created the latest portfolio website in about 45 minutes with Claude & Cursor and set up auto-deploy on Cloudflare Pages. MVP gains are unreal!
Ultimately, I’ve found that strategically combining tools yields much better results than relying on a single “magic bullet.” It’s about understanding each tool’s strengths and weaknesses and composing them into a system that amplifies my own capabilities.