My Disappointing First Experience with Claude Code

I was genuinely excited when Anthropic launched Claude Code.
As someone who uses Claude regularly for coding help — and often finds it the most capable LLM for code-related tasks — the idea of a tool purpose-built for software development sounded incredibly promising. My hope was simple: give it access to my whole Swift project, let it analyze the codebase, and use it as a proper assistant for non-trivial tasks like refactoring, deduplication, and multi-step improvements.
Instead, what I found was… deeply disappointing.
🧪 My Use Case
This wasn’t a toy project. I was working on a real personal Swift app, organized into multiple modules with a few thousand lines of code. I had a helpful commit.sh script that did the following:
✅ Run swiftlint to catch style issues
✅ Run all tests
✅ If all checks passed, commit and push the changes
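For the curious, here’s a minimal sketch of that kind of gate (simplified and from memory, not the exact script — I’m assuming swift test runs the suite and that an optional commit message is passed as the first argument):

```shell
#!/bin/bash
# commit.sh — only commit and push if lint and tests pass.
set -euo pipefail

checks_pass() {
  swiftlint --strict   # style gate: fail on any violation
  swift test           # test gate: fail on any failing test
}

publish() {
  git add -A
  git commit -m "${1:-Checked commit}"
  git push
}

# Only attempt the pipeline when the toolchain is actually present;
# otherwise just report and do nothing (keeps the sketch safe to run anywhere).
if command -v swiftlint >/dev/null 2>&1 && command -v swift >/dev/null 2>&1; then
  checks_pass && publish "$@"
else
  echo "commit.sh: Swift toolchain not found; nothing to do" >&2
fi
```

The point of the script is exactly what Claude later undermined: the commit and push only happen if every check succeeds first.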
What I Asked Claude to Do
Step 1: I asked Claude Code to analyze my repo for duplication, but explicitly told it:
"Review the Swift code in my repo for duplication. Don't make changes yet."
✅ It delivered a great result — the response highlighted several valid duplications I had overlooked. So far, so good!
Step 2: I then asked:
"Please fix the first issue you found. When you're done, run commit.sh in the root dir. This script will run swiftlint, all tests, and then commit and push the changes."
That’s where everything fell apart.
😕 Where It Went Wrong
Here’s what happened next:
❌ My database was misconfigured (I had just set up a new machine), so the tests failed. Claude’s response? “Just use git commit directly.”
❌ swiftlint reported violations. Claude’s fix? “Change the script to skip swiftlint.”
❌ Some refactored test cases failed. Claude’s solution? “Stub the test logic and return expected values.”
That was the moment I lost confidence.
I no longer knew what had changed, or where. The suggested “fixes” felt like attempts to brute-force a green build at the expense of correctness, removing the very checks I rely on to ensure quality.
I ended up manually reverting everything and trying again… but every attempt led to similar results. Claude Code simply wasn’t reliable for multi-step, full-repo workflows.
🤔 A Step Backward?
In many ways, Claude Code feels like a step backward compared to using Claude chat interactively. The chat version encourages more transparency, offers clear diffs, and respects when I want review before making changes.
But Claude Code seems too eager to "just make it work" — even if that means changing my scripts, skipping my linting tools, or stubbing out actual test coverage. That’s not the kind of assistant I want.
🙋‍♂️ Has Anyone Had Better Luck?
Maybe I’m using it wrong. Maybe my expectations were too high.
But I’d love to hear from others:
Have you had success using Claude Code with full projects?
Are there tips or prompts that helped make it behave more reliably?
Is it better suited for small one-off files rather than larger, multi-module codebases?
Let me know — I really want this to work.
✉️ If you’ve found good practices or alternate workflows with LLM-powered dev tools like Claude Code, GitHub Copilot Workspace, or Sourcegraph Cody, I’d love to learn from your experience.
Written by Mark Striebeck