Claude's disastrous mistakes

No, but its tone changes when I criticise it too much.

The other day it came out with “You are correct, you have years of experience”

I attended a webinar by a British Aero Company (a few months ago), where they were proudly touting their use of AI. I asked in the chat if the developers actually had any experience using AI for coding (let alone using it on an actual flying plane), because when it went wrong, even God couldn’t fix its output.

I was told that I “didn’t know how to guide the AI”. Try telling that to the passengers when the plane has fallen out of the sky because there was no oversight!

Admittedly, they are using unreleased versions of non-generative AI.

I think it depends on what stage of the project you are in. I am writing a brand new suite (from scratch) with tools I haven’t used before, interfacing to APIs I have never touched before. It beats reading all the manuals.

For Stripe, I gave it the URL and told it what nested classes and functions I wanted. It did it all, and created a test suite using a test account (of its own accord). I am sure it pilfered someone else’s code. And I do refactor it, so that I can understand it. There was a bit of to and fro as it used some functions that don’t exist in Delphi.

Bad code is already out there. In one of his webinars, Uncle Bob said that he estimates the number of programmers doubles approximately every five years, which has a significant consequence: it means that half of all programmers have less than five years of experience.
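The arithmetic behind that claim is simpler than it first looks. A quick sketch (assuming steady doubling and ignoring people leaving the field, which the original estimate also glosses over):

```python
# If the programmer population doubles every 5 years, the people who
# joined in the most recent 5-year window equal everyone who came before,
# i.e. half of today's total (assuming no attrition).
def newcomer_fraction(doubling_periods: int) -> float:
    total_now = 2 ** doubling_periods
    total_one_period_ago = 2 ** (doubling_periods - 1)
    newcomers = total_now - total_one_period_ago
    return newcomers / total_now

print(newcomer_fraction(6))  # 0.5, regardless of how many periods have passed
```

The fraction is 0.5 no matter how long the doubling has been going on, which is why the “half have under five years’ experience” claim holds for as long as the growth rate does.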

The real danger is when they start using AI in the actual operation. At least one British Aircraft firm is doing just that for its next lot of planes.

I interface to external APIs (not many, but a few). No one else’s code will be at the level of correctness/accuracy that I desire. Once I adopt an API, I use it for the next decade or more, so I would rather spend the time to get it right once and collect the payoff over that decade.

I also do things that almost no one else does, like building the code so it automatically detects when the data returned has been extended. This stuff is almost never correct in any documentation, and many APIs don’t have an OpenAPI/Swagger definition. Having worked in the same area for the last 15 years, I know that most of the code out there that does anything like what I need (high-level, not libraries) is poor to average at best. This is an advantage for me, and using AI assistants just extends my advantage in this space.
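To illustrate that “detect when the returned data has been extended” idea: the poster works in Delphi, so this is only a Python sketch of the technique, and the endpoint fields here are entirely hypothetical, not from any real API.

```python
import json

# Hypothetical set of fields our client code knows about for one endpoint.
KNOWN_FIELDS = {"id", "status", "amount", "currency"}

def check_for_extensions(payload: str, known_fields=KNOWN_FIELDS):
    """Parse an API response and report any fields the client doesn't know.

    Returns (data, new_fields); a non-empty new_fields set means the
    provider has extended the response since the client was written.
    """
    data = json.loads(payload)
    new_fields = set(data) - known_fields
    return data, new_fields

# Simulated response in which the provider has added a "metadata" field.
response = '{"id": "ch_1", "status": "ok", "amount": 100, "currency": "GBP", "metadata": {}}'
data, extras = check_for_extensions(response)
if extras:
    print(f"API response extended with unknown fields: {sorted(extras)}")
```

Logging the unknown fields instead of silently dropping them is what turns undocumented API changes from a silent data loss into an actionable warning.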

It’s a mistake to assume the AI assistants are training on “good” code. Syntax they get; design they don’t.


I need to take back my words, from my last post…

Since then, I have used Claude Code with Opus 4.6 (released just after my last post, the one I’m replying to, on 5 Feb 2026). I think it is a game changer.

Claude Opus 4.6 | Anthropic

They’ve adequately solved or improved many of the issues I had with previous LLMs.

Now it can actually help solve some genuinely complex problems (with some assistance and external insight). It’s a very useful co-developer; it’s like pair programming where you just tell your pair what you want and they code it up.

If you’ve tried LLMs before and felt like I did, then now might be a good time to have another look.

To give you an idea of my recent experience:

I started with a blank code repo and asked it to build me a design-time component that did some complicated low-level work with the FMX framework: things I’d struggle to get my head around, like navigating the state machines involved in processing user-interface input.

At first it failed, but that was because I had given it too complex a task: it tried to solve it all at once. Even so, I was blown away when it built an entire project that almost compiled first time (it missed an FMX.Graphics uses clause needed for an enumeration type). I told it to scrap that and focus on a standalone component solving just the one aspect it was failing on, and miraculously, with some help, it did. Alongside the component I had asked it to build a suite of demo applications, which it did, and they became a test suite. Iteratively we (the tool and I) uncovered further edge cases and worked through them.

Finally, I asked it to integrate the new design-time component into the program I had been building, which it did. Initially it did it in a spaghetti way, but I reviewed what it had done and could see a cleaner approach, which I described to it; of course it agreed with me, went and did it, and it worked well. I then proceeded to continue using the tool. By this time I had upgraded to a MAX 5X plan; I even maxed that out a few times and had to wait out the remainder of the five-hour window for my quota to reset.

I then spent the next couple of days just prompting it to write code in my application, then testing and reviewing the results and providing feedback, insight and alternative approaches that enabled it to write all the code. I cleared what I think would have taken me a few weeks of work, and solved another UI state-machine bug that had driven me crazy some time back. The thing is, it couldn’t have done it without me, because it needed me to lift it out of its various trains of thought and point it in the right direction. But maybe one day it will be able to.


A story about porting 10,000+ lines of C++ using Claude in 48hrs.


I think that this has been, and still is, the key to getting good results, though models are getting better at filling in some of the missing pieces. If you enter a simple prompt and expect the model to know all about your app, your vision, the libraries you expect it to use and the techniques you use without thinking, and to keep itself on track with good design principles, then the output will be further from where you want it to be. It may seem frustrating at times, but you need to persevere with longer-than-wanted prompts, filling in the assumed knowledge gaps.

In my opinion creating an AGENTS.md file (or similar) is a REQUIRED step when using AI agents for coding. Here is an easy way to get started:

  1. [Chat/Ask Mode] Enter the following prompt:
    Create a comprehensive prompt to create an AGENTS.md file and high-level documentation for my Delphi application so that AI models will use these instructions and documentation to always make high quality code changes that align with best practices and existing conventions. The prompt should include the following processes:
    1. Analyse all of the code in the current workspace to determine the repository and unit structure, libraries, conventions, patterns, standards and formats used.
    2. Detailed consideration of coding best practices, such as error handling, memory and resource management, threading and synchronization, re-use or refactor existing methods or classes instead of duplicating code, SOLID principles, favour shorter focused units, use design patterns, create and maintain automated unit tests, and so on.
    3. Focus on Delphi specific conventions and best practices to use, and pitfalls to avoid.
    4. Use the detailed analysis and best practices to generate a structured, readable AI model AGENTS.md file, and one or more human focused documentation markdown files to augment the AGENTS.md file.
  2. [Code Mode] Paste the resulting prompt from step 1 back into the model to perform the deep analysis and create the AGENTS.md and doc files

On one of our reasonably sized applications, step 1 took 3 minutes and 68k tokens with Claude Opus 4.6 and produced a 275-line prompt. Step 2 took 13 minutes and 155k tokens and resulted in a 305-line AGENTS.md file plus three other docs totalling 1,420 lines. The results were quite good but needed a little manual tweaking.

In any case, having a base set of instructions and documentation like this will help improve the results of future prompts. When in doubt, explicitly tag one or more of these files in the prompt.


Impressive. I attempted ports of larger libraries (50K lines), but the other way (C to Delphi), and they ended in failure. I plan to revisit those with Opus 4.6; the increased context window might just get me over the line. That, and I have learned a lot about getting the best out of AI since then.

Slightly OT, but another story came out from OpenAI last week about how OpenAI built an app with around 1 million lines of AI-written code and no manually written code. There were humans in the loop, but they created systems and tools and adapted their workflow to get the AI (GPT Codex) to do all of the coding:

The initial scaffold—repository structure, CI configuration, formatting rules, package manager setup, and application framework—was generated by Codex CLI using GPT‑5, guided by a small set of existing templates. Even the initial AGENTS.md file that directs agents how to work in the repository was itself written by Codex.
There was no pre-existing human-written code to anchor the system. From the beginning, the repository was shaped by the agent.
Five months later, the repository contains on the order of a million lines of code across application logic, infrastructure, tooling, documentation, and internal developer utilities. Over that period, roughly 1,500 pull requests have been opened and merged with a small team of just three engineers driving Codex.

https://openai.com/index/harness-engineering/

It’s an interesting read and is hopefully where the models and multi-agent / sub-agents continue to move so that we don’t have to create systems ourselves to put it all together.


Of course, they have the significant advantage of not having to keep within a budget of tokens and computing power, no matter how they spin it. It’s impressive that they can do it, but the rest of us, who are not blessed with unlimited token space and never-blocking accounts, are not going to achieve that kind of enormous success… yet.

There is definitely a massive risk for Silicon Valley et al.: the costs involved in providing the service to us are astronomical, and that means somewhere like China, with a sophisticated infrastructure, a far lower cost of living, and (if the stories are to be believed) a less finely tuned regard for work/life balance, is going to be a threat, as we’ve already seen hints of.

Restricting NVIDIA AI compute boards is only going to thwart price competition for a short period of time and it’s not likely to be long enough to get ahead in the game.

In the end it’s going to come down to providing capable services at a price point that makes larger and larger profits (to recoup that massive capital investment) while staying just on the “ouch” side of consumer affordability. The next year or two is going to be a bloodbath, with some substantial shocks along the way, I think.

All built on human power, BTW – it could never have been done without us doing all the initial work. AI does good “copying”, but it’s not original.

Additionally, the number of lines of code is always a terrible metric. A great day for me is having less code at the end of the day than at the start.


Nothing is original; it’s all copy/paste. AI is just reproducing what we can do, at scale. AI is useless, of course. Except when compared with wetware intelligence. Carumba, I’d rather code with Claude than with all but a few of the programmers I’ve ever hired or collaborated with. And the ones I would work with are all busy leveraging Claude anyway.


Assuming perfectly linear progress, that works out to around 7k LOC of output per day across the 5 months, which is around 70k output tokens per day. Input tokens would likely be much greater (a rough estimate across coding tasks would be up to 3x the output on average, i.e. ~210k input tokens/day).

Napkin math, purely for the final code generation and not accounting for refactors/rewrites, reasoning, or other non-code-generation work (processing AGENTS.md, memories, agentic IDE system instructions, tool calls, reading documentation, code review, evaluating unit-test results, etc.), puts it at around US$1.50/day for GPT-Codex. Even if you multiply that by 100, it is still pretty cheap! I’d love to see their token counts for this project.
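As a sanity check on that napkin math, here is the arithmetic spelled out. The tokens-per-LOC ratio and the per-million-token prices are my own assumptions rather than figures from the article, so the result lands in the same ballpark as the estimate above rather than exactly on it:

```python
# Reproducing the napkin math. The tokens-per-LOC ratio and token prices
# below are assumptions for illustration, not published figures.
total_loc = 1_000_000
days = 5 * 30                    # roughly 5 months of the project
tokens_per_loc = 10              # rough assumption for dense code

loc_per_day = total_loc / days                   # ~6.7k LOC/day
output_tokens = loc_per_day * tokens_per_loc     # ~67k output tokens/day
input_tokens = 3 * output_tokens                 # assumed 3x input-to-output

# Assumed prices (USD per million tokens), purely for illustration.
input_price, output_price = 1.25, 10.00
daily_cost = (input_tokens * input_price + output_tokens * output_price) / 1e6
print(f"{loc_per_day:.0f} LOC/day, ~${daily_cost:.2f}/day")
```

Whatever exact prices you plug in, the daily generation cost stays in the order of a dollar or two, which is the striking part: the pure code-emission cost is negligible next to the three engineers driving it.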

Grahame Grieve – totally agree :wink:
