A model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software

The title here is a paraphrased quote from [Gary Marcus], on TNW today, evaluating a claim from the “OpenAI president [who] says AI is now writing 80% of the company’s code”.

Marcus' specific point about coding is structurally important: a model that produces code which compiles and passes the tests it was given is not the same as a model that produces correct, secure, maintainable, well-architected software. The first is verifiable in seconds; the second requires the kind of judgement that has been the historical bottleneck on engineering productivity. Brockman acknowledges the gap, even as he argues it is closing. "The technology we have right now is very jagged," he said in the Big Technology interview. "It is absolutely superhuman at many tasks. When it comes to writing code, those kinds of things, the AI can just do it. But there's some very basic tasks that a human can do that our AI still struggles with."
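To make that gap concrete, here is a minimal sketch of my own (not from Marcus or the TNW piece). Everything in it is hypothetical: the `BASE_DIR` upload directory, the two resolver functions, and the `given_tests` check standing in for "the tests it was given". Both resolvers pass those tests, but only one refuses a path traversal request.

```python
import posixpath

# Hypothetical upload directory, used only for illustration.
BASE_DIR = "/srv/uploads"


def resolve_naive(name: str) -> str:
    """Join the requested name onto BASE_DIR with no validation."""
    return posixpath.join(BASE_DIR, name)


def resolve_checked(name: str) -> str:
    """Join, normalise, and refuse anything that escapes BASE_DIR."""
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, name))
    if posixpath.commonpath([BASE_DIR, candidate]) != BASE_DIR:
        raise ValueError(f"path escapes upload directory: {name!r}")
    return candidate


def given_tests(resolve) -> bool:
    """Stand-in for 'the tests it was given': happy-path inputs only."""
    return (
        resolve("a.txt") == "/srv/uploads/a.txt"
        and resolve("img/b.png") == "/srv/uploads/img/b.png"
    )


if __name__ == "__main__":
    # Both implementations satisfy the supplied tests.
    print(given_tests(resolve_naive), given_tests(resolve_checked))

    # Only the naive one hands back a path outside the upload directory.
    print(resolve_naive("../../etc/passwd"))
    try:
        resolve_checked("../../etc/passwd")
    except ValueError as exc:
        print("rejected:", exc)
```

The test suite cannot tell the two apart; knowing to ask "what happens with `../`?" is the judgement part, and it sits outside anything the tests verify.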

Realism re AI coding is knowing that next-word prediction gets us a surprisingly long way in writing code, but less far in making sure that code is robust. Coders (especially vibe coders with little experience) beware.

As good as these tools are getting — and they are getting really good and helpful — I don't see a non-technical person, say a product manager or marketing person, being able to steer and coerce these LLMs into producing software that any company should be willing to expose to the internet and their user base if they care about robustness, security, reliability and maintainability of the system. Especially if revenue and reputation are on the line.

Loosely Coupled Thoughts
