software-factory

Human Review Is the Bottleneck

Chris Parsons recently wrote up an article about feedback being the bottleneck.

"Feedback" here, for me, refers to the human effort of gatekeeping LLM outputs before they go out out and serving their purpose (software released to production, an LLM-drafted email being sent, LLM-created slide deck reviewed before presenting, etc.).

This is most relevant where LLMs are generating outputs at greater speeds than long-standing engineering practices keep up with.

The most prominent aspect of this is the pull request as the quality gate. It's not the only one, but it is the one where humans are most in the loop. And the one that most people who care about quality, maintainability, security and operational stability are most uncomfortable with removing or even loosening up on.

The problem, as Chris Parsons nicely puts it:

This is the theory of constraints in action: speed up one stage and the bottleneck moves downstream. When code arrives faster, it pushes more work into review, testing, deployment, and requirements clarification. The queue grows. Nobody is reviewing any faster.

If review, testing and quality assurance are simply removed or loosened up on to allow the bombardment of PRs to flow through, it means - more consequentially - that user, organisation reputation and potentially revenue affecting production quality and incidents will become the bottleneck. Eventually it will come back to bite.

Humans get fatigued, AI doesn't.

With AI, the human is the only one who needs rest while the machine keeps generating work that needs evaluating: permission fatigue, review fatigue, the endless “just need a human to press approve” requests. Cory Doctorow calls this the reverse centaur: humans whose purpose is to support the machine’s needs.4

The reviewers who care most about quality will be the first to burn out, because they are the ones who read everything instead of skimming. Either you make human feedback unnecessary, or you make it instant.

So what does one do about this review bottleneck?

This is the crux of the problem, the very point where we are forced to decide whether we will hold a tight grip on these long-standing practices and push those senior reviewers to their tethers' ends with unrelenting AI reviews (often slop). Or clear the AI on its path of disruption and we adapt accordingly.

On one end of the spectrum, we could double down on existing engineering practices. Humans review everything. This may be most comforting as it may give the most objective confidence to responsible humans that quality is where it needs to be.

It simply doesn't scale. There is too much code being generated, and reading/reviewing only does not give the same in-depth understanding of what was built if the humans were building it too. So things will and do slip through.

Or alternatively we give in and loosen the quality/review gates without additional steps. This risks pushing low quality, architecturally deficient, bug infested code that no one understands into production. The instant gain was real, breakneck delivery speeds. Even the Product Manager has something to push to production. But the theory of constraints just pushes the bottleneck downstream, right into production where it hurts the most and is the hardest to unwind.

[... 1672 words]