May 2026 · 5 min read

Code Review Is No Longer Just About the Code

A pull request used to be the output of one engineer. Increasingly it is the output of a workflow. That changes what review actually is.

This is the fifth post in a series on what AI is actually changing in software engineering and engineering management. Earlier posts: coding got faster, delivery did not · the new management problem is not adoption · repository memory and the risk of teaching AI your mistakes · when code gets cheap, architecture gets scarce.

This post expands on a thread I posted on LinkedIn.

What a pull request used to be

A pull request used to represent something fairly simple: one engineer's work, ready for another engineer's eyes. The review surface was the diff. You looked at what changed, asked whether it was correct and whether it belonged, and either approved or pushed back.

That model is starting to break down — not dramatically, but quietly, as AI-assisted development becomes part of how teams actually work.

More often now, a pull request is the output of a workflow. A person and a coding agent. Repo instructions the agent was following. Context accumulated from previous sessions. Generated tests. Automated checks. Sometimes AI-assisted review layered on top of AI-generated code. The diff looks like it came from one place. The process that produced it is considerably more complex.

That changes what review actually is.

The review surface has expanded

The code still matters. That part has not changed. But a clean diff does not mean a clean process anymore, and the reviewer who treats them as equivalent is missing something.

Consider an agent-generated pull request that updates a service cleanly, follows local naming conventions, and passes all the checks. On the surface it looks solid. What the reviewer may not see is that the agent was operating from a repo instruction that quietly preserved an outdated service boundary the team should have been removing, not reinforcing. The code is correct relative to the instructions it was given. The instructions were wrong.

The reviewer is no longer only asking whether the code is correct. They are also asking — or should be asking — whether the workflow that produced it deserves trust.

[Figure: the expanded review surface. Before AI, review covered the diff; with AI workflows, it also covers the agent instructions, repo context, automated checks, and assumptions baked into the workflow.]
The review surface used to be the diff. With AI-assisted workflows, it also includes the instructions the agent followed, the context it inherited, and the assumptions baked into the checks that made the change look safe.

Not every pull request needs this level of scrutiny

The right response here is not to apply deep workflow review to every change. That would make the process slower without making it meaningfully safer — most of the volume increase from AI-assisted development is in exactly the kinds of changes where the workflow is simple and the risk is low.

The right response is knowing which changes do warrant it.

The signal is not just "AI was involved." The signal is the nature of the change combined with how it was produced. A pull request that came out of a longer agent loop, that inherited substantial repo context, that was shaped by instructions the reviewer did not write and may not have seen — that is where the real review surface is larger than the diff, and where treating it as a routine approval is the kind of mistake that does not announce itself.
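To make that triage concrete, here is a minimal sketch of how a team might encode it. Everything in it is an assumption for illustration: the field names, the thresholds, and the idea that these signals are even available as metadata on a pull request. It is not a real tool or API, only one way to express "nature of the change combined with how it was produced" as a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestSignals:
    """Hypothetical metadata about how a pull request was produced."""
    agent_turns: int                # assumed: iterations of the agent loop
    inherited_context_files: int    # assumed: repo context files the agent loaded
    unseen_instructions: bool       # reviewer did not write or see the instructions
    touches_service_boundary: bool  # nature of the change: crosses a boundary


def needs_workflow_review(
    signals: PullRequestSignals,
    max_turns: int = 10,            # illustrative threshold, not a recommendation
    max_context_files: int = 5,     # illustrative threshold, not a recommendation
) -> bool:
    """Return True when the review surface is likely larger than the diff."""
    # Count how many process-level signals are present.
    process_signals = sum([
        signals.agent_turns > max_turns,
        signals.inherited_context_files > max_context_files,
        signals.unseen_instructions,
    ])
    # A boundary-crossing change needs only one process signal to qualify;
    # a routine change needs at least two.
    if signals.touches_service_boundary:
        return process_signals >= 1
    return process_signals >= 2


routine = PullRequestSignals(2, 1, False, False)
risky = PullRequestSignals(25, 8, True, True)
print(needs_workflow_review(routine), needs_workflow_review(risky))
```

The point of the sketch is the shape of the rule, not the numbers: "AI was involved" never appears as a signal on its own, and the nature of the change lowers or raises the bar for how much process complexity triggers a deeper look.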

In those cases, the questions worth asking go beyond the code itself: What boundaries was the agent allowed to cross? What instructions was it following, and are those instructions still correct? What assumptions are baked into the automated checks that made this look safe? Would this change look the same if it had been written by a person with full context rather than generated from accumulated instructions?
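One lightweight way to keep those questions in front of a reviewer is to attach them as a checklist to the pull requests that warrant it. The sketch below is an assumption about how that could look; the constant and function names are hypothetical, and how the text reaches the pull request (a comment, a template, a bot) is left open.

```python
# The workflow-level review questions from the paragraph above, as a
# reusable checklist. The last one is lightly condensed.
WORKFLOW_REVIEW_QUESTIONS = (
    "What boundaries was the agent allowed to cross?",
    "What instructions was it following, and are those instructions still correct?",
    "What assumptions are baked into the automated checks that made this look safe?",
    "Would this change look the same if a person with full context had written it?",
)


def render_checklist(questions: tuple[str, ...] = WORKFLOW_REVIEW_QUESTIONS) -> str:
    """Render the questions as unchecked plain-text checklist items."""
    return "\n".join(f"[ ] {question}" for question in questions)


print(render_checklist())
```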

What this requires from reviewers

The skills that made someone a good reviewer before still matter. Understanding the codebase, catching design problems, asking whether the change belongs — all of that is still the job.

What is new is the need to sometimes step back from the diff entirely and ask a different question: does the system that produced this deserve the level of trust I'm about to extend to it?

That is a harder question to ask under volume pressure. When PR counts go up, the default is to compress review into "does this work?" Getting from there to "was this produced well?" requires a deliberate decision that the change warrants it — and a culture that supports slowing down on the right things rather than treating all review as a throughput problem.

The stronger review culture will not be the one that approves AI output faster. It will be the one that knows when to review the code, and when to review the workflow behind the code.

Once code-producing systems become part of how a team works, reviewing the output alone is no longer enough. The output can look correct while the process that generated it was quietly wrong. Catching that requires reviewers who know what they are actually looking at — and organizations that treat that distinction as a real engineering skill, not a theoretical concern.
