The AI Agent Problem Is Not the Model. It Is the Context.

Most conversations about AI-assisted development still start with the model. Which one writes better code? Which one has the bigger context window? Which one is cheaper? Which IDE has the best agent this month?

Those questions matter, but I do not think they are the durable problem.

Models will keep getting better. Tools will keep changing. Today it is Claude, Cursor, Codex, Copilot, and a handful of other agents. Tomorrow the names may change again. The thing that will not disappear is the need for a system that gives these tools the right context before they act.

The real bottleneck is not intelligence

A strong model can write code, explain a stack trace, summarize a pull request, and propose a migration. But in a real software team, that is not enough.

The model needs to know which GitHub PR changed the old behavior, which Linear or Jira ticket defines the current acceptance criteria, which Slack thread contains the actual decision, which Sentry issue represents production pain, which runbook is still valid, and which README is now a lie.

That is not just more text. That is context quality.

The hard question is not "Can the model understand this if we paste it in?" The hard question is "Which context should the system put in front of the model, with what confidence, from which source, under which permissions, for this exact task?"

A context window is not a knowledge base

A larger context window helps, but it does not solve truth, freshness, provenance, or permissions.

If you put stale docs, old ticket comments, half-remembered architectural decisions, and private customer data into a giant context window, you have not built an agent memory system. You have built a bigger confusion machine.

Good context systems need to answer boring questions very well:

Where did this fact come from?
Is it still current?
Who is allowed to see it?
Which newer decision supersedes it?
Should the agent use it automatically or ask for review?
How do we know it improved the outcome?
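
To make those questions concrete, here is a minimal sketch of how a single context item might carry its own answers. Everything in it is an assumption for illustration: the ContextItem fields, the decide function, the 90-day freshness threshold, and the example values are hypothetical and not taken from any real tool. It covers the first five questions only. The sixth, whether the context improved the outcome, needs measurement rather than metadata.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ContextItem:
    """One fact the agent may be shown. All field names are illustrative."""
    fact: str
    source: str                          # where the fact came from
    last_verified: date                  # when someone last confirmed it
    allowed_scopes: FrozenSet[str]       # who is allowed to see it
    superseded_by: Optional[str] = None  # newer decision that replaces it, if any
    needs_review: bool = False           # ask a human before relying on it


def decide(item: ContextItem, scope: str, today: date, max_age_days: int = 90) -> str:
    """Return 'deny', 'skip', 'review', or 'use' for one item and one requester."""
    if scope not in item.allowed_scopes:
        return "deny"    # permissions are checked before anything else
    if item.superseded_by is not None:
        return "skip"    # a newer decision replaces this one
    if today - item.last_verified > timedelta(days=max_age_days):
        return "review"  # possibly stale, so not used automatically
    if item.needs_review:
        return "review"
    return "use"


# Hypothetical example values.
item = ContextItem(
    fact="The old retry script is deprecated",
    source="incident note (hypothetical reference)",
    last_verified=date(2024, 1, 10),
    allowed_scopes=frozenset({"payments-team"}),
)
print(decide(item, "payments-team", today=date(2024, 2, 1)))  # prints "use"
```

The ordering is a design choice, not a rule: checking permissions first means a requester outside the scope learns nothing about the item, not even that it is stale.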

This is why I am more interested in context infrastructure than generic AI memory. Memory is one part of it. The system around memory matters more.

Read first, then autonomy

I think the safest first step for most teams is not to let agents write everywhere. It is to make them better readers.

Give an agent read access to the right tools first: GitHub, Linear, Jira, Slack, Sentry, docs, CI, incidents, and deployment notes. Then make that access scoped, logged, and source-backed. Only after that does it make sense to expand toward write actions.

An agent that can read the right Slack thread and Sentry issue before touching code is already more useful than one that works from the repo alone. An agent that can also explain why it believes something, and point to the source, is much easier to trust.
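
As a rough sketch of what "scoped, logged, and source-backed" could look like, the snippet below routes every read through one gateway object. The names ReadOnlyGateway and ReadResult, the tool keys, and the stub readers are all hypothetical. A real setup would wrap actual API clients, which I deliberately leave out here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class ReadResult:
    text: str
    source: str  # pointer the agent can cite when it explains itself


@dataclass
class ReadOnlyGateway:
    """Single entry point for agent reads. It has no write method at all."""
    readers: Dict[str, Callable[[str], str]]  # tool name -> read function
    allowed_tools: Set[str]                   # scope granted to this agent
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def read(self, tool: str, ref: str) -> ReadResult:
        if tool not in self.allowed_tools:
            # Denied attempts are logged too, so scope problems stay visible.
            self.audit_log.append(("denied", tool, ref))
            raise PermissionError(f"agent is not scoped to read from {tool}")
        text = self.readers[tool](ref)
        self.audit_log.append(("read", tool, ref))
        return ReadResult(text=text, source=f"{tool}:{ref}")


# Stub readers stand in for real clients; the references are made up.
gateway = ReadOnlyGateway(
    readers={
        "slack": lambda ref: f"stub thread text for {ref}",
        "sentry": lambda ref: f"stub issue text for {ref}",
    },
    allowed_tools={"slack"},
)
result = gateway.read("slack", "thread-1")
print(result.source)      # prints "slack:thread-1"
print(gateway.audit_log)  # one ("read", "slack", "thread-1") entry
```

Expanding toward write actions would then be an explicit change to this surface, not something an agent drifts into.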

What I think a good system needs

At a minimum, I want an AI-assisted software workflow to have these pieces:

Tool map: where engineering context lives across repos, tickets, chat, docs, monitoring, and deploy systems.
Provenance: every durable memory item points back to a PR, commit, ticket, doc, chat, incident, or human decision.
Freshness: the system can mark context as stale, superseded, deprecated, or in need of review.
Permissions: private context does not leak into the wrong project, team, customer, or user scope.
Memory diffs: after a PR, incident, ticket update, or Slack decision, the system proposes what it learned and asks a human to approve, reject, or correct it.
Evals: the same task should be tested with and without the context layer so we can measure whether the agent actually performs better.
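
The memory diff piece is the one I find easiest to explain with code. Below is a minimal sketch, under the assumption that memory is a plain dictionary and that a human verdict is one of approve, reject, or correct. MemoryDiff, Verdict, apply_review, and the example keys and values are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    CORRECT = "correct"


@dataclass(frozen=True)
class MemoryDiff:
    key: str       # which memory item would change
    proposed: str  # what the system thinks it learned
    source: str    # the PR, incident, ticket update, or chat decision behind it


def apply_review(
    memory: Dict[str, dict],
    diff: MemoryDiff,
    verdict: Verdict,
    reviewer: str,
    correction: Optional[str] = None,
) -> Dict[str, dict]:
    """Return a new memory dict. Nothing is stored without a human verdict."""
    if verdict is Verdict.REJECT:
        return dict(memory)  # unchanged copy
    if verdict is Verdict.CORRECT:
        if correction is None:
            raise ValueError("a CORRECT verdict needs the corrected text")
        value = correction
    else:
        value = diff.proposed
    updated = dict(memory)
    # Provenance and the approver are stored next to the value itself.
    updated[diff.key] = {"value": value, "source": diff.source, "approved_by": reviewer}
    return updated


# Hypothetical example: the system proposes a fact after an incident.
diff = MemoryDiff(
    key="payments.retry",
    proposed="The old retry script is deprecated",
    source="incident note (hypothetical reference)",
)
memory = apply_review({}, diff, Verdict.APPROVE, reviewer="alice")
print(memory["payments.retry"]["approved_by"])  # prints "alice"
```

Returning a new dictionary instead of mutating the old one keeps each approved change easy to log and easy to undo.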

This is not glamorous. It is not a viral demo. But it is the kind of work that makes agents useful in production engineering teams.

A simple example

Imagine an agent is asked to fix a payment webhook bug.

A weak setup gives it the repo and maybe a README.

A better setup gives it the repo, the relevant files, the Linear ticket, the last related PR, the Sentry issue, the incident note from last month, the current deployment rule, and a warning that an old retry script is deprecated.

The best setup also tells it where every piece of context came from, what may be stale, and which actions require human approval.

The code output may look similar at first glance. The difference is whether the agent is making lucky guesses or operating inside a trustworthy context system.
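
One way to check that difference instead of assuming it is to run the same tasks with and without the context layer and compare pass counts. The sketch below assumes a runner function that takes a task plus optional context and reports success. paired_eval, the runner signature, and the stub used in the example are hypothetical, and the stub is rigged to pass only when context is present, so its output says nothing about real agents.

```python
from typing import Callable, Dict, Optional, Sequence

# A runner executes one task, with or without context, and reports success.
Runner = Callable[[str, Optional[str]], bool]
ContextBuilder = Callable[[str], str]


def paired_eval(
    tasks: Sequence[str], run: Runner, build_context: ContextBuilder
) -> Dict[str, int]:
    """Run every task twice, once bare and once with context, and count passes."""
    passed_without = 0
    passed_with = 0
    for task in tasks:
        if run(task, None):
            passed_without += 1
        if run(task, build_context(task)):
            passed_with += 1
    return {
        "tasks": len(tasks),
        "passed_without_context": passed_without,
        "passed_with_context": passed_with,
    }


def stub_runner(task: str, context: Optional[str]) -> bool:
    # Stand-in for "run the agent, then run the tests". Not a real result.
    return context is not None and task in context


report = paired_eval(
    ["fix payment webhook bug"],
    stub_runner,
    lambda task: f"context for: {task}",
)
print(report)
```

In practice the runner would need a fixed pass criterion, such as an existing test suite, so that both runs are judged the same way.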

Why I care about this

I have spent years building backend systems where reliability matters: commerce, payments, ad tech, Web3 infrastructure, and production systems with real users. The pattern is always the same. The code is only one part of the system. The surrounding operational context decides whether the team can move safely.

AI makes that more obvious. If we give agents poor context, they will confidently amplify poor assumptions. If we give them source-backed, permission-aware, fresh context, they become much more useful.

That is the area I want to go deeper into: reliable context systems for AI-assisted software teams. Not because one tool is perfect, but because every tool will need this layer.

The bet

My bet is simple.

The winning teams will not be the ones that blindly adopt the most aggressive AI agent. They will be the ones that build the best context infrastructure around whichever agents they use.

Models will change. IDEs will change. Pricing will change. The durable advantage is knowing how to make context work: what to retrieve, what to remember, what to forget, what to verify, what to hide, and how to measure whether it helped.

That is the work I am interested in now.