AI agents are the easy part. The standards are the work.

The agents are the quick part. Genuinely. When I build an agent system for a company, the model choice and the wiring are days of work. Pointing a capable model at a library of documents and giving it a job is no longer the hard bit. The hard bit is the library. The slow, unglamorous work happens before any model is involved: writing the company's own standards down in plain documents, one rule at a time.

That ordering is the whole point, and it is the part most teams skip.

The blank prompt box problem

Most teams adopt AI tool-first. Buy the licence, open the blank prompt box, hope.

What comes out is fluent, generic and slightly wrong everywhere. The voice drifts. Claims appear with no evidence behind them. And because everyone prompts from their own head, every person on the team gets a different quality of output. Monday's email sounds nothing like Thursday's landing page, and neither sounds like you.

The tool is not the problem. The missing layer underneath it is. An AI with no standards to follow produces content with no standards. Just faster, and in bulk. Gartner expects over 40% of agentic AI projects to be cancelled by the end of 2027, and my bet is that most of those cancellations will trace back to this missing layer, not to the technology.

What a standards pack contains

So the build starts with writing, and the writing has a shape. Four documents, none of them exotic.

Voice rules, written as do and don't pairs rather than adjectives. "Professional but warm" instructs nobody, human or machine. A don't example of the padded sentence next to a do example of the tight one instructs both. Add the banned vocabulary list, the words that mark a piece as generated rather than written.
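The banned vocabulary list is the easiest part of the voice rules to check mechanically. A minimal sketch, assuming a hypothetical list of four words; `BANNED_WORDS` and `find_banned_words` are illustrative names, and the real list comes from your own content.

```python
import re

# Hypothetical banned list; a real one comes from reviewing your own content.
BANNED_WORDS = ["delve", "leverage", "seamless", "game-changing"]

def find_banned_words(text, banned=BANNED_WORDS):
    """Return the banned words that appear in text, in list order."""
    hits = []
    for word in banned:
        # Whole-word, case-insensitive match. "leverage" does not match
        # "leveraged", so list every form you want banned.
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(word)
    return hits

draft = "We leverage a seamless platform to help teams ship."
print(find_banned_words(draft))  # prints ['leverage', 'seamless']
```

A check like this only covers the vocabulary list. The do and don't pairs still need a reader, human or agent, to compare a draft against them.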

Claims discipline. Every claim carries evidence: a number, a named system, a shipped thing. If the receipt does not exist, the claim does not ship. This is the document that stops an agent inventing things, and it is the one I write first in any build, because it is the hardest to retrofit.
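The receipt rule can be expressed as a small data structure and a gate. This is a sketch under assumptions, not a prescribed format: the `Claim` class, its field names, and the example claims are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Claim:
    text: str
    # The receipt: a number, a named system, a shipped thing. None if missing.
    receipt: Optional[str] = None

def unreceipted(claims: List[Claim]) -> List[Claim]:
    """Return claims with no receipt, or a receipt that is only whitespace."""
    return [c for c in claims if not (c.receipt and c.receipt.strip())]

def may_ship(claims: List[Claim]) -> bool:
    """No receipt, no ship: every claim needs evidence attached."""
    return not unreceipted(claims)

# Invented example claims, for illustration only.
claims = [
    Claim("Onboarding now takes two days", receipt="HR system export, March cohort"),
    Claim("Customers love the new dashboard"),
]
print(may_ship(claims))                       # prints False
print([c.text for c in unreceipted(claims)])  # prints ['Customers love the new dashboard']
```

The gate checks only that a receipt is present. Whether the receipt actually supports the claim is still a judgement for the reviewer.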

A quality bar. What good looks like for each content type and what gets killed, with examples of both. Without it, everything competent ships, and competent is exactly the texture of slop.

The workflow. Where the agent runs, where a human reviews, where output ships. An agent lives inside a workflow or it is a toy.

None of this is new knowledge. It is the judgement your team already applies, stored in the worst possible place, which is one senior person's head. Quality lives there as taste. Taste does not scale, does not transfer, and does not survive that person being busy. Writing it down makes it explicit. Explicit is trainable.

The agent is the printer

Then, and only then, the agents.

Trained on the pack, they draft inside the rules rather than from a blank page. They also review work against the same rules, flagging where a claim has no receipt or the voice has slipped. Same standards on the way in and on the way out, and a named human signing off anything that publishes.

A printer is the right mental model. A printer is fast, consistent and completely indifferent to quality. It reproduces whatever you feed it. Feed it your standards and it prints your standards. Feed it nothing and it prints fluent nothing, at scale. The internet is currently rather full of fluent nothing, which tells you what most teams are feeding their printers.

Fix the standard, not the output

Here is where the system starts to compound, and where I ask every team to change one habit.

When an output is wrong, the obvious move is to edit the output. Do that, but treat it as the smaller half of the job. The real fix is to find the standard that allowed the mistake and tighten it. Then every future output inherits the correction in one move. A decision the team makes once becomes a decision the agents apply from then on.

This only works because the standards are documents. Writing can be versioned, checked and improved. Taste cannot. Quality stops depending on whether the model is having a good day and starts depending on rules that exist in writing. That is the difference between renting AI and operating it.

The documentation is the moat, and it is yours

The honest framing I put in front of any leadership team weighing an AI budget: the agents are not the differentiator. Your competitors can rent the same model tomorrow, at the same price. What they cannot rent is your judgement, and the standards pack is your judgement made explicit. The asset the build leaves behind is a library that belongs to you, not to me and not to any vendor.

There is a practical consequence hiding in that. Models change. Vendors change. Pricing changes. The documents survive every swap, because they encode what you want, not how any particular tool behaves. If your AI capability lives in one tool's settings and a folder of clever prompts, you rebuild it every time the ground shifts. If it lives in your standards, you point the next model at the same library and carry on.

The moat is not in the tool layer. It never was.

Governance is part of the build, not an afterthought

An agent system without governance is a liability with good output. So the pack ends with rules about people.

Who owns each standard, and who may change it, because a standard nobody owns decays into a suggestion. Where human review sits, and what the reviewer is actually checking: claims against receipts, voice against rules, not vibes. What ships without sign-off and what never does. And who switches the system off when something looks wrong, because someone must be able to.
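Two of those rules, what ships without sign-off and who can switch the system off, reduce to a gate in front of publishing. A minimal sketch, assuming a hypothetical policy in which only internal notes skip review; `NO_SIGNOFF_NEEDED`, `Output` and `can_publish` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: content types that may ship without human sign-off.
NO_SIGNOFF_NEEDED = {"internal_note"}

@dataclass
class Output:
    content_type: str
    body: str
    signed_off_by: Optional[str] = None  # the named human reviewer, if any

def can_publish(output: Output, system_enabled: bool = True) -> bool:
    """Apply the off switch first, then the sign-off rule."""
    if not system_enabled:
        return False  # the off switch overrides everything else
    if output.content_type in NO_SIGNOFF_NEEDED:
        return True
    return bool(output.signed_off_by)

page = Output("landing_page", "Draft copy")
print(can_publish(page))  # prints False: no named reviewer yet
page.signed_off_by = "Head of Content"
print(can_publish(page))  # prints True
```

Ownership of each standard sits outside this sketch. It belongs in the documents themselves, as a named person on each one.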

This is also where the EU AI Act stops being frightening. The transparency obligations assume you can answer one question: where does AI touch your content? A documented pipeline answers it as a by-product. Provenance marking becomes a field in a template and a step in a workflow, and the audit trail falls out of the documentation you already wrote. The teams in trouble are not the heavy AI users. They are the ones who never wrote anything down.
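To make "a field in a template" concrete, here is a sketch of a content record that carries its own provenance. The field names are assumptions chosen for illustration and are not taken from the text of the EU AI Act; check the actual obligations with whoever advises you on compliance.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ContentRecord:
    title: str
    ai_drafted: bool        # did an agent write the first draft?
    ai_reviewed: bool       # did an agent check it against the standards?
    human_reviewer: str     # the named person who signed it off
    standards_version: str  # which version of the pack was in force

def audit_line(record: ContentRecord) -> str:
    """Serialise the record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)

record = ContentRecord(
    title="Pricing page",
    ai_drafted=True,
    ai_reviewed=True,
    human_reviewer="A. Reviewer",  # placeholder name
    standards_version="v3",        # placeholder version label
)
print(audit_line(record))
```

Each published piece then answers the "where does AI touch your content?" question for itself, and the log of these lines is the audit trail.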

Where to start, practically

Not with a platform evaluation. With writing.

Write your voice rules as do and don't pairs, using real examples from your own content, the good and the embarrassing. Write the banned vocabulary list. Write the claims rule and mean it: no receipt, no ship. Then define what good looks like for the one content type you produce most.

That is a small stack of documents. It needs no budget approval and no vendor call. Whether you then build the agent layer yourself or bring in someone like me, the pack is the build. The difference between an agent trained on your standards and the blank prompt box is the documentation. It always was.

If your standards are not written down, you do not have an AI problem to solve yet. You have a writing job. Do that one first. The agents will still be there when you finish, cheaper and better than they are today, waiting to print whatever you hand them.