
The Infrastructure Around Your AGENTS.md

January 20th, 2026

In the previous post, we talked about loading specific domain knowledge when it is most relevant, thanks to Skills. But there is something I intentionally left out: how do you actually make sure the agent follows what you have written in those Skills?

One of the answers I have found to work, both for personal and client projects, is not adding more instructions but setting up infrastructure.

One thing I realized is that the AGENTS.md is about 10% of the work. The ecosystem around it is the rest. A lean AGENTS.md only works if guardrails around it enforce the rules while the agent writes code.

Deterministic tools beat prose rules

This is a simple rule that has been working for both my client and personal projects: deterministic tools beat prose rules. If something can be linted, do not write it in your AGENTS.md. Instead of a conventions section that says “use consistent formatting,” “keep imports sorted,” “use camelCase,” and so on, configure your linter: Biome for TypeScript, Ruff for Python. Then tell your agent to run it whenever it finishes a piece of work.
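
For TypeScript, that can be a biome.json as small as this. A minimal sketch; exact rule names vary between Biome versions:

{
  "linter": {
    "enabled": true,
    "rules": {
      "recommended": true,
      "style": {
        "useNamingConvention": "error"
      }
    }
  },
  "formatter": {
    "enabled": true,
    "indentStyle": "space"
  }
}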

You can even skip telling your agent to run the linter and simply have a formatter in place that rewrites the code the way you expect.1 The same goes for type checkers, which immediately catch hallucinated APIs that do not exist.
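
As a made-up illustration of what that feedback looks like:

// a made-up snippet: tsc immediately flags the API that does not exist
const items: string[] = ["a", "b", "c"];
items.shuffle();
// error TS2339: Property 'shuffle' does not exist on type 'string[]'.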

This is what I mean by deterministic: the feedback is immediate, unambiguous, and can be automated. Unlike prose rules, which can feel like suggestions, these checks act as blockers for your agent.

Scripts as the interface

An additional pattern I have seen work well is using scripts as an explicit interface, instead of or in addition to documentation.

As an example, if you use Drizzle for migrations, you might have written something like this in your AGENTS.md:

## Database Migrations

When creating migrations:
1. Update the schema file first
2. Use snake_case for table names
3. Run the generate command with the format...
4. Name migrations using YYYYMMDD_description...

Instead, you could define those scripts in the package.json:

{
  "scripts": {
    "db:generate": "drizzle-kit generate",
    "db:migrate": "drizzle-kit migrate"
  }
}

And in your AGENTS.md:

## Database

- Drizzle ORM + PostgreSQL
- Scripts in package.json. Never write migrations by hand.

I have noticed that agents already read the package.json file on their own, so there is no need to document how your migrations should be written or created. They can “just” call the script.
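
Behind those scripts, drizzle-kit reads a drizzle.config.ts. A minimal sketch, where the schema path and environment variable name are assumptions:

// drizzle.config.ts: a sketch; schema path and env var name are assumptions
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  dialect: "postgresql",
  schema: "./src/db/schema.ts",
  out: "./drizzle",
  dbCredentials: { url: process.env.DATABASE_URL! },
});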

Tests as the feedback loop

The best thing you can give your agent is a small and simple feedback loop. I wrote about this a couple of weeks ago in Smaller and simpler feedback loops, but it is worth repeating here. If the agent can run tests and see the results immediately, it can iterate instead of guessing, and instead of you having to describe what needs to change.
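
The loop works because the failure is concrete and readable. A minimal Vitest sketch, where formatPrice is a hypothetical helper:

// formatPrice.test.ts: a sketch; formatPrice is a hypothetical helper
import { expect, test } from "vitest";
import { formatPrice } from "./formatPrice";

test("formats cents as dollars", () => {
  expect(formatPrice(1999)).toBe("$19.99");
});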

For UI components, Storybook interaction tests work extremely well. The agent sees the test fail, reads the error, fixes the code, and tries again. If you have it set up, the agent can also check what the component looks like by using Playwright via MCP or other custom tools.2
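
A sketch of what such an interaction test can look like, assuming a hypothetical Button component:

// Button.stories.tsx: a sketch; the Button component and its props are hypothetical
import type { Meta, StoryObj } from "@storybook/react";
import { expect, fn, userEvent, within } from "@storybook/test";
import { Button } from "./Button";

const meta = { component: Button } satisfies Meta<typeof Button>;
export default meta;

export const ClickCallsHandler: StoryObj<typeof meta> = {
  args: { label: "Save", onClick: fn() },
  play: async ({ args, canvasElement }) => {
    const canvas = within(canvasElement);
    await userEvent.click(canvas.getByRole("button", { name: "Save" }));
    // the spy makes the failure concrete for the agent to read and fix
    await expect(args.onClick).toHaveBeenCalledTimes(1);
  },
};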

Example setups

Here is what a minimal but effective infrastructure looks like for the stacks I have worked with the most, both for client projects and personal projects:

JavaScript/TypeScript:

  • Biome for formatting + linting (one tool, one config)
  • TypeScript strict mode (noUncheckedIndexedAccess, strictNullChecks)
  • Vitest for unit tests
  • Storybook interaction tests for components
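
The matching tsconfig.json can stay tiny. A sketch; strict already includes strictNullChecks:

{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true
  }
}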

Python:

  • uv for package/env management
  • ruff for linting + formatting
  • ty for type checking
  • pytest for unit tests

For other languages, the idea is the same: fast tools, strict settings, immediate and concise feedback.

What goes where

When I need to figure out where to put things, here is the breakdown I follow.

What                          Where
Project context, north star   AGENTS.md
Non-obvious conventions       AGENTS.md
Style, import order, naming   Lint rules
API shapes, data structures   Type definitions
Behavior verification         Tests

To repeat myself: if something can be enforced, it should be enforced, either via rules or via scripts. If it is context for the agent, it belongs in the AGENTS.md.

The setup in action

When the infrastructure is set up correctly, here is what a change made by an agent can look like:

  1. Agent makes a change and commits it
  2. Pre-commit hooks run the formatter → code is styled correctly
  3. Linter runs → catches import issues, naming violations
  4. Type checker runs → catches hallucinated APIs, missing null checks
  5. Tests run → verify behavior works
  6. Human reviews → focuses on intent, not nitpicks
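
As a sketch of how steps 2 through 5 can be wired up, here is a bare-bones Git hook; most setups use a runner like husky or lefthook instead, and the commands assume Biome, tsc, and Vitest:

#!/bin/sh
# .git/hooks/pre-commit: a sketch; assumes Biome, tsc, and Vitest are installed
set -e
npx biome check --write .   # formatter + linter (steps 2 and 3)
npx tsc --noEmit            # type checker (step 4)
npx vitest run              # tests (step 5)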

At each of those steps, the agent gets a chance to fix its own mistakes: it applies the fix, re-runs the check, and moves on. By the time you are actually reviewing, you can focus on what is important:3 the UX of the feature, the intent of the update, whether it matches the direction of the product you are working on, and so on.

Footnotes

  1. The only caveat to automating formatting is that agents have a hard time with files that change right after they write them. Even though models are increasingly trained to re-read files before making further edits, they sometimes revert formatting changes, which is why, at least for me, linting is the safer option here.

  2. Tools like: Playwright MCP, Playwriter Chrome extension, dev-browser tool, browser tools skill.

  3. At least in my opinion.
