Fitness Functions: Unit Tests for Your Architecture
Most teams have a testing strategy. Unit tests for behavior. Integration tests for contracts. E2E tests for user flows. Linters for code style.
But ask: what tests your architecture?
When a developer imports from another module’s internals — what catches it? When a dependency flows in the wrong direction — what fails? When a new service bypasses the repository layer and queries the database directly — what blocks the PR?
Usually, nothing. The architecture is the one thing everyone agrees matters and nobody automates.
The Missing Layer
The testing pyramid has a gap. Linters catch style. Unit tests catch logic. Integration tests catch contracts. Nothing catches structural decay — the slow erosion of module boundaries, layer violations, coupling that grows session by session.
Fitness functions fill that gap. A fitness function is any automated check that protects an architectural property. Not what the code does — how the code is organized.
The boundary between a unit test and a fitness function is the question they answer:
- Unit test: “Does this function return the correct value?”
- Fitness function: “Does this module still respect its boundaries?”
Both are automated. Both fail when things break. Both prevent regression. But they protect different things — behavior vs. structure.
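The distinction is concrete enough to write down. Below is a minimal sketch of a structural fitness function in TypeScript. It assumes the import graph has already been extracted from the codebase by some tool; the module names and layer assignments are illustrative, not a real project:

```typescript
type DepGraph = Record<string, string[]>; // module -> modules it imports
type Layer = "ui" | "domain" | "data";

// Allowed direction: ui -> domain -> data. Nothing imports upward.
const layerOf: Record<string, Layer> = {
  "components/UserCard": "ui",
  "services/UserService": "domain",
  "repositories/UserRepo": "data",
};
const rank: Record<Layer, number> = { ui: 0, domain: 1, data: 2 };

function boundaryViolations(graph: DepGraph): string[] {
  const violations: string[] = [];
  for (const [mod, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      if (!(mod in layerOf) || !(dep in layerOf)) continue; // unmapped: skip
      // An import may only point downward in the layering.
      if (rank[layerOf[dep]] < rank[layerOf[mod]]) {
        violations.push(`${mod} -> ${dep} imports against layer direction`);
      }
    }
  }
  return violations;
}

// A clean graph passes; a repository reaching back into the UI fails.
const clean: DepGraph = {
  "components/UserCard": ["services/UserService"],
  "services/UserService": ["repositories/UserRepo"],
};
const dirty: DepGraph = {
  ...clean,
  "repositories/UserRepo": ["components/UserCard"],
};
```

Run it in CI like any other test: the assertion is about structure, not behavior.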
Why This Matters in the AI Era
Before AI, structural degradation was slow. One shortcut per sprint. One boundary violation per feature. Teams noticed it in quarterly reviews, if they noticed at all.
AI changed the speed. An agent can generate 20 files in a session. If those files follow the wrong pattern or cross the wrong boundary, the degradation that used to take months happens in an afternoon.
Faster code generation without structural enforcement is faster decay.
This is not the AI’s fault. The agent follows whatever patterns exist in the codebase. A well-structured codebase with clear boundaries produces reliable output. A messy codebase with implicit rules produces inconsistent output regardless of the model.
But the relationship goes deeper than speed. AI agents need structure to work well — and fitness functions are what keep that structure intact.
AI amplifies whatever patterns exist. When boundaries are clean, the agent respects them. When boundaries have exceptions, the agent copies the exceptions. One violation in a module becomes the pattern the agent follows for every new file in that module. Without fitness functions, a single shortcut propagates at machine speed.
AI can’t review its own architecture. An agent generates code that passes linters and tests. It looks correct. But whether the code respects module boundaries, whether it maintains layer direction, whether it keeps contracts consistent — these are structural properties the agent doesn’t check unless you give it a fitness function that checks them.
AI makes the cost of not enforcing structure exponential. With one engineer writing code manually, an unenforced rule produces maybe one violation per week. With multiple AI agents generating code in parallel, the same unenforced rule produces violations in every session. The longer you wait to add the fitness function, the more cleanup you need.
Structure was always important. AI made it critical infrastructure.
Architecture Is Not One Thing
Architecture spans multiple dimensions. Each dimension can degrade independently. Each needs its own fitness functions.
| Dimension | What degrades without it | What the fitness function checks |
|---|---|---|
| Structure | Module boundaries blur, coupling grows silently | Import direction enforcement, no circular dependencies, coupling thresholds |
| Contracts | API changes break clients, schema drifts from reality | Breaking change detection on PR, schema-to-code drift alerts |
| Data | Migrations destroy data, schemas diverge from entities | Destructive operation blocking, entity-schema consistency validation |
| Maintainability | Files grow, complexity hides, dead code accumulates | Cyclomatic complexity limits, file size caps, dead export detection |
| Performance | Bundles bloat, queries slow, response times creep | Bundle size budgets per route, query cost limits, response time thresholds |
| Security | Secrets leak, dependencies rot, endpoints go unprotected | Secret scanning in commits, CVE detection, auth enforcement on all endpoints |
| Reliability | Errors cascade, timeouts are missing, retries are absent | Error boundary requirements, timeout enforcement, retry policy validation |
| Observability | Incidents take hours to diagnose, logs are unstructured | Structured log format enforcement, required trace spans on critical paths |
Not every project needs every dimension. A payment system needs strong security, data, and reliability. A content platform may prioritize performance and contracts. A monorepo with many teams needs structure and contracts above all.
The question to ask for each dimension: “If this degrades silently, does the system fail in a way that matters?” If the answer is yes, that dimension needs a fitness function. Most codebases only have automated checks for maintainability (linters) and maybe contracts (type checking). Everything else is trust and manual review — which works until it doesn’t.
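One structure-dimension check from the table, sketched concretely: circular dependency detection as a depth-first search over the import graph. The graph here is hand-written for illustration; in practice a tool would extract it from the source tree:

```typescript
type Graph = Record<string, string[]>; // module -> modules it imports

// Returns the first cycle found (as a path of module names), or null.
function findCycle(graph: Graph): string[] | null {
  const visiting = new Set<string>(); // on the current DFS path
  const done = new Set<string>();     // fully explored, known cycle-free

  function dfs(node: string, path: string[]): string[] | null {
    if (visiting.has(node)) return [...path, node]; // back edge: cycle
    if (done.has(node)) return null;
    visiting.add(node);
    for (const dep of graph[node] ?? []) {
      const cycle = dfs(dep, [...path, node]);
      if (cycle) return cycle;
    }
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = dfs(node, []);
    if (cycle) return cycle;
  }
  return null;
}

const acyclic: Graph = { a: ["b"], b: ["c"], c: [] };
const cyclic: Graph = { a: ["b"], b: ["c"], c: ["a"] };
```

The same shape works for most structure checks: build a graph, assert a property, fail the build when the property breaks.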
What This Looks Like in Practice
The dimensions are abstract until you see them applied to a real system. Here’s what fitness functions look like for different project types:
API:
| Dimension | Fitness function |
|---|---|
| Contracts | Schema breaking change detection on every PR |
| Structure | Resolver → Service → Repository — no skipping layers |
| Data | Migration naming conventions, entity-schema drift detection |
| Security | Auth decorator required on all resolvers |
| Maintainability | Resolver complexity limits |
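The security row above can be sketched as a source scan. This is a simplified regex-based version, assuming decorator-style resolvers; the decorator names (`@Query`, `@Mutation`, `@RequireAuth`) are placeholders, not a specific framework's API:

```typescript
// Flag resolver methods that lack an auth guard decorator.
function unguardedResolvers(source: string): string[] {
  const missing: string[] = [];
  // Match a run of decorators followed by a method name (simplified parsing).
  const methodRe = /((?:@\w+\([^)]*\)\s*)+)(\w+)\s*\(/g;
  let m: RegExpExecArray | null;
  while ((m = methodRe.exec(source)) !== null) {
    const decorators = m[1];
    const name = m[2];
    const isResolver = /@(Query|Mutation)\(/.test(decorators);
    const hasGuard = /@RequireAuth\(/.test(decorators);
    if (isResolver && !hasGuard) missing.push(name);
  }
  return missing;
}

const sample = `
  @Query() @RequireAuth()
  me() {}

  @Mutation()
  deleteUser() {}
`;
```

A real implementation would parse the AST rather than regex-match, but the fitness function's contract is the same: every resolver carries a guard, or the PR fails.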
Frontend (React, Next.js):
| Dimension | Fitness function |
|---|---|
| Performance | Bundle size budget per route |
| Structure | Component → Hook → Service — no direct API calls from components |
| Contracts | API types generated from schema, no manual type definitions |
| Maintainability | Component file size limits |
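The "no direct API calls from components" rule can be sketched as a per-file check. The directory convention and client names here are assumptions for illustration:

```typescript
// Components must go through the service layer; flag HTTP access done
// directly in a component file.
function directApiCalls(path: string, source: string): string[] {
  if (!path.includes("/components/")) return []; // rule scoped to components
  const findings: string[] = [];
  if (/from\s+['"]axios['"]/.test(source)) findings.push("imports axios");
  if (/\bfetch\s*\(/.test(source)) findings.push("calls fetch()");
  return findings;
}

const componentHit = directApiCalls(
  "src/components/UserCard.tsx",
  `const r = await fetch("/api/user");`
);
const serviceOk = directApiCalls(
  "src/services/user.ts",
  `const r = await fetch("/api/user");`
);
```

Run over every changed file in a PR, this turns an implicit team convention into a hard boundary.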
Monorepo with multiple teams:
| Dimension | Fitness function |
|---|---|
| Structure | Package boundary enforcement — import only from public API |
| Structure | No circular dependencies between packages |
| Contracts | Shared type packages as single source of truth |
| Maintainability | Shared lint and TypeScript config enforcement |
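Package boundary enforcement often reduces to one rule: import another package only through its public entry, never a deep path into its internals. A minimal sketch, with `@acme` as a placeholder scope:

```typescript
// "@acme/ui" is fine; "@acme/ui/src/Button" reaches past the public API.
function isDeepImport(specifier: string): boolean {
  return /^@acme\/[^/]+\/(src|dist|internal)\//.test(specifier);
}

function deepImports(specifiers: string[]): string[] {
  return specifiers.filter(isDeepImport);
}

const imports = [
  "@acme/ui",            // public entry: allowed
  "@acme/ui/src/Button", // internal path: violation
  "react",               // external dependency: not our rule's concern
];
```

Monorepo tooling typically offers this as a built-in lint rule; the sketch just shows how small the underlying check is.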
The pattern is the same across project types. Identify the critical dimensions. Write checks that protect them. The specific checks change, but the approach stays consistent.
The Ratchet Pattern
Architecture quality should only go in one direction: up.
The pattern is straightforward:
- Clean up a module — fix boundary violations, standardize patterns
- Add the module to an enforcement list
- Violations in that module become errors, not warnings
- New modules are enforced from creation
The enforcement list grows over time. Quality ratchets up. It never goes down.
A common mistake is writing fitness functions for the codebase you want, not the one you have. That breaks everything on day one. The right order: clean up a module first, then add the fitness function to prevent regression. The function protects the work you already did — it’s a lock, not a goal.
In practice, this works well with architecture migrations. A team migrating a legacy codebase doesn’t need to fix everything before enforcing anything. Migrated modules go on an allowlist — violations are errors. Unmigrated modules get warnings. As the migration progresses, the allowlist grows. The architecture improves incrementally, and every step is protected.
This avoids two failure modes. Enforcing everything at once breaks the existing codebase. Enforcing nothing lets migration work regress immediately. The ratchet is the middle path — protect what is already clean, improve the rest gradually.
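The ratchet itself is a few lines of code: an allowlist of cleaned modules where violations become errors, and warnings everywhere else. Module names are illustrative:

```typescript
type Severity = "error" | "warn";

// Grows as modules are cleaned up; never shrinks.
const enforced = new Set(["billing", "auth"]);

function severityFor(module: string): Severity {
  return enforced.has(module) ? "error" : "warn";
}

interface Violation {
  module: string;
  rule: string;
}

// CI fails only when an enforced module regresses.
function gate(violations: Violation[]): { failCI: boolean; report: string[] } {
  const report = violations.map(
    (v) => `${severityFor(v.module)}: ${v.module} violates ${v.rule}`
  );
  const failCI = violations.some((v) => severityFor(v.module) === "error");
  return { failCI, report };
}
```

Adding a module to `enforced` is the last step of cleaning it up, which keeps the allowlist honest: it only ever contains modules that actually pass.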
Scope and Timing
Fitness functions also vary by what they check and when they check it.
Scope matters because some problems are local and some are systemic:
- Atomic — a single check on a single property. “This file has a lint error.” “This function exceeds complexity limits.” Fast, focused, catches violations as they happen.
- Holistic — combines multiple signals across the system. “All module boundaries respected AND tests pass AND contracts valid.” Catches problems that only emerge when components interact.
Timing matters because different problems appear at different speeds:
| Timing | When it runs | What it catches | Example |
|---|---|---|---|
| Triggered | On file edit, commit, or PR | Individual violations as they happen | Boundary check after every edit. Schema breaking change detector on PR |
| Continuous | Daily or weekly schedule | Gradual decay nobody notices | Dead export scan. Dependency freshness check. Convention drift report |
| Temporal | Pre-release or quarterly | Accumulated risk at scale | Security audit before launch. Architecture review when the team doubles |
A system needs all three. Triggered checks catch mistakes as they happen. Continuous checks catch the slow drift between mistakes. Temporal checks catch the risk that accumulates across both.
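The atomic-versus-holistic distinction can be expressed as composition: a holistic fitness function is just the conjunction of atomic ones, with a report of which signals failed. The check names below are stand-ins for real atomic checks:

```typescript
type Check = { name: string; run: () => boolean };

// A holistic check passes only when every atomic check passes.
function holistic(checks: Check[]): { pass: boolean; failed: string[] } {
  const failed = checks.filter((c) => !c.run()).map((c) => c.name);
  return { pass: failed.length === 0, failed };
}

const releaseGate: Check[] = [
  { name: "boundaries", run: () => true }, // stand-ins for real checks
  { name: "contracts", run: () => false },
  { name: "tests", run: () => true },
];
```

The same composed gate can run at any timing: triggered on a PR, continuously on a schedule, or temporally before a release.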
Evolve With Your Constraints
The right fitness functions depend on the project phase.
Early stage — few rules, flexible patterns. The constraint is speed. Fitness functions would slow down exploration for problems you don’t have yet.
Growth stage — more rules, stricter boundaries. The constraint is consistency. Multiple engineers, multiple AI agents, all generating code. Without enforcement, patterns diverge.
AI-driven stage — fitness functions on every critical dimension. The constraint is trust. The volume of generated code is too high for manual review to catch structural problems.
The temptation is to build fitness functions for problems you don’t have yet. That’s premature enforcement — the architectural equivalent of premature optimization. The better approach is to identify the current constraint, build the checks that protect against it, and evolve when the constraint changes.
Where to Start
Most codebases already have the raw material. Linters check style. Type checkers catch contract violations. Test suites verify behavior. What’s usually missing is the structural layer — the checks that protect architectural decisions.
A practical starting point:
- Pick your critical dimensions. Look at the table above and ask: “If this dimension degrades, does the system fail in a way that matters?” For most teams, structure and contracts come first.
- Check what’s enforced vs. what’s documented. If a rule exists only in documentation or team knowledge, it’s a candidate for a fitness function. The gap between what’s written and what’s enforced is where decay happens.
- Start with one module. Pick the cleanest module in the codebase. Add boundary enforcement. Make violations errors. That module is now protected. One module is better than zero.
- Ratchet from there. As more modules improve, add them to enforcement. New modules get enforced from creation. The allowlist grows. Quality only goes up.
- Add timing layers. Triggered checks for fast feedback during development. A weekly continuous check for drift (dead code, dependency freshness). A quarterly review for accumulated risk.
The architecture that matters is not the one that never changes. It is the one that changes at the right time, for the right reason — and has fitness functions to make sure it doesn’t change back.