
Technical Design Document

Required · engineering · tdd
Agent Prompt Snippet
Ensure the project has a technical design document defining the engine architecture, core systems, implementation patterns, and platform-specific choices.

Purpose

A Technical Design Document (TDD) is the implementation blueprint for a system or major component. Where the Architecture Overview describes what exists and how pieces connect, the TDD describes how a specific piece works inside: data structures, algorithms, concurrency model, error handling strategy, performance constraints, and the critical implementation choices that the team must align on before writing code.

TDDs are written before implementation begins. They are the engineering equivalent of blueprints: investing one to three days in a TDD prevents weeks of implementation work going in the wrong direction. The most valuable TDDs are written for the parts of the system that are either technically risky (novel algorithms, tight performance constraints, tricky concurrency), structurally significant (decisions that are hard to reverse after hundreds of files depend on them), or cross-cutting (patterns that must be consistent across many modules).

Not everything needs a TDD. A single-endpoint CRUD service with no complexity does not warrant one. A rendering engine, a distributed consensus mechanism, or a real-time sync protocol absolutely does. Exercise judgment: if you cannot explain the implementation to another engineer in a 30-minute conversation without a whiteboard, write a TDD.

The TDD is a living document during the design phase, then becomes a reference document after implementation. It should be updated when significant implementation decisions deviate from the design—but it is not a commit log. Once the feature ships, the TDD captures design intent; the code captures the actual implementation.

Who needs this document

| Persona | Why they need it | How they use it |
| --- | --- | --- |
| Sam (Indie Dev) | Forces thinking through hard problems before they become expensive bugs | Writes TDD for technically risky components; uses it as a checklist during implementation |
| Claude Code (AI Agent) | Needs implementation intent to generate code that fits the design, not just code that compiles | Reads TDD before implementing any complex component; aligns generated code with specified data structures and algorithms |
| Priya (Eng Lead) | Reviews design before significant engineering investment; catches architectural mistakes early | Requires TDD for any component with non-trivial complexity before implementation begins |
| DevOps (CI Operator) | Needs to understand operational characteristics: startup time, memory footprint, failure modes | Reads TDD’s operational sections when writing Dockerfile, health checks, and runbook entries |

What separates a good version from a bad one

Criterion 1: Data structures are specified, not assumed

Strong: “The event queue uses a bounded MPSC ring buffer of capacity 65,536 entries, each 64 bytes (cache-line aligned). Producers acquire slots via atomic compare-and-swap. Consumers drain in batches of up to 256 entries. Full queue behavior: producer blocks for up to 5ms, then drops the event and increments queue_overflow_total.”

Weak: “Events will be stored in a queue. We’ll use something efficient.” (No capacity, no concurrency model, no overflow behavior—three unknowns that will each require a separate design decision later, probably under production pressure.)

Criterion 2: Error handling strategy is explicit

Strong: “Parse errors in incoming messages return {"error": "parse_error", "detail": "<field>: <reason>"} with HTTP 422. Database transient errors trigger exponential backoff (100ms base, 2x multiplier, 5 retries max) before returning HTTP 503. Unrecoverable errors log at ERROR level with full context and return HTTP 500 with a correlation ID. The correlation ID is always present in responses—even for 200s—to enable distributed trace lookup.”

Weak: “Errors are handled appropriately.” (Three categories of error in that one word “appropriately”—not a spec, not actionable.)
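The retry policy in the strong example is concrete enough to sketch directly. The following is an illustrative rendering, not the document's code: `TransientError` and the injectable `sleep` parameter are hypothetical names; the 100 ms base, 2x multiplier, and 5-retry cap are the figures from the text:

```python
import time

# Backoff parameters from the strong example: 100 ms base, 2x, 5 retries.
BASE_DELAY = 0.1
MULTIPLIER = 2
MAX_RETRIES = 5

class TransientError(Exception):
    """Stand-in for a transient database error (illustrative name)."""

def with_backoff(operation, sleep=time.sleep):
    """Run `operation`, retrying transient failures with exponential backoff.

    Re-raises the last TransientError once retries are exhausted; per the
    error-handling spec, the caller maps that to HTTP 503.
    """
    for attempt in range(MAX_RETRIES + 1):
        try:
            return operation()
        except TransientError:
            if attempt == MAX_RETRIES:
                raise
            sleep(BASE_DELAY * (MULTIPLIER ** attempt))
```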

Criterion 3: Performance constraints are quantified

Strong: “The renderer must produce frames at ≥ 60fps on the minimum spec target (Intel i5-8xxx, 8 GB RAM, integrated graphics). The main thread budget is 8.3ms per frame. Physics simulation runs on a dedicated thread with a 5ms budget. Rendering commands are generated on the main thread and dispatched asynchronously to the GPU command buffer.”

Weak: “The system should be fast enough for real-time use.” (No target hardware, no frame budget, no allocation of time between subsystems—insufficient to validate implementation choices.)

Criterion 4: Dependencies and their failure modes are documented

Strong: “External dependencies: (1) Redis for session cache—if Redis is unreachable, fall back to database session lookup (2x latency penalty accepted). (2) SendGrid for email—if unreachable, enqueue email in DLQ and retry with exponential backoff for up to 24 hours. (3) Stripe for payments—if unreachable, return 503 to the caller; do not queue or retry payment requests.”

Weak: “The system depends on Redis and Stripe.” (Which path fails how? Without failure mode analysis, the implementation team will make inconsistent decisions independently.)
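When the failure mode is written down, the fallback path becomes a small, reviewable function. The following sketch shows only the Redis-to-database fallback from the strong example; `redis_get`, `db_lookup_session`, and `CacheUnavailable` are hypothetical stand-ins, and only the policy (fall back, accept roughly 2x latency) comes from the text:

```python
class CacheUnavailable(Exception):
    """Stand-in for a Redis connectivity failure (illustrative name)."""

def get_session(session_id, redis_get, db_lookup_session):
    """Session lookup: Redis first, database fallback if Redis is down."""
    try:
        return redis_get(session_id)
    except CacheUnavailable:
        # Accepted degradation per the spec: ~2x latency on the DB path.
        return db_lookup_session(session_id)
```

The same exercise for SendGrid and Stripe would yield different functions with different branches (queue-and-retry versus fail-fast), which is exactly why the TDD must decide each one explicitly.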

Common mistakes

Writing the TDD after the code is already written. A post-implementation TDD is a code description, not a design. It cannot catch flawed assumptions before they are baked in. TDDs that predate the implementation are decision aids; TDDs written after are documentation overhead.

Specifying what, not why. Listing data structures without explaining why that structure was chosen over alternatives loses the most valuable part of the document. The person implementing it—or the person modifying it six months later—needs to understand the constraints that shaped the choice.

Too much scope in one TDD. A TDD for “the entire backend” cannot be reviewed meaningfully in a single session. One TDD per subsystem or major component is the right granularity. Split large TDDs at natural boundaries (one per service, one per major algorithm, one per protocol).

Ignoring the operational dimension. Implementation-focused TDDs that say nothing about startup sequence, graceful shutdown, health check semantics, or resource limits will produce services that are hard to operate. The TDD is the right place to specify SIGTERM handling and readiness probe logic.
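As a minimal sketch of what "specify SIGTERM handling and readiness probe logic" means in practice (the `shutting_down` flag and function names are illustrative, not from the document):

```python
import signal
import threading

# On SIGTERM: stop accepting new work and flip readiness to false so the
# load balancer drains traffic; in-flight requests are allowed to finish.
shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def readiness_probe() -> bool:
    """Readiness check: false during shutdown so traffic drains first."""
    return not shutting_down.is_set()
```

Even this much, written into the TDD, gives the operator enough to wire up the health check and the termination grace period.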

How to use this document

When to create it

Write the TDD before the implementation PR opens. For complex features, the TDD review is the design review. Plan 1–3 days for writing a TDD for a significant component; plan 2–4 hours of review time. The TDD becomes stale after implementation—update it for significant deviations; don’t try to keep it in perfect sync with every code change.

Who owns it

The engineer implementing the component owns the TDD. The tech lead or architect is the primary reviewer. For cross-team components, affected teams review the TDD before implementation begins.

How AI agents should reference it

get_standard_docs(type="video_game", features=[])
→ tdd in documents[]
→ agent reads the TDD before implementing any complex subsystem
→ agent aligns generated data structures and algorithms with TDD specifications
→ agent flags deviations from the TDD design in implementation PR descriptions

The prompt_snippet ("Ensure the project has a technical design document defining the engine architecture, core systems, implementation patterns, and platform-specific choices") tells the agent to verify the TDD covers the full scope of the system’s non-trivial components.

How it connects to other documents

The TDD is downstream of the Requirements Specification (requirements set the what; the TDD specifies the how) and upstream of the code. It should reference ADR numbers for each significant design choice it makes. The API Contract specifies the external interface; the TDD specifies the implementation behind it. The Architecture Overview provides the container boundary within which the TDD operates.

Further reading

  • A Philosophy of Software Design by John Ousterhout — Principles for deep modules and information hiding that make TDDs more impactful.
  • Systems Performance by Brendan Gregg — Essential for writing the performance analysis sections of TDDs for latency-sensitive systems.
  • Designing Distributed Systems by Brendan Burns — Patterns for the concurrency and distributed coordination decisions that TDDs frequently need to specify.
