
AI Agent Design Patterns and Best Practices

Guide to AI Agent principles, design patterns, orchestration, guardrails, and evaluation methods


Principles

First-Principles

AlphaGo's Move 37 against Lee Sedol illustrates two first-principles stances for agent design:

  • Replica agents: Use biomimicry when workflows require human review, when agents serve as copilots, or when integrating with legacy UI-only tools
  • Alien agents: Use first-principles design when the goal is pure result efficiency

Asymmetry of Verification and Verifiers

Verifying a solution is often far easier than producing it. This asymmetry of verification yields the verifier's law:

All problems that are both solvable and easy to verify will be solved by AI.

Patterns

Agent design patterns:

  • Give agents a computer (CLI and files)
  • Progressive disclosure
  • Offload context
  • Cache context
  • Isolate context
  • Evolve context

Agent-native Architecture

Agent-native apps should provide:

  • Parity: Any task users can complete via the UI, agents can complete via tools
  • Granularity: Tools should be atomic primitives
  • Composability: Given parity and granularity, new features are just new prompts
  • Emergent capability: Composed primitives enable behaviors that were never explicitly designed
  • Files as universal interface: Files for legibility, databases for structure
  • Improvement over time:
    • Accumulated context: State persists across sessions
    • Developer-level refinement: System prompts
    • User-level customization: User prompts

Recursive Language Models

Recursive Language Models (RLMs) achieve multi-hop reasoning over long inputs through divide-and-conquer recursion, mitigating the Context Rot problem caused by long contexts.
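
A minimal sketch of the recursive idea, assuming a generic llm(prompt) completion callable rather than any specific API; the chunk threshold and token heuristic are arbitrary choices:

```python
CHUNK_TOKENS = 4000  # keep each call comfortably under the context limit

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def rlm_answer(question: str, text: str, llm) -> str:
    """Answer a question over text too long for one context window."""
    if estimate_tokens(text) <= CHUNK_TOKENS:
        # Base case: the chunk fits, so ask directly.
        return llm(f"Context:\n{text}\n\nQuestion: {question}")
    # Recursive case: split, solve each half, then merge the partial answers.
    mid = len(text) // 2
    left = rlm_answer(question, text[:mid], llm)
    right = rlm_answer(question, text[mid:], llm)
    return llm(
        f"Combine these partial answers to the question: {question}\n"
        f"Answer A: {left}\nAnswer B: {right}"
    )
```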

Instructions

  • Use existing documents: Use existing operating procedures, support scripts, or policy documents to create LLM-friendly routines
  • Prompt agents to break down tasks: Providing smaller, clearer steps helps minimize ambiguity and helps models better follow instructions
  • Define clear actions: Ensure each step in the routine corresponds to a specific action or output
  • Capture edge cases: Real interactions produce decision points; a robust routine anticipates common variations and handles them with conditional steps or branches, e.g., alternative steps when required information is missing

How to write a great AGENTS.md: lessons from over 2,500 repositories (an illustrative example follows the list):

  1. States a clear role: Defines who the agent is (expert technical writer), what skills it has (Markdown, TypeScript), and what it does (read code, write docs)
  2. Executable commands: Gives AI tools it can run (npm run docs:build and npx markdownlint docs/). Commands come first
  3. Project knowledge: Specifies tech stack with versions (React 18, TypeScript, Vite, Tailwind CSS) and exact file locations
  4. Real examples: Shows what good output looks like with actual code. No abstract descriptions
  5. Three-tier boundaries: Set clear rules using "always do", "ask first", "never do". Prevents destructive mistakes
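
An illustrative AGENTS.md applying the five lessons; the commands and stack come from the examples above, everything else is hypothetical:

```markdown
# AGENTS.md

You are an expert technical writer who reads TypeScript code and writes Markdown docs.

## Commands
- Build docs: `npm run docs:build`
- Lint docs: `npx markdownlint docs/`

## Project
- Stack: React 18, TypeScript, Vite, Tailwind CSS
- Docs live in `docs/`, components in `src/components/`

## Example
A good doc entry:

    ### useDebounce(value, delay)
    Returns `value` after `delay` ms of inactivity.

## Boundaries
- Always: run the linter before finishing
- Ask first: adding new dependencies
- Never: edit files under `src/generated/`
```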

Vibe Coding

  1. Spec the work (an example brief follows this list):
    • Goal: Picking the next highest-leverage goal
    • Breakdown: Breaking work into small, verifiable slices (one pull request each)
    • Criteria: Writing acceptance criteria, e.g., inputs, outputs, edge cases, UX constraints
    • Risk: Calling out risks up front, e.g., performance hot-spots, security boundaries, migration concerns
  2. Give agents context:
    • Repository: Repository conventions
    • Components: Component system, design tokens and patterns
    • Constraints: Defining constraints: what not to touch, what must stay backward compatible
  3. Direct agents what, not how:
    • Tools: Assigning right tools
    • Files: Pointing relevant files and components
    • Constraints: Stating explicit guardrails, e.g., don't change API shape, keep this behavior, no new deps
  4. Verification and code review:
    • Correctness: Edge cases, race conditions, error handling
    • Performance: N+1 queries, unnecessary re-renders, overfetching
    • Security: Auth boundaries, injection, secrets, SSRF
    • Tests: Coverage for changed behaviors
  5. Integrate and ship:
    • Break big work into tasks agents can complete reliably
    • Resolve merge conflicts
    • Verify CI passes
    • Stage roll-outs
    • Monitor for regressions
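
An illustrative task brief covering steps 1-3; every project detail here is hypothetical:

```markdown
## Task: add rate limiting to /api/search

Goal: prevent abuse without affecting signed-in users.
Slice: one PR, middleware only; no schema changes.

Acceptance criteria:
- 20 req/min per IP for anonymous users; authenticated users exempt
- 429 response with a Retry-After header
- Edge case: health checks from /internal/* are excluded

Context: middleware lives in src/middleware/; follow existing conventions there.
Constraints: don't change the API response shape; no new dependencies.
```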

System

OpenAI Codex system prompts:

  • Instructions
  • Git instructions
  • AGENTS.md spec
  • Citations instructions

Coding

Writing good AGENTS.md:

  • AGENTS.md should define your project's WHY, WHAT, and HOW
  • Less is more: Include as few instructions as reasonably possible in the file
  • Keep the contents of your AGENTS.md concise and universally applicable
  • Use Progressive Disclosure: Don't front-load everything the agent might need; tell it when it needs information and how to find and use it
  • The agent is not a linter: Use linters and code formatters for mechanical checks, along with features like Hooks and Slash Commands
  • AGENTS.md is the highest-leverage point of the harness, so avoid auto-generating it; craft its contents carefully for best results

Pull Request

Using GitHub Copilot to debug issues faster.

Testing

Research

AI agents powered by tricky LLM prompting.

Tool

Tool execution spans three tiers (sketch after the list):

  1. Tool calling: Atomic toolkit
  2. Bash: Composable static scripts
  3. Codegen: Dynamic programs
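
A sketch of the three tiers in Python; the tools and the generated program are hypothetical, and tier 3 should run in a sandbox in practice:

```python
import subprocess

# Tier 1: tool calling, an atomic schema-friendly primitive.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

# Tier 2: bash, composing existing programs into a static pipeline.
def count_todos(repo: str) -> str:
    result = subprocess.run(
        ["bash", "-c", f"grep -rn TODO {repo} | wc -l"],
        capture_output=True, text=True,
    )
    return result.stdout.strip()

# Tier 3: codegen, where the agent writes a one-off program for the task,
# which is then executed (sandbox this in practice).
generated = "print(sum(x * x for x in range(10)))"
exec(generated)
```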

Context

Dynamic Discovery

Dynamic context discovery (sketch after the list):

  • Tool response → File
  • Terminal session → File
  • Reference conversation history when compressing context
  • Load on demand
  • Progressive disclosure
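
A minimal sketch of offloading a large tool response to a file so it can be loaded on demand; the preview size and file naming are arbitrary choices:

```python
import json
import tempfile

PREVIEW_CHARS = 500  # inline preview size; tune per model and task

def offload_tool_response(name: str, response: str) -> str:
    """Persist a large tool response and return a short pointer.

    The agent keeps only a preview in context and reads the full file
    later if needed (load on demand, progressive disclosure).
    """
    if len(response) <= PREVIEW_CHARS:
        return response  # small enough to keep in context as-is
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=f"_{name}.txt", delete=False
    )
    tmp.write(response)
    tmp.close()
    return json.dumps({
        "preview": response[:PREVIEW_CHARS],
        "full_output": tmp.name,
        "note": "Read full_output if the preview is insufficient.",
    })
```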

Personalization

Metaprompting for memory extraction.

Context Engineering

LLMs do not use their context uniformly: accuracy and reliability decline as the number of input tokens grows, a phenomenon called Context Rot.

Merely having relevant information in the model's context is therefore insufficient: how that information is presented significantly impacts performance. This is the case for context engineering: maximize relevant information and minimize irrelevant context for reliable performance, e.g., via a custom Gemini CLI command (sketch below).
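
A sketch of a custom command that packages a reusable, context-lean prompt, assuming Gemini CLI's TOML custom-command format (the file path and command name are illustrative; verify against the current docs):

```toml
# .gemini/commands/summarize.toml defines a custom /summarize command.
description = "Summarize a file into five bullet points"
prompt = """
Read the file at {{args}} and summarize it in at most five bullet points.
Quote exact identifiers; do not paraphrase code symbols.
"""
```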

Workflow

Plan Mode

Claude Code's EnterPlanMode system prompt.

Debug Mode

Cursor debug mode (instrumentation sketch after the list):

  1. Assume: Generate multiple hypotheses
  2. Log: Add logging points
  3. Collect: Collect runtime data (log, trace, profile)
  4. Locate: Reproduce bug, analyze actual behavior, precisely locate root cause
  5. Fix: Based on evidence, make targeted fixes
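
A sketch of steps 2-3, adding logging points so runtime data can confirm or eliminate each hypothesis; the function and hypotheses are hypothetical:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("debug-mode")

def apply_discount(price: float, rate: float) -> float:
    # Hypothesis 1: rate arrives as a percentage (15) instead of a fraction (0.15).
    log.debug("apply_discount inputs: price=%r rate=%r", price, rate)
    result = price * (1 - rate)
    # Hypothesis 2: result goes negative when rate > 1.
    log.debug("apply_discount result: %r", result)
    return result
```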

TDD

Test-driven development (example tests after the list):

  1. Write tests: Have the agent write tests from expected input/output pairs. State clearly that you're doing TDD, so the agent doesn't write mock implementations for features that don't exist yet
  2. Run tests: Have the agent run the tests and confirm they actually fail. State clearly not to write implementation code at this stage
  3. Commit tests
  4. Write code: Have the agent write code to pass tests, and instruct it not to modify tests. Tell it to iterate until all tests pass
  5. Submit code
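
For step 1, the agent's tests should pin down expected input/output pairs before any implementation exists; a sketch with a hypothetical slugify() under test:

```python
# Tests written first: slugify() does not exist yet, so these must fail.
from myapp.text import slugify  # hypothetical module under test

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"

def test_empty_input():
    assert slugify("") == ""
```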

Orchestration

Single-agent Systems

Multi-agent Systems: Manager Pattern

Other agents act as tools, called by the central agent; a minimal sketch follows.
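
A manager-pattern sketch with plain callables standing in for sub-agents; llm() is a hypothetical stand-in for a model call, and the routing protocol is illustrative:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real model call; wire up your provider here."""
    raise NotImplementedError

def research_agent(task: str) -> str:
    return llm(f"You are a research agent. {task}")

def writer_agent(task: str) -> str:
    return llm(f"You are a writing agent. {task}")

SUB_AGENTS = {"research": research_agent, "write": writer_agent}

def manager(goal: str) -> str:
    # The manager picks a sub-agent and delegates; the routing decision
    # itself comes from the model.
    plan = llm(
        f"Goal: {goal}\nAvailable tools: {list(SUB_AGENTS)}\n"
        "Reply exactly as: <tool> | <task for that tool>"
    )
    tool, task = (part.strip() for part in plan.split("|", 1))
    result = SUB_AGENTS[tool](task)
    return llm(f"Goal: {goal}\nTool result: {result}\nWrite the final answer.")
```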

Multi-agent Systems: Decentralized Pattern

Multiple agents run as peers and hand work off to one another; a minimal sketch follows.
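
A sketch of the handoff variant under the same llm() assumption; control transfers to a peer rather than returning to a manager:

```python
def llm(prompt: str) -> str:  # same stand-in model call as in the manager sketch
    raise NotImplementedError

def billing_agent(message: str) -> str:
    return llm(f"You are a billing agent. Resolve: {message}")

def support_agent(message: str) -> str:
    return llm(f"You are a support agent. Resolve: {message}")

PEERS = {"billing": billing_agent, "support": support_agent}

def triage_agent(message: str) -> str:
    # Decide who should own the conversation, then hand off completely:
    # the peer produces the final answer; control never returns here.
    route = llm(f"Route to 'billing' or 'support': {message}").strip()
    return PEERS[route](message)
```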

Guardrails

Building Guardrails

  • Relevance classifier: Ensures agent responses stay within expected scope by flagging off-topic queries
  • Safety classifier: Detects unsafe inputs attempting to exploit the system (jailbreaks or prompt injection)
  • PII filter: Prevents unnecessary exposure of personally identifiable information by reviewing model output for potential PII
  • Content moderation: Flags harmful or inappropriate inputs (hate speech, harassment, violence) to maintain safe, respectful interactions
  • Tool safety measures: Evaluate the risk of each tool available to your agent, assigning low, medium, or high ratings based on factors like read-only vs. write access, reversibility, required account permissions, and financial impact. Use these risk ratings to trigger automated actions, like pausing for guardrail checks before executing high-risk tools or escalating to human intervention when needed
  • Rule-based protection: Simple deterministic measures (blacklists, input length limits, regex filters) to prevent known threats like prohibited terms or SQL injection
  • Output validation: Ensure responses align with brand values through prompt engineering and content checks, preventing outputs that could damage brand integrity

Triggering a human intervention plan when failure thresholds are exceeded or before high-risk operations is a critical safety measure.
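
A sketch of risk-rated tool gating with human escalation; the ratings, tools, and checks are illustrative:

```python
# Risk-rated tool gating: low-risk tools run freely, medium-risk tools pass a
# guardrail check, and high-risk tools also pause for human approval.
RISK = {"read_file": "low", "send_email": "medium", "delete_records": "high"}

def guardrail_check(tool: str, args: dict) -> bool:
    # Placeholder for the rule-based and classifier checks listed above.
    return "DROP TABLE" not in str(args)

def run_tool(tool: str, args: dict, execute, ask_human) -> str:
    risk = RISK.get(tool, "high")  # unknown tools default to the highest risk
    if risk in ("medium", "high") and not guardrail_check(tool, args):
        return "blocked: guardrail check failed"
    if risk == "high" and not ask_human(tool, args):
        return "blocked: human reviewer declined"
    return execute(tool, args)
```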

Evaluation

Agent evals:

  1. Start early
  2. Source realistic tasks from failures
  3. Define unambiguous, robust success criteria
  4. Design graders thoughtfully and combine multiple types: code-based, model-based, human (sketch after this list)
  5. Make sure the problems are hard enough for the model
  6. Iterate on evaluations to improve the signal-to-noise ratio
  7. Read transcripts
  8. Pick a framework: promptfoo, Harbor
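
A sketch combining a code-based grader with a model-based one; llm() is a hypothetical model call and the 50/50 weighting is an arbitrary choice:

```python
def code_grader(output: str, expected_substring: str) -> float:
    # Deterministic check: cheap and precise, but narrow.
    return 1.0 if expected_substring in output else 0.0

def model_grader(output: str, rubric: str, llm) -> float:
    # LLM-as-judge: flexible but noisier; keep the rubric unambiguous.
    verdict = llm(f"Rubric: {rubric}\nOutput: {output}\nScore 0-10, digits only:")
    return min(int(verdict.strip()), 10) / 10

def grade(output: str, expected: str, rubric: str, llm) -> float:
    # Blend grader types; weights should reflect how much you trust each signal.
    return 0.5 * code_grader(output, expected) + 0.5 * model_grader(output, rubric, llm)
```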

When building agents, the trace is the source of truth:

  • Debugging becomes trace analysis
  • Testing becomes eval-driven
  • You can't set breakpoints in reasoning
  • Performance optimization targets new metrics: task success rate, reasoning quality, tool-usage efficiency

Benchmarks

Benchmarks:

  • Aggregate: Don't obsess over a 1-2% lead on a single benchmark; focus on specific, comprehensive domain coverage
  • Relative: Compare within the same model family or lab: how did the score change from v1 to v2?
  • Verify: The only benchmark that matters in the end is your own workload

Libraries

Instruction

  • AGENTS.md: Open format for guiding coding agents
  • llms.txt: Helping language models use websites
  • System: System prompts for AI agents

RAG

  • RAGFlow: Superior context layer for AI agents

Project

  • VibeKanban: Run coding agents in parallel without conflicts, and perform code review

Documentation

Slide

  • Banana: AI-native PPT generator based on nano banana pro
