How to Read This Guide
The Claude Certified Architect — Foundations exam covers five domains across 60 scenario-based questions. The questions are framed around four scenarios drawn at random from a pool of six. The passing score is 720 out of 1,000.
This guide breaks down each domain technically: what concepts it covers, what the exam scenarios test, what the architectural patterns look like in practice, and how to study effectively. At the end, I provide a 12-week study roadmap with hour allocations proportional to domain weight.
The domains and their weights:
| Domain | Weight | Core Question |
|---|---|---|
| D1: Agentic Architecture | 27% | How do you design multi-agent systems that are bounded, observable, and production-grade? |
| D2: Tool Design & MCP | 18% | How do you connect Claude to external systems safely and reliably? |
| D3: Claude Code Config | 20% | How do you configure and extend Claude Code for team-level development workflows? |
| D4: Prompt & Structured Output | 20% | How do you design prompts that produce consistent, schema-validated output at scale? |
| D5: Context & Reliability | 15% | How do you build systems that degrade gracefully under real-world conditions? |
Domain 1: Agentic Architecture & Orchestration (27%)
This is the largest domain by a significant margin. More than a quarter of the exam is dedicated to understanding how agentic systems are designed. This weight is deliberate — Anthropic is positioning Claude as an agentic platform, and they want architects who understand what that means in production.
What this domain covers
Agentic architecture is about designing systems where AI models operate as agents — autonomous or semi-autonomous entities that take actions, use tools, and make decisions within defined boundaries. The key word is boundaries. Production agentic systems are not open-ended autonomous agents. They are constrained systems with clear limits on what they can do, how long they can run, and when they must escalate.
Core concepts to master:
- Coordinator-subagent patterns. How to decompose complex tasks across multiple specialized agents. The coordinator manages task allocation, monitors progress, and aggregates results. Subagents handle specific, well-scoped subtasks.
- Bounded agentic loops. Every production agent needs hard limits: maximum iterations, time budgets, token budgets, and explicit termination conditions. An agent without bounds is a liability.
- Escalation flows. When should an agent stop and hand off to a human? When should it escalate to a higher-authority agent? Designing escalation triggers — confidence thresholds, ambiguity detection, safety boundaries — is a core architecture skill.
- Task decomposition. Breaking ambiguous user requests into discrete, verifiable subtasks. Each subtask should have a clear success criterion and a rollback path.
- Narrow write surfaces. Constraining the set of actions an agent can take. An agent that can read a database but not write to it has a narrow write surface. An agent that can draft an email but not send it has a narrow write surface. This is where safety meets architecture.
```
// Coordinator-subagent pattern (simplified)
Coordinator
├── receives user request
├── decomposes into subtasks
├── assigns to specialized subagents
├── monitors progress (timeout: 30s per subtask)
├── aggregates results
└── returns synthesized response

Subagent: Research
├── scope: read-only access to knowledge base
├── budget: max 3 tool calls
├── output: structured JSON with citations
└── escalation: if confidence < 0.7, flag for review

Subagent: Action
├── scope: write access to ticketing system only
├── budget: max 1 write operation
├── output: action confirmation with rollback ID
└── escalation: if action involves PII, require human approval
```
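The bounded-loop discipline described above can be sketched in a few lines of Python. Everything here is illustrative, not a real SDK interface: `step` stands in for whatever performs one iteration of agent work (a model call plus tool use) and reports whether the task is done and how many tokens it spent.

```python
import time

def run_bounded_agent(step, max_iterations=10, time_budget_s=60.0, token_budget=50_000):
    """Run an agent loop with hard limits: iteration cap, wall-clock budget,
    and token budget. Every exit path is explicit -- the loop never runs unbounded."""
    deadline = time.monotonic() + time_budget_s
    tokens_spent = 0
    for i in range(max_iterations):
        if time.monotonic() > deadline:
            return {"status": "escalate", "reason": "time budget exhausted", "iterations": i}
        done, tokens_used = step(i)
        tokens_spent += tokens_used
        if tokens_spent > token_budget:
            return {"status": "escalate", "reason": "token budget exhausted", "iterations": i + 1}
        if done:
            return {"status": "complete", "iterations": i + 1}
    # Explicit termination condition: out of iterations, hand off rather than spin.
    return {"status": "escalate", "reason": "max iterations reached", "iterations": max_iterations}
```

The point of the sketch is the shape, not the numbers: every limit is a parameter, and every limit breach returns a structured escalation the coordinator can act on instead of an exception or an infinite loop.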
Which scenarios test this
Multi-Agent Research System — the most direct test. You will need to design coordinator-subagent patterns for parallel research tasks. Customer Support Resolution Agent — tests escalation flows, tool orchestration, and bounded decision-making within a support workflow.
Build a working multi-agent prototype. Even a minimal one — a coordinator that dispatches two subagents and aggregates their results — forces you to confront the real architecture decisions: how do agents communicate, what happens when a subagent fails, how do you enforce time budgets. Reading about these patterns is not enough; you need to feel the design tensions.
Domain 2: Tool Design & MCP Integration (18%)
The Model Context Protocol is Anthropic's standardized interface for connecting Claude to external systems. If Domain 1 is about how agents think, Domain 2 is about how agents act — how they read from databases, call APIs, write to systems, and interact with the world outside the model.
What this domain covers
- MCP architecture. The client-server model: MCP clients (the AI application) connect to MCP servers (tool providers). Understand the transport layer, the protocol handshake, and how capabilities are negotiated.
- Tool schema design. Defining tool interfaces with clear input parameters, output schemas, and descriptions that help the model understand when and how to use the tool. A well-designed tool schema is self-documenting.
- Permission scoping. What data can a tool access? What side effects can it produce? MCP's permission model lets you constrain tools to specific capabilities — read-only access, scoped write access, or no-access defaults.
- Error handling for tool calls. What happens when a tool times out, returns invalid data, or fails entirely? Production systems need graceful degradation: retry logic, fallback behavior, and clear error propagation.
- Tool composition. How multiple tools work together in a workflow. A research agent might call a search tool, then a database lookup tool, then a formatting tool — each step depending on the output of the previous one.
```
// MCP Tool Schema Example
{
  "name": "lookup_customer",
  "description": "Retrieve customer record by ID or email. Returns account status, tier, and recent tickets. Read-only — does not modify customer data.",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": { "type": "string" },
      "email": { "type": "string", "format": "email" }
    },
    "oneOf": [
      { "required": ["customer_id"] },
      { "required": ["email"] }
    ]
  },
  "permissions": ["crm:read"]
}
```
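The error-handling bullet deserves a concrete shape too. A minimal sketch of a tool-call wrapper with retries and a fallback — the names (`call_tool`, the result dict) are illustrative, not part of MCP; `tool_fn` is any callable that may raise on timeout or transient failure:

```python
import time

def call_tool(tool_fn, args, retries=2, backoff_s=0.5, fallback=None):
    """Call a tool with retry, backoff, and graceful degradation.
    Returns a structured result the calling agent can branch on."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": tool_fn(**args)}
        except Exception as e:  # in production, catch only known transient errors
            last_error = e
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    if fallback is not None:
        # Degraded but usable: e.g. a cached value when the live lookup fails
        return {"ok": True, "result": fallback, "degraded": True}
    # Clear error propagation: a structured failure, not an unhandled exception
    return {"ok": False, "error": str(last_error)}
```

The design decision worth noticing: the wrapper never lets a tool failure surface as a raw exception inside an agent loop. The agent always gets a result it can reason about — success, degraded fallback, or structured failure.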
Which scenarios test this
Customer Support Resolution Agent — tests MCP tool use in a support context (looking up customers, creating tickets, checking knowledge bases). Developer Productivity with Claude — tests MCP server setup and tool configuration for development workflows.
Build an MCP server that connects to a real data source — even something simple like a SQLite database or a REST API. The Anthropic Academy course on MCP walks through the architecture. The key learning is not the code but the design decisions: how you scope permissions, handle errors, and write descriptions that help Claude use the tool correctly.
Domain 3: Claude Code Configuration & Workflows (20%)
Claude Code is the CLI-based developer interface for Claude. This domain is the most practical on the exam — it tests whether you know how to configure, customize, and extend Claude Code for real development workflows.
What this domain covers
- CLAUDE.md configuration. Project-level instruction files that shape how Claude behaves in your codebase. These define coding standards, project context, file conventions, and workflow rules. There are also user-level and directory-level CLAUDE.md files with different scopes and precedence rules.
- Plan mode. A structured workflow where Claude proposes an implementation plan before writing code. Understand when to use plan mode (complex multi-step changes) and when to skip it (small focused edits).
- Hooks. Shell commands that execute in response to Claude Code events — like pre-commit validation, post-edit formatting, or custom tool triggers. Hooks extend Claude Code's capabilities without modifying its core behavior.
- Permission modes. The security model for tool execution. Understand the spectrum from fully interactive (every action requires approval) to autonomous modes, and when each is appropriate.
- IDE integration. How Claude Code integrates with VS Code and other editors. Understand the extension architecture and how to configure it for team-level consistency.
```markdown
# Example CLAUDE.md (project-level)

## Project Context
This is a TypeScript monorepo using pnpm workspaces.
API server in /packages/api, React frontend in /packages/web.

## Coding Standards
- Use zod for all runtime validation
- All API endpoints must have OpenAPI annotations
- Tests use vitest, not jest
- Never import from src/internal/* outside its package

## Workflow Rules
- Always run pnpm typecheck after editing .ts files
- Use plan mode for changes spanning >3 files
- Never modify .env or docker-compose.prod.yml

## MCP Servers
- Database MCP server on localhost:3100 (read-only)
- GitHub MCP server for issue tracking
```
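Hooks are configured in Claude Code's settings file rather than in CLAUDE.md. The sketch below shows the general shape of a hook that runs a formatter after file edits — treat the event name, matcher syntax, and keys as an approximation and verify against the current Claude Code documentation for your version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```

The pattern to internalize: hooks are deterministic shell commands bound to events, so formatting and validation happen every time, regardless of what the model decides to do.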
Which scenarios test this
Code Generation with Claude Code — the primary scenario. Tests CLAUDE.md configuration, plan mode usage, and development workflow design. Claude Code for CI/CD — tests integration into automated pipelines using structured output and batch processing.
Use Claude Code as your daily development tool for at least two weeks before the exam. Write CLAUDE.md files for a real project. Experiment with plan mode on multi-file changes. Set up a hook that runs your linter after edits. The exam scenarios assume hands-on familiarity, not theoretical knowledge.
Domain 4: Prompt Engineering & Structured Output (20%)
This is not the prompt engineering of 2023 — crafting clever instructions to get better chatbot responses. This domain tests production prompt design: building prompt pipelines that produce consistent, parseable, schema-validated output across thousands of invocations in automated systems.
What this domain covers
- System prompts for production. Designing instructions that constrain model behavior reliably. Production system prompts are not suggestions — they are specifications. They define output format, behavioral boundaries, and error handling expectations.
- Structured output schemas. Using JSON mode and tool use to enforce output format. Understand when to use each approach and how they interact with the model's generation behavior.
- Validation-retry patterns. What happens when the model produces output that does not match the expected schema? Production systems need validation layers that catch malformed output and trigger regeneration with corrective guidance.
- Few-shot and chain-of-thought. When to provide examples in the prompt (few-shot) versus when to ask the model to reason step-by-step (chain-of-thought). Each technique has trade-offs in token cost, reliability, and output quality.
- Tool use as structured output. Using tool definitions not just to call external tools but to force the model to produce structured data in a predictable format. This is a production pattern that many architects miss.
```
// Validation-retry pattern for structured extraction
Step 1: Extract
  prompt → model → raw output

Step 2: Validate
  raw output → schema validator
  ├── valid → return parsed data
  └── invalid → go to Step 3

Step 3: Retry with correction
  original prompt
  + raw output (as context)
  + validation errors (specific fields)
  + corrective instruction
  → model → corrected output → schema validator
  ├── valid → return parsed data
  └── invalid → max 2 retries, then fail gracefully
```
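That pattern fits in a few dozen lines. A minimal sketch with a hand-rolled validator — in production you would use a schema library (zod, pydantic, JSON Schema) and a real model call; here `call_model` is a stand-in callable and `required` maps field names to expected Python types:

```python
def validate(record, required):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in required.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

def extract_with_retry(call_model, prompt, required, max_retries=2):
    """Extract, validate, and re-prompt with the specific errors on failure."""
    raw = call_model(prompt)
    for attempt in range(max_retries + 1):
        errors = validate(raw, required)
        if not errors:
            return raw
        if attempt == max_retries:
            # Fail gracefully with the accumulated errors, not a silent bad record
            raise ValueError(f"extraction failed after {max_retries} retries: {errors}")
        correction = (
            f"{prompt}\n\nYour previous output was invalid: {'; '.join(errors)}. "
            "Return corrected output matching the schema."
        )
        raw = call_model(correction)
```

Note that the retry prompt names the specific failing fields. Telling the model exactly what was wrong converges far faster than re-sending the original prompt unchanged.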
Which scenarios test this
Structured Data Extraction — the most direct test. Designing extraction pipelines with schemas, tool use, and validation-retry loops. Code Generation with Claude Code — tests structured output in the context of code generation workflows.
Build a data extraction pipeline that takes unstructured text and produces validated JSON. Use tool definitions to enforce output schema. Implement a retry loop that catches validation errors and re-prompts with specific correction instructions. This single exercise touches every concept in this domain.
Domain 5: Context Management & Reliability (15%)
The smallest domain by weight, but arguably the most important for production systems. Every scenario on the exam implicitly tests these concepts because every production system encounters context limits, API failures, and rate limits. This domain is the difference between a system that works in demos and a system that works under load.
What this domain covers
- Context window management. Claude has a large but finite context window. In long-running agent sessions, conversations accumulate tokens. Understanding how to prioritize what stays in context — and what gets compressed or dropped — is a critical production skill.
- Token budgets. Allocating token budgets across system prompts, user context, tool results, and model output. A production system that uses 80% of the context window for the system prompt leaves too little room for the actual conversation.
- Prompt caching. Anthropic's prompt caching reduces costs and latency for prompts with stable prefixes. Understand when caching applies, how to structure prompts to maximize cache hits, and the cost implications.
- Rate limiting and retry logic. Handling API rate limits gracefully: exponential backoff, jitter, queue management. A production system that crashes on a 429 response is not production-grade.
- Circuit breakers. Detecting when a downstream service (API, tool, database) is failing consistently and stopping requests to it temporarily. This prevents cascading failures in multi-step agent workflows.
- Graceful degradation. What does the system do when it cannot complete the full request? Returning partial results, falling back to simpler models, or queuing for later processing — all are valid degradation strategies depending on the use case.
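Prompt caching keys on a stable prefix, so the structural rule is: stable content (system prompt, tool definitions) first, variable content (the conversation) last, with a cache breakpoint at the end of the stable part. A sketch of the request shape — field names follow the Anthropic Messages API as I understand it, and the model name is a placeholder, so verify both against current docs:

```python
def build_cached_request(system_prompt, tools, messages, model="claude-sonnet-4-5"):
    """Build a Messages API payload with the stable prefix marked cacheable."""
    return {
        "model": model,
        "max_tokens": 1024,
        # Stable prefix: cached across calls that share it byte-for-byte
        "system": [
            {"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}
        ],
        "tools": tools,
        # Variable suffix: changes every request, so it sits after the breakpoint
        "messages": messages,
    }
```

The cost implication follows directly from the structure: any byte that changes inside the prefix invalidates the cache, which is why per-request data (timestamps, user IDs) belongs in `messages`, never in the system prompt.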
```
// Context budget allocation for a support agent
Total context window: 200K tokens

System prompt:          ~2,000 tokens  (1%)
Tool definitions:       ~3,000 tokens  (1.5%)
Customer context:       ~5,000 tokens  (2.5%)
Conversation history:  ~40,000 tokens  (20%)
Tool call results:     ~30,000 tokens  (15%)
Available for output: ~120,000 tokens  (60%)

// Compression triggers
if conversation_tokens > 40,000:
    compress older turns (summarize, keep last 5 turns verbatim)
if tool_results_tokens > 30,000:
    truncate large tool results, keep structure + key fields
```
Which scenarios test this
All six scenarios implicitly test reliability concepts. The ones that test it most directly: Claude Code for CI/CD (batch processing, structured output at scale) and Multi-Agent Research System (context management across parallel agents).
Build a wrapper around the Claude API that implements exponential backoff with jitter, context window monitoring, and a circuit breaker for tool calls. This is a small project (~200 lines) that teaches you more about production reliability than any documentation can. Then stress-test it: what happens when you send 100 concurrent requests? When a tool server goes down?
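The core of that wrapper is small. A sketch of exponential backoff with full jitter plus a consecutive-failure circuit breaker — the class and function names are mine, and `fn` stands in for any API or tool call:

```python
import random
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `cooldown_s`."""
    def __init__(self, threshold=5, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let one request probe the service after the cooldown
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, success):
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_backoff(fn, breaker, max_attempts=4, base_s=0.5):
    """Retry `fn` with exponential backoff plus jitter, gated by the breaker."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: downstream service failing")
        try:
            result = fn()
            breaker.record(True)
            return result
        except Exception:
            breaker.record(False)
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap,
            # so concurrent clients do not retry in lockstep after a 429
            time.sleep(random.uniform(0, base_s * (2 ** attempt)))
```

The jitter is not optional decoration: without it, 100 clients rate-limited at the same moment all retry at the same moment, and the thundering herd triggers the next 429.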
The 12-Week Study Roadmap
The recommended total study time is approximately 90 hours over 12 weeks. Here is how to allocate that time:
| Week | Focus | Hours | Activities |
|---|---|---|---|
| 1–2 | Foundations | ~14h | Complete Claude 101 and Building with the Claude API courses on Anthropic Academy. Install and set up Claude Code. |
| 3–4 | D1: Agentic Architecture | ~14h | Study coordinator-subagent patterns. Build a two-agent system with task decomposition and escalation logic. |
| 5 | D1: Agentic Architecture | ~9h | Add bounded loops, rollback discipline, and narrow write surfaces to your prototype. Study multi-agent research patterns. |
| 6 | D2: Tool Design & MCP | ~8h | Complete the MCP course on Anthropic Academy. Build an MCP server connecting to a database or API. |
| 7 | D2: Tool Design & MCP | ~7h | Practice tool schema design. Add permission scoping and error handling to your MCP server. |
| 8 | D3: Claude Code Config | ~8h | Complete the Claude Code in Action course. Write CLAUDE.md for a real project. Set up hooks and plan mode. |
| 9 | D3: Claude Code + D4: Prompts | ~9h | CI/CD integration patterns for Claude Code. Start building structured output pipelines with schema validation. |
| 10 | D4: Prompt Engineering | ~8h | Build a data extraction pipeline with validation-retry loops. Practice tool-use-as-structured-output patterns. |
| 11 | D5: Context & Reliability | ~8h | Build an API wrapper with rate limiting, context monitoring, and circuit breakers. Study caching strategies. |
| 12 | Practice & Review | ~5h | Walk through all 6 exam scenarios. Take practice exams. Review weak areas by domain. |
The build-first approach
Notice that every phase involves building something. This is deliberate. The exam is scenario-based — it presents production architecture problems and asks you to choose the right design. You cannot answer these questions from reading alone. You need the muscle memory of having designed, built, and debugged real systems using Claude's architecture stack.
Minimum builds for exam readiness:
- A multi-agent system with coordinator, subagents, bounded loops, and escalation (D1)
- An MCP server connecting to a real data source with permission scoping (D2)
- A CLAUDE.md configuration for a real project with hooks and plan mode (D3)
- A structured extraction pipeline with schema validation and retry logic (D4)
- An API wrapper with rate limiting, caching, and circuit breakers (D5)
Five projects. Each one small enough to build in a few hours. Together they cover every domain on the exam.
Free Resources
Anthropic Academy on Skilljar offers 13 free courses. The ones most relevant to exam preparation:
- Claude 101 — foundational concepts, capabilities, and mental model for how Claude works
- Building with the Claude API — 8+ hours covering system prompts, tool use, context windows, and architecture patterns
- Claude Code in Action — practical configuration, workflow design, and daily usage
- Introduction to Model Context Protocol — MCP architecture, building servers and clients, connecting to external systems
- Claude with Amazon Bedrock / Google Vertex AI — cloud deployment patterns
All courses are free. No paid subscription required. Each awards a certificate of completion for your LinkedIn profile. Access them at anthropic.skilljar.com.
Connecting the Domains
The five domains are not isolated silos. In production systems, they interact continuously. An agentic architecture (D1) uses MCP tools (D2) that are configured through Claude Code (D3) with carefully designed prompts (D4) and reliability engineering (D5) holding everything together.
The exam scenarios test this integration. A Customer Support Resolution Agent question might test your understanding of agentic escalation (D1), tool permission scoping (D2), and retry logic for failed tool calls (D5) — all in a single scenario. The strongest exam preparation is building integrated systems that exercise multiple domains simultaneously.
If you study the domains in isolation, you will understand each one. If you build systems that connect them, you will understand how they work together. The exam rewards the latter.