I’ve run hundreds of Claude Code sessions over the past few months. I’ve watched it refactor entire service layers, hit context limits mid-task and recover, spawn sub-agents to explore codebases, and occasionally do something so clever it made me stop and think about what just happened under the hood.
Most developers use Claude Code like a black box. Type a prompt, get code back. But understanding what actually happens between your keystroke and the final code change makes you dramatically better at using it. You learn when to split tasks, when to /compact, when to use Plan Mode, and why some prompts cost $0.50 while others cost $8.
In this article, I’ll trace the complete lifecycle of a Claude Code session, from the moment you type claude in your terminal to the final tool call. I’ve measured real token budgets, traced the agentic loop, and mapped out every system that fires along the way. Let’s get into it.
The 10-Second Version
Claude Code is an agentic harness around Claude. It provides tools, context management, and an execution environment that turns a language model into a coding agent. When you type a prompt, Claude Code assembles a system prompt (~2,900 base tokens), loads your CLAUDE.md instructions, injects 18+ tool definitions, and sends everything to the Claude API. The model responds with either text (done) or tool calls (keep going). This loop - call model, execute tools, feed results back, call model again - repeats until the model produces a text-only response with no tool invocations. A typical task runs 5-50 iterations of this loop.
That’s the core architecture. Everything else - sub-agents, context compaction, hooks, permissions - is infrastructure that makes this loop reliable, safe, and cost-effective. Let’s break it all down.
What Loads Before You Type a Single Character
When you run claude in a project directory, a lot happens before you see the prompt. Here’s the startup sequence, in order:
- Managed policy settings load first (organization-level, cannot be overridden)
- User settings from `~/.claude/settings.json`
- Project settings from `.claude/settings.json` and `.claude/settings.local.json`
- CLAUDE.md files - Claude Code walks UP the directory tree from your current directory, loading every CLAUDE.md it finds. Project-level, user-level, even organization-level if configured.
- Rules from `.claude/rules/*.md` - unconditional ones load immediately, path-scoped ones load on-demand when Claude reads matching files
- Auto-memory - `MEMORY.md` (first 200 lines) from `~/.claude/projects/<project>/memory/`
- Skills discovery - scans `~/.claude/skills/` and `.claude/skills/` for available slash commands
- MCP servers connect
- Git status injection - current branch, uncommitted changes, recent commit history
- System prompt assembly - all of the above gets compiled into the final prompt
Here’s the part that surprised me: the system prompt isn’t a single monolithic string. Claude Code assembles dozens of conditional prompt strings based on your environment, settings, and context. The base system role definition alone is ~2,900 tokens. Add tool definitions (~3,000 tokens for 18+ tools), CLAUDE.md content (500-2,000 tokens typically), git status, and skill descriptions - and you’ve spent 8,000-12,000 tokens before typing a single character.
That’s 4-6% of a 200K context window. On the extended 1M context window available with Opus 4.6, it’s negligible. On the standard 200K, it means you have ~190K tokens for actual conversation and tool results.
A critical detail: CLAUDE.md content is injected as a user message, not as part of the system prompt. This is a deliberate design choice. It means CLAUDE.md instructions are treated as context rather than hard configuration. More specific, concise instructions produce better adherence. Keep your CLAUDE.md under 200 lines.
The Agentic Loop - The Heart of Claude Code
This is where the magic happens. The core of Claude Code is a deceptively simple while loop (documented in Anthropic’s official architecture overview):
```
while the model's response contains tool calls:
    execute the tool(s)
    feed results back as tool_result messages
    call the model again with the updated conversation

when the response is plain text (no tool calls):
    the loop terminates and control returns to the user
```

That’s it. No complex multi-agent orchestration, no swarm intelligence, no parallel reasoning threads. A single-threaded master loop that calls the model, executes tools, and feeds results back until the model decides it’s done.
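The loop described above can be made concrete in a few lines of Python. This is a simplified mental model, not Anthropic’s actual implementation: `call_model` and `execute_tool` are hypothetical stand-ins for the real API client and tool runtime.

```python
def agentic_loop(messages, call_model, execute_tool):
    """Simplified sketch of the Claude Code master loop.

    call_model and execute_tool are illustrative stand-ins for the
    real API client and tool runtime (hypothetical names).
    """
    while True:
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        tool_calls = [b for b in response["content"] if b.get("type") == "tool_use"]
        if not tool_calls:
            # Plain-text response: the loop terminates, return to the user.
            return response
        # Execute every requested tool and feed results back.
        results = [
            {"type": "tool_result", "tool_use_id": c["id"],
             "content": execute_tool(c["name"], c["input"])}
            for c in tool_calls
        ]
        messages.append({"role": "user", "content": results})
```

Everything else in this article hangs off this skeleton: hooks and permission checks wrap `execute_tool`, compaction trims `messages`, and sub-agents are just this same loop running with a fresh `messages` list.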
The Three Phases
Every iteration of the loop falls into one of three phases, though Claude blends them fluidly:
- Gather context - Read files, search code, explore the codebase
- Take action - Edit files, run commands, create new files
- Verify results - Run tests, check builds, read the output
Claude decides what each step requires based on what it learned from the previous step. There’s no rigid state machine forcing it through phases. The model’s reasoning drives the flow.
The .NET Middleware Pipeline Analogy
If you’re a .NET developer, think of it like the ASP.NET Core middleware pipeline - but in reverse. In ASP.NET Core, a request flows through middleware layers, each one potentially short-circuiting or modifying the request/response. In Claude Code, each loop iteration is like a request through the pipeline:
- PreToolUse hooks fire (like request middleware - can allow, deny, or modify)
- Permission check runs (like authorization middleware)
- Tool executes (like your endpoint handler)
- PostToolUse hooks fire (like response middleware - can provide feedback)
- System reminders inject (like logging middleware adding context)
- Model receives everything and decides the next action
The key difference: in ASP.NET Core, the pipeline processes one request. In Claude Code, the pipeline loops until the model produces a text-only response. Each iteration builds on everything that came before.
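One loop iteration can be sketched as a pipeline in the same spirit as the middleware analogy. Every stage here is an illustrative stand-in (the real hook and permission machinery is more involved):

```python
def run_tool_call(call, pre_hooks, check_permission, execute, post_hooks):
    """One loop iteration modeled as a middleware pipeline (illustrative)."""
    # 1. PreToolUse hooks: like request middleware, may deny or modify input.
    for hook in pre_hooks:
        verdict = hook(call)
        if verdict == "deny":
            return {"status": "denied", "stage": "pre_hook"}
        if isinstance(verdict, dict):
            call = {**call, "input": verdict}  # hook modified the tool input
    # 2. Permission check: like authorization middleware.
    if not check_permission(call):
        return {"status": "denied", "stage": "permission"}
    # 3. Tool executes: like the endpoint handler.
    result = execute(call)
    # 4. PostToolUse hooks: like response middleware, can add feedback.
    feedback = [h(call, result) for h in post_hooks]
    return {"status": "ok", "result": result,
            "feedback": [f for f in feedback if f]}
```

The model then receives the result plus any hook feedback and decides the next action, and the pipeline runs again on the next iteration.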
What Stops the Loop?
The loop terminates when:
- The model produces plain text without any tool invocations (most common)
- You interrupt by pressing Escape
- Context window exhaustion triggers auto-compaction
- A `maxTurns` limit on sub-agents prevents runaway loops
- A Stop hook with `decision: "block"` prevents termination (yes, hooks can force Claude to keep going)
The Async Queue
There’s one more piece: a dual-buffer async queue that enables pause/resume and mid-task user interjections. This is what lets you press Escape while Claude is mid-edit, type a correction, and have Claude adjust without restarting the entire task. It’s a small but critical detail that makes Claude Code feel interactive rather than batch-processed.
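A minimal sketch of the interjection idea, assuming the loop drains a queue of user input between tool calls (the real dual-buffer implementation is more involved than this):

```python
from queue import Queue, Empty

def loop_with_interjections(steps, user_queue):
    """Between tool calls, drain any user input queued mid-task so the
    model can adjust without restarting (illustrative sketch)."""
    transcript = []
    for step in steps:
        # Drain interjections typed while the previous step was running.
        while True:
            try:
                note = user_queue.get_nowait()
            except Empty:
                break
            transcript.append({"role": "user", "interjection": note})
        transcript.append({"role": "tool", "step": step})
    return transcript
```

Because the queue is checked between steps rather than at the end of the task, your correction lands in the conversation before the next tool call, which is why a mid-task Escape feels instant.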
The 18+ Built-In Tools - How Claude Picks the Right One
Claude Code ships with 18+ built-in tools, each with a JSON schema description embedded in the system prompt. The model selects tools based on the user’s intent, previous tool results, and explicit instructions in the tool descriptions.
Tool Categories
File Operations:
- Read - Read files (default ~2,000 lines, supports images, PDFs, Jupyter notebooks)
- Write - Create or overwrite entire files
- Edit - Surgical string replacement with uniqueness check (preferred over Write for modifications)
Search:
- Glob - Fast file pattern matching, sorted by modification time
- Grep - Ripgrep-powered regex search with multiple output modes
Execution:
- Bash - Persistent shell with risk classification, injection filtering, timeout support
Web:
- WebFetch - URL retrieval with AI summarization and 15-minute cache
- WebSearch - Web search with domain filtering
Orchestration:
- Agent - Spawn sub-agents with isolated context
- Skill - Invoke skills (custom slash commands) by name
- NotebookEdit - Jupyter notebook cell editing
Planning and Communication:
- TodoWrite - Structured task lists with priorities
- AskUserQuestion - Ask the user for input
- SendMessage - Inter-agent communication
How Claude Decides Which Tool to Use
This isn’t random. The system prompt contains explicit routing instructions:
- “Use Read instead of `cat`, `head`, `tail`, or `sed`”
- “Use Grep instead of `grep` or `rg` via Bash”
- “Use Glob instead of `find` or `ls`”
- “For broader codebase exploration, use Agent with `subagent_type=Explore`”
- “For simple, directed searches, use Glob or Grep directly”
There’s a hierarchy: dedicated tools are always preferred over Bash equivalents. This is because dedicated tools give the user visibility into what Claude is doing - a Grep call is transparent, while an `rg` invocation inside Bash is opaque.
My take: Understanding this hierarchy is one of the biggest productivity wins. When Claude uses the wrong tool (like Bash for file reading), the system prompt is literally telling it not to. If your CLAUDE.md reinforces the right tool choices for your project (“always use dotnet test for testing, never dotnet run on test projects”), Claude follows those instructions within the agentic loop.
Sub-Agents - When Claude Calls for Backup
Sometimes a single thread isn’t enough. Claude Code can spawn sub-agents: isolated instances that run in their own context window with their own system prompt.
Built-In Sub-Agent Types
| Agent Type | Model | Tools | Purpose |
|---|---|---|---|
| Explore | Haiku (fast) | Read-only | File discovery, codebase exploration |
| Plan | Inherits | Read-only | Research for plan mode decisions |
| general-purpose | Inherits | All tools | Complex multi-step tasks |
How Context Forking Works
When Claude spawns a sub-agent, it does NOT copy the entire conversation history. The sub-agent receives:
- Its own system prompt (much smaller than the main one)
- Basic environment details (working directory, platform)
- The task description from the parent
- CLAUDE.md content (still loaded)
The sub-agent runs its own agentic loop, potentially making dozens of tool calls. When it finishes, only a condensed summary returns to the parent - typically 1,000-2,000 tokens from what might have been tens of thousands of tokens of exploration.
This is the key insight: sub-agents are context-efficient. They let Claude explore deeply without filling the parent’s context window. An Explore sub-agent might read 50 files (consuming 100K+ tokens internally) but return a 1,500-token summary to the parent.
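The forked context can be sketched like this. The field names are assumptions for illustration; the point is what the sub-agent does NOT receive (the parent’s conversation history) and what comes back (a short summary):

```python
def spawn_subagent(parent, task, run_loop, summarize):
    """Sketch of sub-agent context forking (illustrative field names).

    The sub-agent gets a fresh, small context -- not the parent's
    conversation history -- and returns only a condensed summary.
    """
    child_context = {
        "system_prompt": "sub-agent prompt (much smaller than the parent's)",
        "environment": {"cwd": parent["cwd"], "platform": parent["platform"]},
        "claude_md": parent["claude_md"],   # still loaded
        "messages": [{"role": "user", "content": task}],
    }
    # The sub-agent runs its own agentic loop, possibly dozens of tool calls.
    full_transcript = run_loop(child_context)
    # Only a condensed summary (~1,000-2,000 tokens) returns to the parent.
    return summarize(full_transcript)
```

Notice that `parent["messages"]` is never touched: however large the parent conversation has grown, the child starts from the task description alone.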
The Critical Constraint
Sub-agents cannot spawn other sub-agents. There’s a hard depth limit of 1 level. This prevents recursive explosion and keeps the system debuggable. Anthropic explicitly chose this over multi-agent swarms: “a simple, single-threaded master loop combined with disciplined tools delivers controllable autonomy.”
When to Use Sub-Agents
Here’s my decision matrix from real usage:
| Scenario | Use Sub-Agent? | Which Type? |
|---|---|---|
| Find a specific file/function | No - use Glob/Grep directly | - |
| Understand a module’s architecture | Yes | Explore |
| Run tests independently | Yes | general-purpose |
| Research before a complex refactor | Yes | Explore or Plan |
| Parallel independent tasks | Yes (background) | general-purpose |
| Simple file edits | No - do it inline | - |
Context Window - The 200K Token Budget
Every turn in a Claude Code session reprices the entire context. The model processes ALL accumulated input tokens on each interaction. This means your 50th prompt is dramatically more expensive than your 1st, because the model reads the entire conversation history every time.
Real Token Budget Breakdown
Here’s what I’ve measured from my own sessions on a standard 200K context:
| Component | Tokens | % of 200K |
|---|---|---|
| Base system prompt | ~2,900 | 1.5% |
| 18+ tool definitions | ~3,000 | 1.5% |
| CLAUDE.md (typical project) | ~1,200 | 0.6% |
| Rules files | ~500 | 0.3% |
| Git status | ~300 | 0.2% |
| Skill descriptions | ~800 | 0.4% |
| Total overhead | ~8,700 | ~4.5% |
| Available for conversation | ~191,000 | ~95.5% |
That looks comfortable (see Claude Code costs documentation for official pricing and token details). But here’s where it gets real: a single file read can return 2,000+ lines (easily 8,000-15,000 tokens). A Bash command output can be huge. After 10-15 tool calls, you’ve consumed 50-80K tokens. By mid-session, the overhead isn’t the system prompt - it’s the accumulated tool results.
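A quick back-of-envelope sketch makes the shift visible, using the figures above. The per-call averages are assumptions for illustration:

```python
def context_after(tool_calls, overhead=8_700, avg_tool_result=5_000, avg_turn_text=300):
    """Rough context-size model: fixed startup overhead plus the tool
    results and turn text that accumulate (figures from the article;
    per-call averages are illustrative assumptions)."""
    return overhead + tool_calls * (avg_tool_result + avg_turn_text)

# Fresh session: just the ~8,700-token startup overhead.
assert context_after(0) == 8_700
# After ~12 tool calls the accumulated results dwarf the system prompt:
assert context_after(12) == 72_300   # ~36% of a 200K window
```

Under these assumptions the system prompt stops being the dominant cost after just a couple of tool calls; from then on, context pressure is driven almost entirely by tool output.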
The 3-Tier Compaction Strategy
When you approach ~92-95% context capacity, Claude Code triggers automatic compaction (Anthropic’s engineering blog has a deep dive on context engineering that explains the philosophy). It uses three tiers:
Tier 1 - Tool result clearing: Removes raw tool outputs from deep in message history. Lightest touch, safest option.
Tier 2 - Conversation summarization: The compactor passes the full history to the model for summarization. It preserves architectural decisions, unresolved bugs, implementation details, and key decisions. It discards redundant tool outputs, verbose intermediate results, and detailed instructions from early in the conversation. Typical compression: 60-80%.
Tier 3 - Manual compaction: /compact [focus] lets you direct what to preserve. You can say /compact focus on the auth refactor and the compactor will prioritize auth-related context.
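The tiered strategy amounts to a cascade: try the cheapest intervention first and escalate only if the context is still over budget. The thresholds and helper functions below are illustrative, not the real implementation:

```python
def compact(messages, token_count, capacity=200_000, trigger=0.92,
            clear_old_tool_results=None, summarize=None):
    """Illustrative cascade of the automatic compaction tiers."""
    if token_count(messages) < trigger * capacity:
        return messages  # plenty of room, do nothing
    # Tier 1: clear raw tool outputs deep in the history (lightest touch).
    messages = clear_old_tool_results(messages)
    if token_count(messages) < trigger * capacity:
        return messages
    # Tier 2: summarize the conversation, preserving decisions and bugs.
    return summarize(messages)
```

Tier 3 (`/compact [focus]`) is simply you invoking the summarizer yourself, with a hint about what to keep.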
What Survives Compaction
This is the part nobody explains clearly. Here’s what I’ve observed:
Always survives:
- CLAUDE.md (re-read from disk after compaction, guaranteed fresh)
- Current file states (Claude re-reads if needed)
- The most recent tool results
- Active TODO list items
Usually survives (summarized):
- Architectural decisions from earlier in the session
- Key code patterns and approaches chosen
- Error messages and their resolutions
Gets discarded:
- Early exploratory file reads (the raw content, not the insight)
- Verbose command outputs from successful operations
- Superseded approaches that were abandoned
My take: Knowing this changes how I work. For long refactoring sessions, I run /compact focus on [the key changes] proactively at around 60-70% capacity rather than waiting for auto-compaction at 92%. This gives me control over what survives instead of leaving it to the summarizer.
The Permission and Safety Model
Claude Code doesn’t just execute whatever the model asks. Every tool call goes through a permission pipeline.
Rule Evaluation: deny > ask > allow
Permission rules are evaluated in strict order:
- Deny rules - checked first. If any deny rule matches, the tool call is blocked. Period.
- Ask rules - checked second. If matched, the user is prompted for approval.
- Allow rules - checked last. If matched, the tool call proceeds silently.
First match wins. This means a deny rule always overrides an allow rule, making the system fail-safe.
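First-match evaluation in deny > ask > allow order can be sketched directly. The rule format (tool name plus a glob-style matcher) is an assumption for illustration, loosely modeled on the `Tool(pattern)` syntax used in settings files:

```python
from fnmatch import fnmatch

def evaluate(rules, tool, argument):
    """deny > ask > allow, first match wins (illustrative rule format)."""
    for action in ("deny", "ask", "allow"):
        for pattern in rules.get(action, []):
            if fnmatch(f"{tool}({argument})", pattern):
                return action
    return "ask"  # no rule matched: fall back to prompting the user

rules = {
    "deny": ["Bash(rm -rf*)"],
    "ask": ["Bash(git push*)"],
    "allow": ["Bash(*)", "Read(*)"],
}
```

Because deny rules are scanned before allow rules, `Bash(rm -rf /)` is blocked even though `Bash(*)` would otherwise allow it - that ordering is what makes the system fail-safe.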
Permission Modes
| Mode | Behavior |
|---|---|
| default | Prompts for permission on first use of each tool |
| acceptEdits | Auto-accepts file edits, still asks for commands |
| plan | Read-only tools only (this is Plan Mode) |
| bypassPermissions | Skips prompts (except writes to .git, .claude, .vscode) |
You cycle through modes with Shift+Tab during a session.
Hooks - Intercepting the Loop
Hooks are shell commands or HTTP endpoints that fire at lifecycle events (see the full hooks reference for all 12+ event types). They intercept the agentic loop at specific points:
- PreToolUse - fires BEFORE a tool executes. Can allow, deny, or modify the tool’s input.
- PostToolUse - fires AFTER successful execution. Can provide corrective feedback.
- Stop - fires when Claude tries to finish. Can force Claude to keep going (quality gates).
- SessionStart/SessionEnd - lifecycle bookends.
The most powerful pattern I’ve found: a PreToolUse hook that runs dotnet format before every Edit tool call, ensuring code style is consistent. And a Stop hook that runs dotnet test and blocks stopping if tests fail.
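As a concrete sketch, a Stop hook is just an executable that reads the event JSON from stdin and signals its verdict via exit code (0 = success, 2 = blocking error). The `dotnet test` invocation is an assumption for a .NET project - substitute your own test command:

```python
import json
import subprocess
import sys

def hook_exit_code(test_returncode):
    """Map a test run's exit status to a hook verdict:
    0 lets Claude stop, 2 blocks termination (forces the loop on)."""
    return 0 if test_returncode == 0 else 2

def run_stop_hook(stdin=sys.stdin):
    """Sketch of a Stop hook body: block Claude from finishing
    while the test suite is red (dotnet command is an assumption)."""
    event = json.load(stdin)  # lifecycle event payload from Claude Code
    result = subprocess.run(["dotnet", "test", "--nologo"],
                            capture_output=True, text=True)
    code = hook_exit_code(result.returncode)
    if code == 2:
        # Feedback on stderr tells Claude why it must keep going.
        print("Tests are failing - fix them before stopping.", file=sys.stderr)
    return code

# In the actual hook script you would end with: sys.exit(run_stop_hook())
```

Registered as a Stop hook, this turns your test suite into a quality gate: Claude cannot declare the task done while tests fail.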
Extended Thinking and Prompt Caching
Two features that significantly affect session cost and quality:
Extended Thinking
Enabled by default. Claude Code allocates thinking tokens based on task complexity (adaptive reasoning). Thinking tokens are billed as output tokens but dramatically improve planning quality.
Control it with effort levels:
- low - faster, cheaper, straightforward tasks
- medium - default for Opus 4.6
- high - deeper reasoning for complex tasks
- max - no token spending constraint (Opus 4.6 only)
Prompt Caching
Claude Code automatically caches system prompts and tool definitions across requests within a session. The first request pays full input token cost. Subsequent requests with the same prefix get significantly cheaper cache-read pricing.
This is why long sessions with consistent system prompts are more cost-efficient per-turn than many short sessions. The system prompt (~8,700 tokens) is cached after the first turn and essentially free for every subsequent turn.
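A rough cost sketch shows why. The per-million-token rates below are placeholders (actual pricing varies by model); the shape of the curve is the point, not the exact dollars:

```python
def session_input_cost(turns, overhead=8_700, per_turn_growth=5_000,
                       input_rate=3.00, cache_read_rate=0.30):
    """Rough input-cost model (rates per million tokens are placeholders).

    The ~8,700-token system-prompt prefix is paid in full on the first
    turn, then read from cache; the growing conversation history is
    billed at the full input rate on every turn.
    """
    cost = 0.0
    for t in range(turns):
        prefix_rate = input_rate if t == 0 else cache_read_rate
        history = t * per_turn_growth  # conversation grows each turn
        cost += (overhead * prefix_rate + history * input_rate) / 1e6
    return round(cost, 4)
```

Under these assumed rates, the cached prefix quickly becomes a rounding error, while the ever-growing conversation history dominates - which is exactly the cost curve you feel in long sessions.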
What This Means for Your Daily Workflow
Understanding the internals changes how you use Claude Code. Here are the practical implications:
Split large tasks: The single-threaded design means Claude can’t parallelize within a session. If you have 3 independent changes, running them as 3 sessions (or using sub-agents) is faster than one mega-prompt.
Front-load context: Since CLAUDE.md loads before everything else and survives compaction, invest in a good CLAUDE.md. Every minute you spend on it saves hours across sessions.
Use /compact proactively: Don’t wait for auto-compaction at 92%. Run /compact focus on [key context] at 60-70% to stay in control of what the model remembers.
Prefer dedicated tools: If Claude is using Bash for file operations, your instructions aren’t clear enough. Add “always use Read/Edit/Grep, never cat/sed/grep via Bash” to your CLAUDE.md.
Understand the cost curve: Early turns are cheap (cached system prompt). Late turns are expensive (full conversation history repriced). For cost-sensitive work, keep sessions focused and compact early.
Use Plan Mode for exploration: Plan Mode restricts to read-only tools. This means Claude can explore freely without the risk of making changes. Use it for “understand this codebase” tasks, then switch to execution mode.
Key Takeaways
- Claude Code is a single-threaded agentic loop: call model, execute tools, feed results back, repeat. No multi-agent swarms. Deliberate simplicity for debuggability.
- ~8,700 tokens of overhead load before you type anything (system prompt + tools + CLAUDE.md + git status). On 200K context, that’s ~4.5%.
- Context compaction fires at ~92% capacity with a 3-tier strategy: clear tool results first, then summarize the conversation, then manual /compact. CLAUDE.md always survives - it’s re-read from disk.
- Sub-agents are context-efficient: they run their own loop, potentially consuming 100K+ tokens internally, but return only a 1,000-2,000 token summary to the parent.
- The permission model is fail-safe: deny rules always override allow rules. Hooks let you intercept every tool call in the agentic loop.
Troubleshooting
Context window filling up too fast
Your tool results are too large. Use the Read tool’s limit and offset parameters to read only the section you need, be specific with Grep patterns, and run /compact focus on [key context] before hitting 80% capacity.
Claude using Bash instead of dedicated tools
Your CLAUDE.md needs explicit tool routing instructions. Add: “Always use Read instead of `cat`, Grep instead of `rg`, Edit instead of `sed`. Reserve Bash for commands that require shell execution.”
Session feels slow after many turns
Each turn reprices the entire context. The 50th turn processes 50x more input tokens than the 1st. Run /compact to reset the conversation to a condensed summary, or start a fresh session with claude and reference the previous session’s approach.
Sub-agent returning unhelpful results
The task description you pass to the Agent tool is the sub-agent’s entire context. Be specific: “Find all files that implement the IProductRepository interface and list their public methods” not “Look at the repository layer.”
Hooks not firing
Check settings precedence. Hooks in managed settings override project settings. Run in verbose mode to see hook execution. Verify the hook command exits with code 0 (success) or code 2 (blocking error). Any other exit code is treated as a non-blocking error and silently ignored.
Claude ignoring CLAUDE.md instructions
CLAUDE.md is injected as a user message, not system prompt. Long CLAUDE.md files (200+ lines) get less adherence. Keep instructions concise, specific, and actionable. Vague instructions (“write good code”) are ignored. Specific instructions (“use primary constructors, file-scoped namespaces, and always pass CancellationToken”) are followed.
Frequently Asked Questions
How does Claude Code's agentic loop work?
Claude Code uses a single-threaded agentic loop: it sends your prompt plus the full conversation history to the Claude API, receives a response that contains either text (done) or tool calls (keep going), executes the tools, feeds the results back as tool_result messages, and calls the model again. This loop repeats until the model produces a text-only response. A typical task runs 5-50 iterations.
How many tokens does a typical Claude Code session use?
A fresh Claude Code session starts with approximately 8,700 tokens of overhead (system prompt, tool definitions, CLAUDE.md, git status). A simple bug fix might use 30,000-50,000 total tokens. A complex refactoring session can consume 150,000-200,000 tokens before triggering context compaction. Average cost is about $6 per developer per day.
What tools does Claude Code have access to?
Claude Code ships with 18+ built-in tools across several categories: Read, Write, Edit for file operations, Glob and Grep for search, Bash for shell execution, WebFetch and WebSearch for web access, Agent and Skill for orchestration, and TodoWrite, AskUserQuestion, SendMessage for planning and communication. The exact count varies by version and configuration. Each tool has a JSON schema description that helps the model decide when to use it.
How does Claude Code manage the context window when it gets full?
Claude Code uses a 3-tier compaction strategy. Tier 1 clears old tool outputs from message history. Tier 2 triggers at approximately 92-95% capacity and summarizes the conversation, preserving architectural decisions and key context while discarding redundant outputs. Tier 3 is manual compaction via the /compact command where you can specify what to focus on. CLAUDE.md always survives compaction because it is re-read from disk.
How does Claude Code decide which tool to use?
The system prompt contains explicit routing instructions that tell Claude to prefer dedicated tools over Bash equivalents. For example, Read instead of cat, Grep instead of rg, Glob instead of find. Claude's model reasoning selects tools based on the user's prompt, previous tool results, tool descriptions, and any instructions in CLAUDE.md. Dedicated tools are preferred because they give users visibility into what Claude is doing.
What happens to CLAUDE.md instructions during context compaction?
CLAUDE.md always survives compaction. After any compaction event, Claude Code re-reads the CLAUDE.md file from disk and re-injects it fresh into the context. This is why CLAUDE.md is the most reliable way to persist instructions across long sessions. Only conversation-specific instructions that exist purely in chat history can be lost during compaction.
How do sub-agents work in Claude Code?
Sub-agents are isolated instances that run in their own context window with their own system prompt. When Claude spawns a sub-agent, it receives a task description and basic environment details but not the full conversation history. The sub-agent runs its own agentic loop and returns a condensed summary (typically 1,000-2,000 tokens) to the parent. Sub-agents cannot spawn other sub-agents, enforcing a maximum depth of 1 level.
How much does a Claude Code session cost?
The average cost is approximately $6 per developer per day, with the 90th percentile under $12 per day. Monthly costs typically range from $100-200 with Sonnet 4.6. Prompt caching significantly reduces costs for long sessions because the system prompt and tool definitions are cached after the first turn. Early turns in a session are cheap, while later turns are more expensive because the entire conversation history is repriced on each turn.
Summary
Claude Code’s architecture is a masterclass in constrained simplicity. A single-threaded agentic loop, 18 well-defined tools, a hierarchical context system, and a fail-safe permission model. No multi-agent swarms, no complex orchestration, no magic. Just a while loop that calls the model, executes tools, and feeds results back until the job is done.
The practical takeaway: the more you understand about this architecture, the better your prompts become, the more efficient your sessions are, and the lower your costs. Front-load context in CLAUDE.md, compact proactively, use dedicated tools, and split independent tasks into separate sessions or sub-agents.
If you found this deep dive useful, share it with your team. And make sure to subscribe to the newsletter where I share exclusive insights from my daily Claude Code usage, including token optimization techniques that don’t make it into the articles.
Happy Coding :)


