I’ve run hundreds of Claude Code sessions over the past few months. I’ve watched it refactor entire service layers, hit context limits mid-task and recover, spawn sub-agents to explore codebases, and occasionally do something so clever it made me stop and think about what just happened under the hood.
Most developers use Claude Code like a black box. Type a prompt, get code back. But understanding what actually happens between your keystroke and the final code change makes you dramatically better at using it. You learn when to split tasks, when to /compact, when to use Plan Mode, and why some prompts cost $0.50 while others cost $8.
In this article, I’ll trace the complete lifecycle of a Claude Code session, from the moment you type claude in your terminal to the final tool call. I’ve measured real token budgets, traced the agentic loop, and mapped out every system that fires along the way. Let’s get into it.
The 10-Second Version
Claude Code is an agentic harness around Claude. It provides tools, context management, and an execution environment that turns a language model into a coding agent. When you type a prompt, Claude Code assembles a system prompt (~2,900 base tokens), loads your CLAUDE.md instructions, injects 18+ tool definitions, and sends everything to the Claude API. The model responds with either text (done) or tool calls (keep going). This loop - call model, execute tools, feed results back, call model again - repeats until the model produces a text-only response with no tool invocations. A typical task runs 5-50 iterations of this loop.
That’s the core architecture. Everything else - sub-agents, context compaction, hooks, permissions - is infrastructure that makes this loop reliable, safe, and cost-effective. Let’s break it all down.
What Loads Before You Type a Single Character
When you run claude in a project directory, a lot happens before you see the prompt. Here’s the startup sequence, in order:
- Managed policy settings load first (organization-level, cannot be overridden)
- User settings from `~/.claude/settings.json`
- Project settings from `.claude/settings.json` and `.claude/settings.local.json`
- CLAUDE.md files - Claude Code walks UP the directory tree from your current directory, loading every CLAUDE.md it finds. Project-level, user-level, even organization-level if configured.
- Rules from `.claude/rules/*.md` - unconditional ones load immediately, path-scoped ones load on-demand when Claude reads matching files
- Auto-memory - `MEMORY.md` (first 200 lines) from `~/.claude/projects/<project>/memory/`
- Skills discovery - scans `~/.claude/skills/` and `.claude/skills/` for available slash commands
- MCP servers connect
- Git status injection - current branch, uncommitted changes, recent commit history
- System prompt assembly - all of the above gets compiled into the final prompt
Here’s the part that surprised me: the system prompt isn’t a single monolithic string. Claude Code assembles dozens of conditional prompt strings based on your environment, settings, and context. The base system role definition alone is ~2,900 tokens. Add tool definitions (~3,000 tokens for 18+ tools), CLAUDE.md content (500-2,000 tokens typically), git status, and skill descriptions - and you’ve spent 8,000-12,000 tokens before typing a single character.
That’s 4-6% of a 200K context window. On the extended 1M context window available with Opus 4.6, it’s negligible. On the standard 200K, it means you have ~190K tokens for actual conversation and tool results.
A critical detail: CLAUDE.md content is injected as a user message, not as part of the system prompt. This is a deliberate design choice. It means CLAUDE.md instructions are treated as context rather than hard configuration. More specific, concise instructions produce better adherence. Keep your CLAUDE.md under 200 lines.
The Agentic Loop - The Heart of Claude Code
This is where the magic happens. The core of Claude Code is a deceptively simple while loop (documented in Anthropic’s official architecture overview):
```
while the model's response contains tool calls:
    execute the tool(s)
    feed results back as tool_result messages
    call the model again with the updated conversation

when the response is plain text (no tool calls):
    the loop terminates and control returns to the user
```

That’s it. No complex multi-agent orchestration, no swarm intelligence, no parallel reasoning threads. A single-threaded master loop that calls the model, executes tools, and feeds results back until the model decides it’s done.
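The loop described above can be made concrete in a few lines of Python. This is a simplified mental model, not Anthropic’s actual implementation: `call_model` and `execute_tool` are hypothetical stand-ins for the real API client and tool runtime.

```python
def agentic_loop(messages, call_model, execute_tool):
    """Simplified sketch of the Claude Code master loop.

    call_model and execute_tool are illustrative stand-ins for the
    real API client and tool runtime (hypothetical names).
    """
    while True:
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        tool_calls = [b for b in response["content"] if b.get("type") == "tool_use"]
        if not tool_calls:
            # Plain-text response: the loop terminates, return to the user.
            return response
        # Execute every requested tool and feed results back.
        results = [
            {"type": "tool_result", "tool_use_id": c["id"],
             "content": execute_tool(c["name"], c["input"])}
            for c in tool_calls
        ]
        messages.append({"role": "user", "content": results})
```

Everything else in this article hangs off this skeleton: hooks and permission checks wrap `execute_tool`, compaction trims `messages`, and sub-agents are just this same loop running with a fresh `messages` list.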
The Three Phases
Every iteration of the loop falls into one of three phases, though Claude blends them fluidly:
- Gather context - Read files, search code, explore the codebase
- Take action - Edit files, run commands, create new files
- Verify results - Run tests, check builds, read the output
Claude decides what each step requires based on what it learned from the previous step. There’s no rigid state machine forcing it through phases. The model’s reasoning drives the flow.
The .NET Middleware Pipeline Analogy
If you’re a .NET developer, think of it like the ASP.NET Core middleware pipeline - but in reverse. In ASP.NET Core, a request flows through middleware layers, each one potentially short-circuiting or modifying the request/response. In Claude Code, each loop iteration is like a request through the pipeline:
- PreToolUse hooks fire (like request middleware - can allow, deny, or modify)
- Permission check runs (like authorization middleware)
- Tool executes (like your endpoint handler)
- PostToolUse hooks fire (like response middleware - can provide feedback)
- System reminders inject (like logging middleware adding context)
- Model receives everything and decides the next action
The key difference: in ASP.NET Core, the pipeline processes one request. In Claude Code, the pipeline loops until the model produces a text-only response. Each iteration builds on everything that came before.
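One loop iteration can be sketched as a pipeline in the same spirit as the middleware analogy. Every stage here is an illustrative stand-in (the real hook and permission machinery is more involved):

```python
def run_tool_call(call, pre_hooks, check_permission, execute, post_hooks):
    """One loop iteration modeled as a middleware pipeline (illustrative)."""
    # 1. PreToolUse hooks: like request middleware, may deny or modify input.
    for hook in pre_hooks:
        verdict = hook(call)
        if verdict == "deny":
            return {"status": "denied", "stage": "pre_hook"}
        if isinstance(verdict, dict):
            call = {**call, "input": verdict}  # hook modified the tool input
    # 2. Permission check: like authorization middleware.
    if not check_permission(call):
        return {"status": "denied", "stage": "permission"}
    # 3. Tool executes: like the endpoint handler.
    result = execute(call)
    # 4. PostToolUse hooks: like response middleware, can add feedback.
    feedback = [h(call, result) for h in post_hooks]
    return {"status": "ok", "result": result,
            "feedback": [f for f in feedback if f]}
```

The model then receives the result plus any hook feedback and decides the next action, and the pipeline runs again on the next iteration.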
What Stops the Loop?
The loop terminates when:
- The model produces plain text without any tool invocations (most common)
- You interrupt by pressing Escape
- Context window exhaustion triggers auto-compaction
- A `maxTurns` limit on sub-agents prevents runaway loops
- A Stop hook with `decision: "block"` prevents termination (yes, hooks can force Claude to keep going)
The Async Queue
There’s one more piece: a dual-buffer async queue that enables pause/resume and mid-task user interjections. This is what lets you press Escape while Claude is mid-edit, type a correction, and have Claude adjust without restarting the entire task. It’s a small but critical detail that makes Claude Code feel interactive rather than batch-processed.
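A minimal sketch of the interjection idea, assuming the loop drains a queue of user input between tool calls (the real dual-buffer implementation is more involved than this):

```python
from queue import Queue, Empty

def loop_with_interjections(steps, user_queue):
    """Between tool calls, drain any user input queued mid-task so the
    model can adjust without restarting (illustrative sketch)."""
    transcript = []
    for step in steps:
        # Drain interjections typed while the previous step was running.
        while True:
            try:
                note = user_queue.get_nowait()
            except Empty:
                break
            transcript.append({"role": "user", "interjection": note})
        transcript.append({"role": "tool", "step": step})
    return transcript
```

Because the queue is checked between steps rather than at the end of the task, your correction lands in the conversation before the next tool call, which is why a mid-task Escape feels instant.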
The 18+ Built-In Tools - How Claude Picks the Right One
Claude Code ships with 18+ built-in tools, each with a JSON schema description embedded in the system prompt. The model selects tools based on the user’s intent, previous tool results, and explicit instructions in the tool descriptions.
Tool Categories
File Operations:
- Read - Read files (default ~2,000 lines, supports images, PDFs, Jupyter notebooks)
- Write - Create or overwrite entire files
- Edit - Surgical string replacement with uniqueness check (preferred over Write for modifications)
Search:
- Glob - Fast file pattern matching, sorted by modification time
- Grep - Ripgrep-powered regex search with multiple output modes
Execution:
- Bash - Persistent shell with risk classification, injection filtering, timeout support
Web:
- WebFetch - URL retrieval with AI summarization and 15-minute cache
- WebSearch - Web search with domain filtering
Orchestration:
- Agent - Spawn sub-agents with isolated context
- Skill - Invoke skills (custom slash commands) by name
- NotebookEdit - Jupyter notebook cell editing
Planning and Communication:
- TodoWrite - Structured task lists with priorities
- AskUserQuestion - Ask the user for input
- SendMessage - Inter-agent communication
How Claude Decides Which Tool to Use
This isn’t random. The system prompt contains explicit routing instructions:
- “Use Read instead of `cat`, `head`, `tail`, or `sed`”
- “Use Grep instead of `grep` or `rg` via Bash”
- “Use Glob instead of `find` or `ls`”
- “For broader codebase exploration, use Agent with `subagent_type=Explore`”
- “For simple, directed searches, use Glob or Grep directly”
There’s a hierarchy: dedicated tools are always preferred over Bash equivalents. This is because dedicated tools give the user visibility into what Claude is doing - a Grep call is transparent, while an `rg` invocation inside Bash is opaque.
My take: Understanding this hierarchy is one of the biggest productivity wins. When Claude uses the wrong tool (like Bash for file reading), the system prompt is literally telling it not to. If your CLAUDE.md reinforces the right tool choices for your project (“always use dotnet test for testing, never dotnet run on test projects”), Claude follows those instructions within the agentic loop.
Sub-Agents - When Claude Calls for Backup
Sometimes a single thread isn’t enough. Claude Code can spawn sub-agents: isolated instances that run in their own context window with their own system prompt.
Built-In Sub-Agent Types
| Agent Type | Model | Tools | Purpose |
|---|---|---|---|
| Explore | Haiku (fast) | Read-only | File discovery, codebase exploration |
| Plan | Inherits | Read-only | Research for plan mode decisions |
| general-purpose | Inherits | All tools | Complex multi-step tasks |
How Context Forking Works
When Claude spawns a sub-agent, it does NOT copy the entire conversation history. The sub-agent receives:
- Its own system prompt (much smaller than the main one)
- Basic environment details (working directory, platform)
- The task description from the parent
- CLAUDE.md content (still loaded)
The sub-agent runs its own agentic loop, potentially making dozens of tool calls. When it finishes, only a condensed summary returns to the parent - typically 1,000-2,000 tokens from what might have been tens of thousands of tokens of exploration.
This is the key insight: sub-agents are context-efficient. They let Claude explore deeply without filling the parent’s context window. An Explore sub-agent might read 50 files (consuming 100K+ tokens internally) but return a 1,500-token summary to the parent.
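The forked context can be sketched like this. The field names are assumptions for illustration; the point is what the sub-agent does NOT receive (the parent’s conversation history) and what comes back (a short summary):

```python
def spawn_subagent(parent, task, run_loop, summarize):
    """Sketch of sub-agent context forking (illustrative field names).

    The sub-agent gets a fresh, small context -- not the parent's
    conversation history -- and returns only a condensed summary.
    """
    child_context = {
        "system_prompt": "sub-agent prompt (much smaller than the parent's)",
        "environment": {"cwd": parent["cwd"], "platform": parent["platform"]},
        "claude_md": parent["claude_md"],   # still loaded
        "messages": [{"role": "user", "content": task}],
    }
    # The sub-agent runs its own agentic loop, possibly dozens of tool calls.
    full_transcript = run_loop(child_context)
    # Only a condensed summary (~1,000-2,000 tokens) returns to the parent.
    return summarize(full_transcript)
```

Notice that `parent["messages"]` is never touched: however large the parent conversation has grown, the child starts from the task description alone.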
The Critical Constraint
Sub-agents cannot spawn other sub-agents. There’s a hard depth limit of 1 level. This prevents recursive explosion and keeps the system debuggable. Anthropic explicitly chose this over multi-agent swarms: “a simple, single-threaded master loop combined with disciplined tools delivers controllable autonomy.”
When to Use Sub-Agents
Here’s my decision matrix from real usage:
| Scenario | Use Sub-Agent? | Which Type? |
|---|---|---|
| Find a specific file/function | No - use Glob/Grep directly | - |
| Understand a module’s architecture | Yes | Explore |
| Run tests independently | Yes | general-purpose |
| Research before a complex refactor | Yes | Explore or Plan |
| Parallel independent tasks | Yes (background) | general-purpose |
| Simple file edits | No - do it inline | - |
Context Window - The 200K Token Budget
Every turn in a Claude Code session reprices the entire context. The model processes ALL accumulated input tokens on each interaction. This means your 50th prompt is dramatically more expensive than your 1st, because the model reads the entire conversation history every time.
Real Token Budget Breakdown
Here’s what I’ve measured from my own sessions on a standard 200K context:
| Component | Tokens | % of 200K |
|---|---|---|
| Base system prompt | ~2,900 | 1.5% |
| 18+ tool definitions | ~3,000 | 1.5% |
| CLAUDE.md (typical project) | ~1,200 | 0.6% |
| Rules files | ~500 | 0.3% |
| Git status | ~300 | 0.2% |
| Skill descriptions | ~800 | 0.4% |
| Total overhead | ~8,700 | ~4.5% |
| Available for conversation | ~191,000 | ~95.5% |
That looks comfortable (see Claude Code costs documentation for official pricing and token details). But here’s where it gets real: a single file read can return 2,000+ lines (easily 8,000-15,000 tokens). A Bash command output can be huge. After 10-15 tool calls, you’ve consumed 50-80K tokens. By mid-session, the overhead isn’t the system prompt - it’s the accumulated tool results.
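A quick back-of-envelope sketch makes the shift visible, using the figures above. The per-call averages are assumptions for illustration:

```python
def context_after(tool_calls, overhead=8_700, avg_tool_result=5_000, avg_turn_text=300):
    """Rough context-size model: fixed startup overhead plus the tool
    results and turn text that accumulate (figures from the article;
    per-call averages are illustrative assumptions)."""
    return overhead + tool_calls * (avg_tool_result + avg_turn_text)

# Fresh session: just the ~8,700-token startup overhead.
assert context_after(0) == 8_700
# After ~12 tool calls the accumulated results dwarf the system prompt:
assert context_after(12) == 72_300   # ~36% of a 200K window
```

Under these assumptions the system prompt stops being the dominant cost after just a couple of tool calls; from then on, context pressure is driven almost entirely by tool output.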
The 3-Tier Compaction Strategy
When you approach ~92-95% context capacity, Claude Code triggers automatic compaction (Anthropic’s engineering blog has a deep dive on context engineering that explains the philosophy). It uses three tiers:
Tier 1 - Tool result clearing: Removes raw tool outputs from deep in message history. Lightest touch, safest option.
Tier 2 - Conversation summarization: The compactor passes the full history to the model for summarization. It preserves architectural decisions, unresolved bugs, implementation details, and key decisions. It discards redundant tool outputs, verbose intermediate results, and detailed instructions from early in the conversation. Typical compression: 60-80%.
Tier 3 - Manual compaction: /compact [focus] lets you direct what to preserve. You can say /compact focus on the auth refactor and the compactor will prioritize auth-related context.
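The tiered strategy amounts to a cascade: try the cheapest intervention first and escalate only if the context is still over budget. The thresholds and helper functions below are illustrative, not the real implementation:

```python
def compact(messages, token_count, capacity=200_000, trigger=0.92,
            clear_old_tool_results=None, summarize=None):
    """Illustrative cascade of the automatic compaction tiers."""
    if token_count(messages) < trigger * capacity:
        return messages  # plenty of room, do nothing
    # Tier 1: clear raw tool outputs deep in the history (lightest touch).
    messages = clear_old_tool_results(messages)
    if token_count(messages) < trigger * capacity:
        return messages
    # Tier 2: summarize the conversation, preserving decisions and bugs.
    return summarize(messages)
```

Tier 3 (`/compact [focus]`) is simply you invoking the summarizer yourself, with a hint about what to keep.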
What Survives Compaction
This is the part nobody explains clearly. Here’s what I’ve observed:
Always survives:
- CLAUDE.md (re-read from disk after compaction, guaranteed fresh)
- Current file states (Claude re-reads if needed)
- The most recent tool results
- Active TODO list items
Usually survives (summarized):
- Architectural decisions from earlier in the session
- Key code patterns and approaches chosen
- Error messages and their resolutions
Gets discarded:
- Early exploratory file reads (the raw content, not the insight)
- Verbose command outputs from successful operations
- Superseded approaches that were abandoned
My take: Knowing this changes how I work. For long refactoring sessions, I run /compact focus on [the key changes] proactively at around 60-70% capacity rather than waiting for auto-compaction at 92%. This gives me control over what survives instead of leaving it to the summarizer.
The Permission and Safety Model
Claude Code doesn’t just execute whatever the model asks. Every tool call goes through a permission pipeline.
Rule Evaluation: deny > ask > allow
Permission rules are evaluated in strict order:
- Deny rules - checked first. If any deny rule matches, the tool call is blocked. Period.
- Ask rules - checked second. If matched, the user is prompted for approval.
- Allow rules - checked last. If matched, the tool call proceeds silently.
First match wins. This means a deny rule always overrides an allow rule, making the system fail-safe.
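First-match evaluation in deny > ask > allow order can be sketched directly. The rule format (tool name plus a glob-style matcher) is an assumption for illustration, loosely modeled on the `Tool(pattern)` syntax used in settings files:

```python
from fnmatch import fnmatch

def evaluate(rules, tool, argument):
    """deny > ask > allow, first match wins (illustrative rule format)."""
    for action in ("deny", "ask", "allow"):
        for pattern in rules.get(action, []):
            if fnmatch(f"{tool}({argument})", pattern):
                return action
    return "ask"  # no rule matched: fall back to prompting the user

rules = {
    "deny": ["Bash(rm -rf*)"],
    "ask": ["Bash(git push*)"],
    "allow": ["Bash(*)", "Read(*)"],
}
```

Because deny rules are scanned before allow rules, `Bash(rm -rf /)` is blocked even though `Bash(*)` would otherwise allow it - that ordering is what makes the system fail-safe.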
Permission Modes
| Mode | Behavior |
|---|---|
| default | Prompts for permission on first use of each tool |
| acceptEdits | Auto-accepts file edits, still asks for commands |
| plan | Read-only tools only (this is Plan Mode) |
| bypassPermissions | Skips prompts (except writes to .git, .claude, .vscode) |
You cycle through modes with Shift+Tab during a session.
Hooks - Intercepting the Loop
Hooks are shell commands or HTTP endpoints that fire at lifecycle events (see the full hooks reference for all 12+ event types). They intercept the agentic loop at specific points:
- PreToolUse - fires BEFORE a tool executes. Can allow, deny, or modify the tool’s input.
- PostToolUse - fires AFTER successful execution. Can provide corrective feedback.
- Stop - fires when Claude tries to finish. Can force Claude to keep going (quality gates).
- SessionStart/SessionEnd - lifecycle bookends.
The most powerful pattern I’ve found: a PreToolUse hook that runs dotnet format before every Edit tool call, ensuring code style is consistent. And a Stop hook that runs dotnet test and blocks stopping if tests fail.
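As a concrete sketch, a Stop hook is just an executable that reads the event JSON from stdin and signals its verdict via exit code (0 = success, 2 = blocking error). The `dotnet test` invocation is an assumption for a .NET project - substitute your own test command:

```python
import json
import subprocess
import sys

def hook_exit_code(test_returncode):
    """Map a test run's exit status to a hook verdict:
    0 lets Claude stop, 2 blocks termination (forces the loop on)."""
    return 0 if test_returncode == 0 else 2

def run_stop_hook(stdin=sys.stdin):
    """Sketch of a Stop hook body: block Claude from finishing
    while the test suite is red (dotnet command is an assumption)."""
    event = json.load(stdin)  # lifecycle event payload from Claude Code
    result = subprocess.run(["dotnet", "test", "--nologo"],
                            capture_output=True, text=True)
    code = hook_exit_code(result.returncode)
    if code == 2:
        # Feedback on stderr tells Claude why it must keep going.
        print("Tests are failing - fix them before stopping.", file=sys.stderr)
    return code

# In the actual hook script you would end with: sys.exit(run_stop_hook())
```

Registered as a Stop hook, this turns your test suite into a quality gate: Claude cannot declare the task done while tests fail.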
Extended Thinking and Prompt Caching
Two features that significantly affect session cost and quality:
Extended Thinking
Enabled by default. Claude Code allocates thinking tokens based on task complexity (adaptive reasoning). Thinking tokens are billed as output tokens but dramatically improve planning quality.
Control it with effort levels:
- low - faster, cheaper, straightforward tasks
- medium - default for Opus 4.6
- high - deeper reasoning for complex tasks
- max - no token spending constraint (Opus 4.6 only)
Prompt Caching
Claude Code automatically caches system prompts and tool definitions across requests within a session. The first request pays full input token cost. Subsequent requests with the same prefix get significantly cheaper cache-read pricing.
This is why long sessions with consistent system prompts are more cost-efficient per-turn than many short sessions. The system prompt (~8,700 tokens) is cached after the first turn and essentially free for every subsequent turn.
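A rough cost sketch shows why. The per-million-token rates below are placeholders (actual pricing varies by model); the shape of the curve is the point, not the exact dollars:

```python
def session_input_cost(turns, overhead=8_700, per_turn_growth=5_000,
                       input_rate=3.00, cache_read_rate=0.30):
    """Rough input-cost model (rates per million tokens are placeholders).

    The ~8,700-token system-prompt prefix is paid in full on the first
    turn, then read from cache; the growing conversation history is
    billed at the full input rate on every turn.
    """
    cost = 0.0
    for t in range(turns):
        prefix_rate = input_rate if t == 0 else cache_read_rate
        history = t * per_turn_growth  # conversation grows each turn
        cost += (overhead * prefix_rate + history * input_rate) / 1e6
    return round(cost, 4)
```

Under these assumed rates, the cached prefix quickly becomes a rounding error, while the ever-growing conversation history dominates - which is exactly the cost curve you feel in long sessions.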
What This Means for Your Daily Workflow
Understanding the internals changes how you use Claude Code. Here are the practical implications:
Split large tasks: The single-threaded design means Claude can’t parallelize within a session. If you have 3 independent changes, running them as 3 sessions (or using sub-agents) is faster than one mega-prompt.
Front-load context: Since CLAUDE.md loads before everything else and survives compaction, invest in a good CLAUDE.md. Every minute you spend on it saves hours across sessions.
Use /compact proactively: Don’t wait for auto-compaction at 92%. Run /compact focus on [key context] at 60-70% to stay in control of what the model remembers.
Prefer dedicated tools: If Claude is using Bash for file operations, your instructions aren’t clear enough. Add “always use Read/Edit/Grep, never cat/sed/grep via Bash” to your CLAUDE.md.
Understand the cost curve: Early turns are cheap (cached system prompt). Late turns are expensive (full conversation history repriced). For cost-sensitive work, keep sessions focused and compact early.
Use Plan Mode for exploration: Plan Mode restricts to read-only tools. This means Claude can explore freely without the risk of making changes. Use it for “understand this codebase” tasks, then switch to execution mode.
Key Takeaways
- Claude Code is a single-threaded agentic loop: call model, execute tools, feed results back, repeat. No multi-agent swarms. Deliberate simplicity for debuggability.
- ~8,700 tokens of overhead load before you type anything (system prompt + tools + CLAUDE.md + git status). On 200K context, that’s ~4.5%.
- Context compaction fires at ~92% capacity with a 3-tier strategy: clear tool results first, then summarize the conversation, then manual /compact. CLAUDE.md always survives - it’s re-read from disk.
- Sub-agents are context-efficient: they run their own loop, potentially consuming 100K+ tokens internally, but return only a 1,000-2,000 token summary to the parent.
- The permission model is fail-safe: deny rules always override allow rules. Hooks let you intercept every tool call in the agentic loop.
Troubleshooting
Context window filling up too fast
Your tool results are too large. Use the Read tool’s limit and offset parameters to read only the section you need, be specific with Grep patterns, and run /compact focus on [key context] before hitting 80% capacity.
Claude using Bash instead of dedicated tools
Your CLAUDE.md needs explicit tool routing instructions. Add: “Always use Read instead of `cat`, Grep instead of `rg`, Edit instead of `sed`. Reserve Bash for commands that require shell execution.”
Session feels slow after many turns
Each turn reprices the entire context. The 50th turn processes 50x more input tokens than the 1st. Run /compact to reset the conversation to a condensed summary, or start a fresh session with claude and reference the previous session’s approach.
Sub-agent returning unhelpful results
The task description you pass to the Agent tool is the sub-agent’s entire context. Be specific: “Find all files that implement the IProductRepository interface and list their public methods” not “Look at the repository layer.”
Hooks not firing
Check settings precedence. Hooks in managed settings override project settings. Run in verbose mode to see hook execution. Verify the hook command exits with code 0 (success) or code 2 (blocking error). Any other exit code is treated as a non-blocking error and silently ignored.
Claude ignoring CLAUDE.md instructions
CLAUDE.md is injected as a user message, not system prompt. Long CLAUDE.md files (200+ lines) get less adherence. Keep instructions concise, specific, and actionable. Vague instructions (“write good code”) are ignored. Specific instructions (“use primary constructors, file-scoped namespaces, and always pass CancellationToken”) are followed.
Frequently Asked Questions
How does Claude Code's agentic loop work?
Claude Code uses a single-threaded agentic loop: it sends your prompt plus the full conversation history to the Claude API, receives a response that contains either text (done) or tool calls (keep going), executes the tools, feeds the results back as tool_result messages, and calls the model again. This loop repeats until the model produces a text-only response. A typical task runs 5-50 iterations.
How many tokens does a typical Claude Code session use?
A fresh Claude Code session starts with approximately 8,700 tokens of overhead (system prompt, tool definitions, CLAUDE.md, git status). A simple bug fix might use 30,000-50,000 total tokens. A complex refactoring session can consume 150,000-200,000 tokens before triggering context compaction. Average cost is about $6 per developer per day.
What tools does Claude Code have access to?
Claude Code ships with 18+ built-in tools across several categories: Read, Write, Edit for file operations, Glob and Grep for search, Bash for shell execution, WebFetch and WebSearch for web access, Agent and Skill for orchestration, and TodoWrite, AskUserQuestion, SendMessage for planning and communication. The exact count varies by version and configuration. Each tool has a JSON schema description that helps the model decide when to use it.
How does Claude Code manage the context window when it gets full?
Claude Code uses a 3-tier compaction strategy. Tier 1 clears old tool outputs from message history. Tier 2 triggers at approximately 92-95% capacity and summarizes the conversation, preserving architectural decisions and key context while discarding redundant outputs. Tier 3 is manual compaction via the /compact command where you can specify what to focus on. CLAUDE.md always survives compaction because it is re-read from disk.
How does Claude Code decide which tool to use?
The system prompt contains explicit routing instructions that tell Claude to prefer dedicated tools over Bash equivalents. For example, Read instead of cat, Grep instead of rg, Glob instead of find. Claude's model reasoning selects tools based on the user's prompt, previous tool results, tool descriptions, and any instructions in CLAUDE.md. Dedicated tools are preferred because they give users visibility into what Claude is doing.
What happens to CLAUDE.md instructions during context compaction?
CLAUDE.md always survives compaction. After any compaction event, Claude Code re-reads the CLAUDE.md file from disk and re-injects it fresh into the context. This is why CLAUDE.md is the most reliable way to persist instructions across long sessions. Only conversation-specific instructions that exist purely in chat history can be lost during compaction.
How do sub-agents work in Claude Code?
Sub-agents are isolated instances that run in their own context window with their own system prompt. When Claude spawns a sub-agent, it receives a task description and basic environment details but not the full conversation history. The sub-agent runs its own agentic loop and returns a condensed summary (typically 1,000-2,000 tokens) to the parent. Sub-agents cannot spawn other sub-agents, enforcing a maximum depth of 1 level.
How much does a Claude Code session cost?
The average cost is approximately $6 per developer per day, with the 90th percentile under $12 per day. Monthly costs typically range from $100-200 with Sonnet 4.6. Prompt caching significantly reduces costs for long sessions because the system prompt and tool definitions are cached after the first turn. Early turns in a session are cheap, while later turns are more expensive because the entire conversation history is repriced on each turn.
Summary
Claude Code’s architecture is a masterclass in constrained simplicity. A single-threaded agentic loop, 18 well-defined tools, a hierarchical context system, and a fail-safe permission model. No multi-agent swarms, no complex orchestration, no magic. Just a while loop that calls the model, executes tools, and feeds results back until the job is done.
The practical takeaway: the more you understand about this architecture, the better your prompts become, the more efficient your sessions are, and the lower your costs. Front-load context in CLAUDE.md, compact proactively, use dedicated tools, and split independent tasks into separate sessions or sub-agents.
If you found this deep dive useful, share it with your team. And make sure to subscribe to the newsletter where I share exclusive insights from my daily Claude Code usage, including token optimization techniques that don’t make it into the articles.
Happy Coding :)


