Session Usage Optimization

Context breakdown

Category           | What it is
System prompt      | Claude Code's internal instructions
System tools       | Tool definitions (bash, file read/write, etc.) loaded every turn
Memory files       | CLAUDE.md files and auto-memory summaries (written to ~/.claude/memory)
Skills             | Any custom skill definitions
Messages           | Accumulated conversation + tool outputs this session
Autocompact buffer | Reserved headroom before autocompact triggers
Free space         | Usable remaining context

The dominant cost is generally Messages — that's the conversation history snowballing as the session progresses, which is normal and expected.


On the 200k limit

The 200k is the context window per session (how much Claude can hold in working memory at once), not your 5-hour session usage quota. These are two separate limits:

  • Context window (200k): per-session. It starts empty in each new session and is partly freed when compaction summarizes older history. This is what /context shows.
  • 5-hour usage quota: cumulative token consumption across all Claude surfaces (claude.ai, Claude Code, Claude Desktop) within a rolling 5-hour window. Not shown here.

So 200k is not "global across sessions" — each session gets its own 200k window. But all sessions share the same 5-hour quota bucket.
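
A toy model may make the distinction concrete. Only the 200k window comes from the text above; the quota size is a made-up number (the real figure is not stated here), the class names are hypothetical, and charging the full context on every turn is a simplification that ignores prompt caching and output tokens.

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000   # per-session window, from the text above
QUOTA_TOKENS = 1_000_000   # HYPOTHETICAL 5-hour quota; the real figure is not given


@dataclass
class Quota:
    """Shared bucket: every session on every surface draws from it."""
    remaining: int = QUOTA_TOKENS

    def charge(self, tokens: int) -> None:
        self.remaining -= tokens


@dataclass
class Session:
    """Each session tracks its own context fill, independent of other sessions."""
    quota: Quota
    context_used: int = 0

    def turn(self, new_tokens: int) -> None:
        # New tokens are added to this session's own context
        self.context_used += new_tokens
        # Simplification: the whole context is resent, so all of it is charged
        self.quota.charge(self.context_used)

    @property
    def context_free(self) -> int:
        return CONTEXT_WINDOW - self.context_used


quota = Quota()
a, b = Session(quota), Session(quota)
a.turn(50_000)
b.turn(20_000)
a.turn(10_000)
print(a.context_free, b.context_free, quota.remaining)
```

Each session keeps its own `context_free`, while `quota.remaining` falls with every turn in either session.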


Which settings apply given 200k context

Three settings remain relevant with a 200k context window:

  • CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING: if Sonnet uses any extended thinking, disabling it saves tokens. Sonnet uses it less aggressively than Opus, but it can still trigger.
  • CLAUDE_CODE_DISABLE_AUTO_MEMORY: your memory files grow over time. Disabling auto-memory prevents background summarization calls from silently eating into your 5-hour quota.
  • CLAUDE_CODE_SUBAGENT_MODEL set to "sonnet": you're already on Sonnet as the main model, so this only matters if subagents would otherwise default to Opus.
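
One way these three settings might be written down is as an `env` block in a Claude Code settings file. The sketch below only builds and prints that JSON fragment; it does not touch any file. The variable names come from the list above, but the values (`"1"` for the two disable flags) and the placement under an `env` key are assumptions to check against the current documentation.

```python
import json

# Variable names are taken from the list above.
# ASSUMPTION: the two disable flags are switched on with the string "1".
env_overrides = {
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
    "CLAUDE_CODE_DISABLE_AUTO_MEMORY": "1",
    "CLAUDE_CODE_SUBAGENT_MODEL": "sonnet",
}

# ASSUMPTION: the settings file accepts environment variables under an "env" key.
settings_fragment = {"env": env_overrides}

print(json.dumps(settings_fragment, indent=2))
```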

Additional levers:

  • Trim CLAUDE.md: it is loaded into context on every turn, so every 1k tokens you cut is saved again on each turn of the session.
  • /compact: manually trigger compaction before the autocompact buffer forces it. This lets you control when history gets summarized, which can be more efficient than waiting for the automatic trigger.
  • Avoid long tool output chains: bash commands that dump large stdout go into Messages history. Pipe through head or grep to truncate noisy output before it lands in context.
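
The head/grep idea can also be sketched in Python, for cases where a script wraps a noisy command. `trim_output` is a hypothetical helper, not part of Claude Code: it keeps only lines matching a pattern, caps how many are returned, and notes how many were dropped.

```python
import re


def trim_output(text: str, pattern: str = "", max_lines: int = 20) -> str:
    """Keep lines matching `pattern` (like grep), then cap the count (like head)."""
    lines = text.splitlines()
    if pattern:
        regex = re.compile(pattern)
        lines = [line for line in lines if regex.search(line)]
    kept = lines[:max_lines]
    omitted = len(lines) - len(kept)
    if omitted > 0:
        # Leave a marker so the reader knows the output was cut
        kept.append(f"... ({omitted} more lines omitted)")
    return "\n".join(kept)


# Example: 1,000 log lines, only a few of which are errors
noisy = "\n".join(
    f"ERROR step {i} failed" if i % 100 == 0 else f"ok step {i}" for i in range(1000)
)
trimmed = trim_output(noisy, pattern=r"^ERROR", max_lines=5)
print(trimmed)
```

Only the trimmed text would need to enter the conversation, rather than all 1,000 lines.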

Example: Key Numbers at a Glance

  • Model: Claude Sonnet 4.6 (specifically claude-sonnet-4-6)
  • Current usage: 76.8k / 200k tokens (38%)
    • This means you're using a 200,000-token context window (standard for Sonnet 4.6 in Claude Code at the time of this screenshot). The 1M-token extended context is not active here (it requires specific flags, higher plans, or API usage and often incurs extra costs/rate limits).

You're at a comfortable 38% fill rate — plenty of room left before quality starts degrading or auto-compaction kicks in more aggressively.

Breakdown of "Estimated usage by category"

Here's what each line actually means (based on how Claude Code structures its context):

  • System prompt (6.4k tokens, 3.2%)
    Core instructions that tell Claude how to behave: its personality, coding style, safety rules, response format, etc. This is always present and relatively fixed.

  • System tools (8.7k tokens, 4.4%)
    Descriptions and definitions of all available tools Claude Code can use (e.g., file reading, bash execution, search tools, MCP tools if enabled). These are "deferred" in some cases — only the lightweight descriptions load initially.

  • Memory files (824 tokens, 0.4%)
    Persistent project knowledge files, typically your CLAUDE.md (project-level rules, architecture notes, preferences) and sometimes global/user-level memory files or auto-memory summaries.
    At only 824 tokens, your CLAUDE.md is quite lean — that's good for keeping overhead low.

  • Skills (476 tokens, 0.2%)
    Short descriptions (usually just YAML frontmatter) of any "Skills" you've installed or defined. Skills are reusable mini-instructions/scripts for specialized tasks (e.g., "refactor React component", "write tests"). The full skill content only loads when actually invoked, so this stays very lightweight.

  • Messages (44.5k tokens, 22.2%)
    The actual conversation history — your prompts + Claude's responses so far in this session.
    This is the largest variable part and grows with every turn. At 44.5k it's moderate — not yet bloated.

  • Free space (106.1k tokens, 53.1%)
    Remaining tokens available for new messages, tool outputs, file reads, etc. This is the usable "working room."

  • Autocompact buffer (33k tokens, 16.5%)
    A reserved zone that Claude Code keeps free. When conversation history (Messages) starts eating into this buffer, the system automatically compacts older messages by summarizing them. This prevents sudden hard cutoffs and helps maintain continuity in long sessions.
    In 2026, this buffer is typically ~33k tokens (down from ~45k in earlier versions), which gives you a bit more effective working space than before.
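
The percentages in the breakdown can be reproduced by dividing each category by the 200k window. A short sketch using only the figures quoted above (the variable names are illustrative):

```python
WINDOW = 200_000  # context window in tokens

# Token counts as quoted in the breakdown above
categories = {
    "System prompt": 6_400,
    "System tools": 8_700,
    "Memory files": 824,
    "Skills": 476,
    "Messages": 44_500,
    "Free space": 106_100,
    "Autocompact buffer": 33_000,
}

for name, tokens in categories.items():
    print(f"{name:<20}{tokens:>8,} tokens  {tokens / WINDOW:6.1%}")

# The seven categories together account for the whole window
total = sum(categories.values())

# Window minus the reserved buffer
usable_window = WINDOW - categories["Autocompact buffer"]
print(total, usable_window)
```

With a 33k buffer, the window minus the reserved zone comes to 167k tokens.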

Visual Icons (the colored blocks)

The purple/orange patterned bars are a quick visual summary:

  • Solid blocks = currently loaded/used
  • Empty or crossed blocks = free or reserved (the crossed ones often represent the autocompact buffer or deferred content)

Why This Matters for Usage Limits

Even though you're only at 38%, context bloat is one of the main reasons sessions feel like they "eat quota aggressively." Every new message re-processes a large chunk of the current context (especially with poor caching).
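
A rough way to see why context size drives quota burn: if every turn resends the whole context, total input tokens grow much faster than the context itself. The sketch below assumes no prompt caching and a fixed number of new tokens per turn, both simplifications; the function name and the 2k-per-turn figure are illustrative.

```python
def cumulative_input_tokens(fixed_overhead: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens processed if the full context is resent every turn.

    ASSUMPTIONS: no prompt caching, and each turn adds the same number of tokens.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn         # conversation grows by one turn
        total += fixed_overhead + history  # overhead plus full history is sent again
    return total


# Fixed overhead from the breakdown above: system prompt + tools + memory files + skills
overhead = 6_400 + 8_700 + 824 + 476

short_session = cumulative_input_tokens(overhead, tokens_per_turn=2_000, turns=10)
long_session = cumulative_input_tokens(overhead, tokens_per_turn=2_000, turns=20)
print(short_session, long_session)

# Trimming 1k tokens of fixed overhead saves 1k on every one of the 20 turns
saved = long_session - cumulative_input_tokens(overhead - 1_000, tokens_per_turn=2_000, turns=20)
print(saved)
```

Under these assumptions, doubling the number of turns more than doubles the total, which is the argument for keeping sessions short and focused.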

Healthy signs in your screenshot:

  • Memory files and Skills are very small → good hygiene.
  • Messages are only ~22% → session is still young/focused.
  • Plenty of free space.

Potential optimizations if you notice faster burn later:

  • Keep sessions focused and use handoff summaries before /clear or starting a new chat.
  • Avoid dragging in unnecessary files early.
  • If you have a large CLAUDE.md, trim it.
  • Consider the config flags you showed earlier (disabling 1M context if active, adaptive thinking, etc.) — they directly help control hidden overhead.

Running the /context command periodically helps you spot bloat before it becomes a problem (e.g., when Messages creeps toward 100k+).

Suggestions for Optimizing File Reads in Claude Code:

  1. Reference earlier reads explicitly: in your prompt, say "use the file content from earlier" instead of triggering another read tool call. Claude can then pull from conversation history instead of re-reading from disk.
  2. Use offset/limit params on file reads for large files when you only need a section.
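  3. Add frequently read files to project knowledge where that feature is available (it belongs to claude.ai Projects rather than Claude Code itself): files in the Project knowledge base are cached, which reduces the cost of referencing them repeatedly.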
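
To illustrate what offset/limit buys you in point 2, here is a plain-Python analogue that reads only a window of lines from a file. It is not Claude Code's read tool; `read_slice` and its parameters are hypothetical names chosen to mirror the idea.

```python
import tempfile
from itertools import islice
from pathlib import Path


def read_slice(path: str, offset: int = 0, limit: int = 50) -> list[str]:
    """Return up to `limit` lines starting at zero-based line `offset`."""
    with open(path, encoding="utf-8") as handle:
        # islice steps past the first `offset` lines without storing them
        return [line.rstrip("\n") for line in islice(handle, offset, offset + limit)]


# Demo on a throwaway 1,000-line file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "big.log"
    sample.write_text("\n".join(f"line {i}" for i in range(1000)), encoding="utf-8")
    section = read_slice(str(sample), offset=500, limit=3)

print(section)
```

Only three lines are returned instead of the full file, which is the saving a partial read is meant to give.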