Pker.xyz

Reverse Engineering Claude Code

reverse engineering, linux, tools

For educational purposes only. Claude Code version 2.1.38, on x86-64 Linux (Ubuntu).


Introduction

What does Claude do on our machines? What does it do at startup? How does the agentic loop work? What does the client actually do? What can we find with standard tooling? Can we use the excellent compendium to help us?

If you've ever been curious about poking at some binary on Linux, this is a decent blueprint.

Tools used: file, ldd, readelf, nm, strings, xxd, grep, dd, and compendium (my syscall tracer).


Static Analysis: What Is This Binary?

We have an unknown binary, and we want to know its (potentially dirty) secrets. Let's peel back each layer.

Step 1: Find and identify it

which resolves a command name to a path. file reads magic bytes and ELF headers to classify it. ls -lh gives you the size.

source
$ which claude
/home/louis/.local/bin/claude

$ file $(which claude)
symbolic link to /home/louis/.local/share/claude/versions/2.1.38

$ file /home/louis/.local/share/claude/versions/2.1.38
ELF 64-bit LSB executable, x86-64, dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, not stripped

$ ls -lh /home/louis/.local/share/claude/versions/2.1.38
-rwxr-xr-x 223M louis 10 Feb 08:35 2.1.38

Right away, a few things stand out. It's a symlink into a versioned directory, which suggests an auto-update mechanism that swaps symlinks. It's a native ELF 64-bit executable, not a script or some sort of JAR. It's not stripped, meaning symbol names are still present (this will help us later). And it's 223 MB. For reference, the entire coreutils package (ls, cp, mv, cat, etc.) is about 15 MB. Something pretty dang large is embedded inside.
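file's verdict comes down to a handful of magic bytes at the start of the file. A minimal sketch of the same check in Python, reading only the 16-byte ELF identification header (the helper name is mine):

```python
def classify(path):
    # The first 16 bytes of an ELF file are the e_ident array:
    # magic, class (32/64-bit), data encoding (endianness), version...
    with open(path, "rb") as f:
        ident = f.read(16)
    if ident[:4] != b"\x7fELF":
        return "not an ELF"
    bits = {1: "32-bit", 2: "64-bit"}[ident[4]]    # EI_CLASS
    endian = {1: "LSB", 2: "MSB"}[ident[5]]        # EI_DATA
    return f"ELF {bits} {endian}"
```

Run against the claude binary, this would print "ELF 64-bit LSB", matching file's output above.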

Step 2: Check shared library dependencies

ldd prints what shared libraries a dynamically-linked ELF depends on. This tells you what the binary doesn't bundle.

source
$ ldd /home/louis/.local/share/claude/versions/2.1.38
    libc.so.6          => /lib/x86_64-linux-gnu/libc.so.6
    libpthread.so.0    => /lib/x86_64-linux-gnu/libpthread.so.0
    libdl.so.2         => /lib/x86_64-linux-gnu/libdl.so.2
    libm.so.6          => /lib/x86_64-linux-gnu/libm.so.6
    /lib64/ld-linux-x86-64.so.2

Only glibc basics. No libssl, no libcurl, no libnode, no libstdc++. The TLS stack, HTTP client, C++ runtime, and JavaScript engine (spoiler) are all statically linked. This is what makes it 223 MB and fully self-contained.

The presence of libdl is interesting: it means the binary can dlopen() shared objects at runtime. This is how N-API native modules work in a single-executable build.

Step 3: Read the ELF sections

readelf -SW lists section headers: named regions of the binary with specific purposes. The -S flag asks for sections; -W avoids column truncation.

source
$ readelf -SW /home/louis/.local/share/claude/versions/2.1.38
  [13] .rodata                  39.77 MB   Read-only data
  [14] .text                    56.79 MB   Executable machine code
  [25] __DATA,__jsc_opcodes      0.02 MB   JavaScriptCore opcode tables
  [26] __DATA,__wtf_config       0.02 MB   WebKit config

The section names __DATA,__jsc_opcodes and __DATA,__wtf_config use Mach-O naming conventions (the __DATA,__name format comes from macOS). Finding these in a Linux ELF is a fingerprint: they come from WebKit/JSC source code that uses __attribute__((section())) with macOS-style names.

Now for the key observation: all ELF sections total about 97 MB. But the file is 223 MB. That means roughly 115 MB of data is appended after the ELF structure. Hmmm... we'll come back to this.
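That appended-data estimate can be computed directly: parse the ELF64 header, find where the last section (or the section header table itself) ends, and subtract from the file size. A rough sketch, assuming a standard ELF64 layout (the function name is mine):

```python
import struct

def appended_bytes(path):
    # Anything in the file past the end of the ELF structure is appended data.
    with open(path, "rb") as f:
        data = f.read()
    assert data[:4] == b"\x7fELF", "not an ELF file"
    # ELF64 header fields: e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C
    e_shoff = struct.unpack_from("<Q", data, 0x28)[0]
    e_shentsize = struct.unpack_from("<H", data, 0x3A)[0]
    e_shnum = struct.unpack_from("<H", data, 0x3C)[0]
    end = e_shoff + e_shentsize * e_shnum        # end of section header table
    for i in range(e_shnum):
        base = e_shoff + i * e_shentsize
        sh_type = struct.unpack_from("<I", data, base + 4)[0]
        sh_offset = struct.unpack_from("<Q", data, base + 0x18)[0]
        sh_size = struct.unpack_from("<Q", data, base + 0x20)[0]
        if sh_type != 8:  # SHT_NOBITS (.bss) occupies no file space
            end = max(end, sh_offset + sh_size)
    return len(data) - end
```

On the claude binary this should report roughly the 115 MB gap between the ~97 MB of sections and the 223 MB file.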

Step 4: Identify the runtime with dynamic symbols

nm -D lists dynamic symbols: functions the binary exports or imports. For a language runtime, the exports identify exactly what's inside.

source
$ nm -D /home/louis/.local/share/claude/versions/2.1.38 | grep "@@BUN" | head -3
00000000032bcc20 T napi_acquire_threadsafe_function@@BUN_1.2
000000000431d540 T napi_add_async_cleanup_hook@@BUN_1.2
000000000431cf20 T napi_add_env_cleanup_hook@@BUN_1.2

$ nm -D /home/louis/.local/share/claude/versions/2.1.38 | grep -c "@@BUN"
556

556 symbols versioned as @@BUN_1.2. These are N-API functions provided by Bun's runtime, not Node's. That version tag is the proof: this binary is a Bun application. We'll see shortly what that implies.

Step 5: Extract version strings

strings pulls printable sequences from a binary. It's a blunt tool, but for identifying what a binary is and what it talks to, it's quick and easy to use.

source
$ strings /home/louis/.local/share/claude/versions/2.1.38 | grep -i "bun v"
Bun v1.3.9-canary.51+d5628db23 (Linux x64 baseline)

Bun v1.3.9-canary.51, a pre-release build. The "baseline" tag means it's compiled for generic x86-64 without AVX2 or other newer CPU feature flags, probably for maximum compatibility.
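strings itself is easy to approximate: it just hunts for runs of printable ASCII. A toy Python equivalent of the default behavior (runs of 4 or more printable bytes):

```python
import re

def strings(data: bytes, min_len: int = 4):
    # Find maximal runs of printable ASCII (0x20-0x7e), like strings(1)
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode() for m in re.finditer(pattern, data)]
```

Piping the result through a grep-style filter for "bun v" is how we'd land on the version banner above.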

Step 6: Identify the compiler toolchain

The .comment ELF section is populated by compilers with their version strings.

source
$ readelf -p .comment /home/louis/.local/share/claude/versions/2.1.38
  Linker: Ubuntu LLD 21.1.5
  GCC: (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
  rustc version 1.94.0-nightly (c61a3a44d 2025-12-09)
  Ubuntu clang version 21.1.5

Four compilers contributed to this binary: Clang for the JSC C++ code, GCC for vendored C libraries, Rust nightly for native components (Bun reportedly uses Rust for parts of its I/O stack, though this is somewhat speculative), and LLD as the linker.

What we know before running it

Without executing a single instruction, we've established: it's a Bun v1.3.9-canary.51 single-executable application using JavaScriptCore. The 97 MB ELF contains the native runtime, and 115 MB of data is appended after the ELF sections. It depends only on glibc and is fully self-contained.


Dynamic Analysis: Watching It Run

Now we run it and watch what happens. I used compendium with --verbose (raw syscalls) and -o (save to file). Two traces:

source
$ compendium --verbose -o traces/claude_help.log -- claude --help
$ compendium --verbose -o traces/claude_print.log -- claude -p "say just the word hi"

The --help trace is interesting because it's a minimal invocation with no network. Here's the summary:

source
 FINAL: claude --help (0.25s)
 Memory:
   Heap:  132.0 KB
   Mmap:    1.2 GB (67 regions)
   Total:   1.2 GB
 I/O:
   Files read:    115.2 MB
   Files written:      0 B
   Net sent:           0 B
   Net received:       0 B

Just to print help text, it allocates 1.2 GB of virtual memory and reads 115.2 MB from disk. That 115 MB matches exactly the appended payload we found in static analysis. Yay! We just caught the runtime reading its own embedded JavaScript, which confirms what we discovered earlier.

Note that I don't show the tracer's full output since it's super verbose (every syscall is intercepted). A real prompt is heavier:

source
 FINAL: claude -p say just the word hi, nothing else (3.80s)
 Memory:
   Mmap:    3.3 GB (760 regions)
 I/O:
   Files read:    373.5 MB
   Files written:   9.6 MB
   Net sent:      202.0 KB
   Net received:   42.7 KB
 Files:
   Opened for read:  202
   Opened for write: 13
 Network connections:
   tcp4 → 160.79.104.10:443     (Anthropic API)
   tcp4 → 140.82.113.3:22       (GitHub SSH)
   tcp4 → 34.149.66.137:443     (Telemetry/Statsig)
 Subprocesses: ["uname", "rust-analyzer", "grep", "2.1.38",
                "ps", "ssh", "git", "sh", "which"]

202 file opens, 9 subprocess types, 3 network connections. Just to answer "hi." Let me break down the startup timeline from the verbose log.

The startup timeline

0 to 1ms: Dynamic linker. Loads the five shared libraries.

2 to 47ms: Bun/JSC initialization. This is where the heavy lifting happens. The runtime checks the kernel's overcommit policy, allocates a 1 GB JSC heap region, a 128 GB virtual reservation for the JIT compiler, and most importantly:

source
[+0.047s] read 115.1 MB from exe    ← the entire JS bundle, one read

That single read takes about 43ms and is the dominant startup cost. The runtime reads its own executable to extract the embedded JavaScript. This is the Bun SEA (Single Executable Application) mechanism at work.

47 to 55ms: Thread pool. Bun spins up 8 worker threads immediately after JSC init, then 3 more later for JIT compilation and GC.

55 to 240ms: GC pressure monitoring. A striking pattern: the main thread polls /proc/self/statm roughly 100 times while JSC parses and compiles the 115 MB bundle. This is the garbage collector deciding whether to trigger collection during the heavy initialization phase.

245 to 290ms: Configuration loading. Here's where "Claude Code the application" shows up rather than "Bun the runtime":

source
[+0.245s] open ~/.claude/settings.json (read)
[+0.249s] open .claude/settings.local.json
[+0.251s] open ~/.claude/.credentials.json
[+0.260s] open ~/.claude.json (read)
[+0.268s] open ~/.claude/plugins/known_marketplaces.json
[+0.291s] open CLAUDE.md (read)

Notice that settings.json gets opened 5 times in the same millisecond. Multiple modules requesting the same config without a shared mapping/cache.

290ms+: Network and subprocesses. Three connections go out in parallel: the Anthropic API (for the actual prompt), GitHub via SSH (a git fetch to check for remote changes), and a telemetry endpoint on Google Cloud (Statsig). The binary also spawns itself twice to compile native addon caches, launches rust-analyzer (presumably because this was run inside a Rust project), and runs which three times to probe for available tools.

The network numbers are interesting for their asymmetry: we sent 5x more than we received (202 KB vs 42.7 KB) for a trivial "say hi" prompt. The request payload (~80 KB) dwarfs the response because it includes the full system prompt, CLAUDE.md instructions, git state, and tool schemas. For one word of output, 80 KB of context goes in.


Extracting the JavaScript Bundle

We know 115 MB is appended after the ELF sections. We watched the runtime read all of it at startup. Now we extract it.

Finding the boundary

grep -boa does a byte-offset, binary-safe search. We look for the Bun module header signature:

source
$ grep -boa '// @bun @bytecode @bun-cjs' claude
78242:// @bun @bytecode @bun-cjs          ← in .rodata (Bun's own modules)
101962616:// @bun @bytecode @bun-cjs      ← START OF SEA PAYLOAD
197232544:// @bun @bytecode @bun-cjs      ← second copy (different bytecode tier)
207706904:// @bun @bytecode @bun-cjs      ← small shim modules

The payload starts at offset 101,962,616. We can peek at it with xxd:

source
$ xxd -s 0x613d330 -l 80 claude
  file:///$bunfs/root/src/entrypoints/cli.js
  // @bun @bytecode @bun-cjs
  (function(exports, require, module...

The $bunfs virtual filesystem path tells us this is the main CLI entrypoint. The bundled code thinks it's reading from a filesystem, but it's actually loaded from the appended blob.

How much is readable source?

The @bytecode annotation means each module contains both JavaScript source text and pre-compiled JSC bytecode. We need to figure out where the source ends and the bytecode begins.

The approach: scan the payload in 1 MB chunks and check what percentage of each chunk is printable ASCII. Source code is 100% printable; bytecode is not. The transition is clear (kind of):

source
$ dd if=claude bs=1 skip=101962616 count=$((222697790 - 101962616)) | \
    python3 -c "scan 1MB chunks, report printable %"
Offset  0MB: 100.0% printable
Offset  5MB: 100.0% printable
Offset 10MB:  17.5% printable     ← transition to bytecode
First non-printable byte at offset 10,474,337 (10.0 MB into payload)

10 MB of readable source, then the remaining 80+ MB drops to 17.5% printable (that's JSC bytecode with interspersed string constants for stack traces).
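The chunk scan hidden behind that python3 -c placeholder can be sketched concretely. This version reports the offset of the first non-printable byte, treating tabs and newlines as printable (the names are mine):

```python
# Printable ASCII plus the whitespace bytes JS source legitimately contains
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def printable_ratio(chunk):
    return sum(b in PRINTABLE for b in chunk) / len(chunk)

def find_source_end(payload, chunk_size=1 << 20):
    # Walk 1 MB chunks; on the first chunk that isn't fully printable,
    # locate the exact byte where source ends and bytecode begins.
    for off in range(0, len(payload), chunk_size):
        chunk = payload[off:off + chunk_size]
        if printable_ratio(chunk) < 1.0:
            for i, b in enumerate(chunk):
                if b not in PRINTABLE:
                    return off + i
    return len(payload)
```

Applied to the extracted payload, this is what yields the 10,474,337-byte boundary quoted above.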

We can cross-verify this with Module 2. Remember the grep -boa output showed a second copy at offset 197,232,544. Module 3 starts at 207,706,904, making Module 2 exactly 10,474,360 bytes. Our extraction from Module 1 is 10,474,337 bytes; the 23-byte difference is presumably some metadata header before the source text. The sizes match, confirming we have the full source. Module 2 is an independent copy at a different bytecode compilation tier, for optimization purposes. People online claim Bun is very fast for a JS runtime; shipping pre-compiled bytecode is partly how.

Extracting and beautifying

source
$ dd if=claude bs=1 skip=101962616 count=10474337 > claude_bundle.js
$ wc -l claude_bundle.js
7,539 lines

$ pip3 install jsbeautifier
$ js-beautify -f claude_bundle.js -o claude_bundle.beautified.js
$ wc -l claude_bundle.beautified.js
438,162 lines

7,539 minified lines become 438,162 lines of properly indented, readable JavaScript. Variable names are still mangled (Au1, Hz, Px) but the control flow, string constants, and structure are all navigable.

The file opens with a copyright notice and the exact build metadata:

js
// @bun @bytecode @bun-cjs
(function(exports, require, module, __filename, __dirname) {
// Claude Code is a Beta product per Anthropic's Commercial Terms of Service.
// By using Claude Code, you agree that all code acceptance or rejection
// decisions you make, and the associated conversations in context,
// constitute Feedback under Anthropic's Commercial Terms, and may be used
// to improve Anthropic's products, including training models.
// You are responsible for reviewing any code suggestions before use.
// (c) Anthropic PBC. All rights reserved.
// Version: 2.1.38
var AFB=Object.create;var{getPrototypeOf:LFB,defineProperty:vHH,...

It's a CommonJS bundle: one giant (function(exports, require, module...) {...}) wrapper containing the entire application. Some quick stats on what's inside:

source
Total size:              10.0 MB (18 MB beautified)
Lines:                   438,162 (beautified)
Named functions:         ~12,400
Arrow functions:         ~21,000
String literals (>20ch): ~25,700
require() calls:         683 total, 42 unique modules

Over 33,000 functions in a 10 MB file. The high ratio of arrow functions to declarations is typical of compiled modern TypeScript. At least, that's what Claude tells me.


What's Inside the Bundle

With the beautified source in hand, grep reveals a lot about the application's internals. The string literals survived minification, so we can extract API endpoints, dependency names, model IDs, and configuration values directly.

Bundled dependencies

The require() calls and package name strings reveal the major libraries:

source
ink                      Terminal UI (React-for-terminal)
zod                      Schema validation
@anthropic-ai/claude-code   Core application
@azure/msal-common       Azure auth (for Azure OpenAI)
@aws-sdk/core            AWS SDK (for Bedrock)
commander                CLI argument parsing
marked                   Markdown rendering
highlight.js             Syntax highlighting
@grpc/proto-loader       gRPC (for Google Vertex AI)
ws                       WebSocket client

The dependency list reveals a multi-provider architecture. Claude Code supports Anthropic's API directly, AWS Bedrock (via @aws-sdk), Google Vertex AI (via gRPC), and Azure (via MSAL). Each provider has its own auth flow bundled in. The terminal UI is built on Ink, which is React for the terminal. Because ncurses and more modern alternatives like Ratatui were too simple, of course!

API endpoints

source
$ grep -oP 'https://api\.anthropic\.com[^"]+' claude_bundle.beautified.js | sort -u
https://api.anthropic.com
https://api.anthropic.com/api/hello                       ← auth check / ping
https://api.anthropic.com/api/claude_cli_feedback
https://api.anthropic.com/api/claude_code/link_vcs_account
https://api.anthropic.com/api/claude_code/metrics
https://api.anthropic.com/api/oauth/claude_cli/create_api_key
https://api.anthropic.com/api/oauth/claude_cli/roles

OAuth flow:
https://claude.ai/oauth/authorize
https://platform.claude.com/v1/oauth/token

Model catalog

The bundle contains identifiers for every Claude model the client knows about:

source
claude-3-haiku, claude-3-opus, claude-3-sonnet
claude-3-5-haiku-20241022, claude-3-5-sonnet-20241022
claude-3-7-sonnet-20250219
claude-4-opus-20250514
claude-opus-4-1-20250805
claude-sonnet-4-5-20250929
claude-opus-4-5-20251101
claude-haiku-4-5-20251001
claude-opus-4-6                      ← latest at time of analysis

Plus a mysterious reference in the dynamic config: "capableModel": "dodo-v5-prod". An internal codename.

Feature flags (Statsig cache)

The binary reads from ~/.claude/statsig/ at startup. The cached evaluations there reveal the feature flag system. The config names are stored as DJB2 hashes (numeric IDs like 3239527534) rather than plaintext, but the values inside each config are readable JSON. Some interesting configs we found:

source
minimumMessageTokensToInit: 140000    When to write auto-memory
toolCallsBetweenUpdates: 5            Memory update frequency
minTimeBeforeFeedbackMs: 600000       Wait 10min before asking for feedback
start_in_plan_mode: false             Default plan mode state
capableModel: "dodo-v5-prod"          Internal model codename
tokenThreshold: 0.92                  Context window usage trigger
maxExportBatchSize: 200               Telemetry batch settings
fallback_available_warning_threshold: 0.5

And the spinner status words are fetched from a config too: ["Accomplishing","Actioning","Actualizing","Baking",...]. Those random verbs you see while Claude is thinking are server-configured.
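The DJB2 hashing mentioned above is simple enough to reproduce. Below is the classic Bernstein variant (h = h*33 + c, truncated to 32 bits); note this is the textbook algorithm, and the exact variant Statsig ships may differ, so matching a cached ID like 3239527534 would mean brute-forcing candidate config names through whichever variant they actually use:

```python
def djb2(s: str) -> int:
    # Classic Bernstein hash: start at 5381, multiply by 33, add each byte
    h = 5381
    for c in s.encode():
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def find_name(target: int, candidates):
    # Brute-match a numeric config ID against guessed plaintext names
    return next((name for name in candidates if djb2(name) == target), None)
```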

Telemetry stack

Grepping for analytics URLs:

source
https://api.anthropic.com/api/claude_code/metrics   Anthropic's own metrics
https://api.segment.io                               Segment (event pipeline)
https://cdn.growthbook.io                            GrowthBook (feature flags)
https://http-intake.logs.us5.datadoghq.com/api/v2   Datadog (logging)
https://beacon.claude-ai.staging.ant.dev             Staging beacon (disabled)

Four analytics services in production: Statsig, Segment, Datadog, and GrowthBook. The telemetry reports session metadata, model usage, tool usage, and permission events. It fires after the response is delivered, so it doesn't affect perceived latency.

A fake User-Agent

One more thing strings turned up:

source
$ strings claude | grep "bun/" | head -1
bun/1.3.9 npm/? node/v24.3.0 linux x64

The binary claims to be node/v24.3.0 in its User-Agent string. This is probably for compatibility with servers and registries that check the Node version. We saw that it's Bun 1.3.9, but it says Node v24.3.0 when making HTTP requests.


438,000 lines of mangled JavaScript. How do you actually find anything?

1. String literals survived. Error messages, CLI flag names, API URLs, telemetry events. Searching for "--version" takes you directly to the version handler. Searching for "api.anthropic.com" finds the API client.

2. Module export patterns are consistent. The bundler registers exports with a pattern like fA(moduleObj, { exportName: () => mangledFn }). So grep "fA(c5B" reveals which functions module c5B exports.

3. The last line is the entrypoint. The final line of the bundle is what runs when the file is evaluated:

source
Line 438162:  Au1();

From there, I traced the call chain by searching for function definitions and following export registrations:

source
$ grep -n "function Au1" claude_bundle.beautified.js     → line 438106 (CLI dispatcher)
$ grep -n "fA(c5B" claude_bundle.beautified.js            → line 436477 (main → rk1)
$ grep -n "function rk1" claude_bundle.beautified.js      → line 436731 (signal setup)
$ grep -n "function tk1" claude_bundle.beautified.js      → line 436797 (Commander.js)
$ grep -n "async function\* Hz" claude_bundle.beautified.js → line 338374 (the loop)

Each search anchored on something concrete: a known string ("--version"), an export pattern (fA(c5B)), or a structural hint (the for await...of Hz pattern found in the REPL component). The mangled names are meaningless, but you don't need names when you have strings and patterns!


The Architecture

The call chain we traced is:

source
Au1()  CLI dispatcher: handles --version, --mcp-cli, --ripgrep
  → rk1()  Signal handlers, client type detection
    → tk1()  Commander.js option parsing, action handler
      → Hz()  The core conversation loop (async generator)

The interesting part is Hz at line 338374. This is the core agentic loop, and it's surprisingly simple:

js
async function* Hz({ messages, systemPrompt, tools, ... }) {
    while (true) {
        // 1. Compact context if approaching the limit
        let compacted = await CxD(messages, ...);

        // 2. Stream an API request
        for await (let event of BVH({ messages, systemPrompt, tools })) {
            yield event;  // stream each token to UI
        }

        // 3. If the model used tools, execute them and loop
        let toolUses = response.content.filter(c => c.type === "tool_use");
        if (!toolUses.length) return;  // no tools = done

        // 4. Execute tools locally, append results, loop back
        for await (let result of executeTools(toolUses, ...)) {
            yield result;
        }
        messages = [...messages, ...assistantMsgs, ...toolResults];
    }
}

Send messages, stream response, if tool calls then execute them locally and loop. Despite 438K lines of code, the core loop is four steps.

The client-server split

This is arguably the most important insight. The Anthropic API is stateless. It has no memory between requests. Every single API call sends the entire conversation from the beginning: every user message, every assistant response, every tool result, every thinking block.

The server doesn't know it's in a loop. It doesn't know if this is turn 1 or turn 50. It just sees "here's a conversation, respond to it." The while(true) loop, the decision to continue or stop, the tool execution: all of that lives on your machine.

Here's what each side is responsible for:

source
CLIENT (your machine):                SERVER (api.anthropic.com):
  The agentic loop itself               Model inference
  All tool execution (Bash,             Tool selection (which tools to call)
    Read, Edit, Write, Glob, Grep)      Streaming response
  System prompt assembly (~80KB)        Rate limiting & auth
  Permission management                 That's it. Stateless.
  Context compaction
  Session persistence (~/.claude/)
  The terminal UI (React/Ink)

Every tool runs locally. When Claude says "run npm test", your machine runs npm test. The output is captured and sent back in the next API call as a tool_result. The server never sees your files directly. It only sees what the client sends back.
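Concretely, one iteration of the loop appends two entries to the re-sent messages array. The shapes below follow the public Anthropic Messages API; the IDs and values are invented for illustration:

```python
# What one agentic-loop iteration adds to the conversation that gets
# re-sent in full on the next API call.
messages = [
    {"role": "user", "content": "run the tests"},
    # The model responds with a tool_use block instead of plain text:
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "Bash",
         "input": {"command": "npm test"}},
    ]},
    # The client ran `npm test` locally and loops back with the output,
    # packaged as a user-role tool_result referencing the tool_use id:
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "47 passing (2.1s)"},
    ]},
]
```

The server only ever sees this array; the actual command execution, and the decision to loop again, stay on the client.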

What goes over the wire

Each API request contains:

source
System prompt:     ~20-30K tokens  (12+ instruction sections)
CLAUDE.md context: ~5-15K tokens   (project/user instructions)
Git status:        ~1-3K tokens    (branch, recent commits)
Tool schemas:      ~30-40K tokens  (JSON Schema for ~15-25 tools)
User message:      ~0.1K tokens    ("fix the bug")
────────────────────────────────────────────────
First turn total:  ~20K tokens (most cached from prior sessions)
Turn 10:           ~50K+ tokens (growing with conversation history)

The tool schemas are the largest component. The Bash tool's schema alone includes the complete commit convention instructions as its description field. And all of this is re-sent every single turn.


How "Unlimited" Context Works

Since the full conversation is re-sent every turn, the payload grows monotonically. Eventually you hit the context window limit. Claude Code's solution is a multi-layered system.

Token estimation

The client doesn't ship a real tokenizer. It uses API usage stats from the last response when available, and falls back to a character-based estimate. The estimation formula at the bottom of the chain:

js
function mE(H, $ = 4) {
    return Math.round(H.length / $)
}

Characters divided by 4. That's it: one token is approximately four characters. For JSON content, a variant uses a divisor of 2. It's rough but fast, and the real API token count corrects the drift.

The threshold chain

The function Zd (line 332109) evaluates the token count against four escalating levels. The constants are hardcoded:

source
For Opus 4.6 with 200K context window:

  Effective window:   200,000 - 20,000 (output reserve) = 180,000 tokens
  Warning threshold:  ~147,000 tokens   → status bar shows context usage
  Auto-compact:       ~167,000 tokens   → compaction fires
  Blocking limit:     ~197,000 tokens   → conversation terminates

The specific constants from the source:

js
var xX1 = 20000,   // output token cap for effective window
    zSA = 13000,   // auto-compact margin below effective window
    SX1 = 20000,   // warning threshold buffer
    jX1 = 20000,   // error threshold buffer
    NSA = 3000;    // blocking limit safety buffer
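Those constants combine into the thresholds quoted above. This is my reconstruction of the arithmetic, not Zd's literal code:

```python
# Reconstructing the threshold chain for a 200K context window
CONTEXT_WINDOW = 200_000
xX1 = 20_000   # output token cap for effective window
zSA = 13_000   # auto-compact margin
SX1 = 20_000   # warning threshold buffer
NSA = 3_000    # blocking limit safety buffer

effective    = CONTEXT_WINDOW - xX1   # 180,000: usable input budget
auto_compact = effective - zSA        # 167,000: compaction fires here
warning      = auto_compact - SX1     # 147,000: status bar warning
blocking     = CONTEXT_WINDOW - NSA   # 197,000: conversation terminates
```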

Layer 1: Micro-compaction

Runs every turn, no API call needed. It identifies old tool results from read-only tools (Read, Grep, Glob), keeps the 3 most recent intact, and replaces the rest with placeholders. The original content is persisted to a file on disk, so it can be re-read if needed. This removes the biggest source of bloat without losing anything permanently.

The specific budget: micro-compaction only fires if total estimated tokens from compactable results exceeds 40,000 and savings would exceed 20,000 tokens. The 3 most recent results are always protected.
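A sketch of that layer-1 policy, using the chars/4 estimate from earlier. The function name, placeholder text, and exact budget arithmetic are my guesses at the behavior described above, not the bundle's code:

```python
PLACEHOLDER = "[old tool result persisted to disk]"

def micro_compact(results, keep=3, trigger=40_000, min_savings=20_000):
    # est: the chars/4 token heuristic from the bundle's mE function
    est = lambda s: len(s) // 4
    old = results[:len(results) - keep]          # everything but the newest 3
    compactable = est("".join(old))
    savings = compactable - len(old) * est(PLACEHOLDER)
    if compactable <= trigger or savings <= min_savings:
        return results                           # below budget: leave intact
    return [PLACEHOLDER] * len(old) + results[len(results) - keep:]
```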

Layer 2: Auto-compaction

When the token count crosses the auto-compact threshold (~167K tokens), the client makes a separate API call to summarize the entire conversation. The compaction prompt (yJA, line 168135) is about 3,000 characters and asks for a structured 8-section summary: primary request, key technical concepts, files with code snippets, errors and fixes, problem solving steps, all user messages verbatim, pending tasks, and current work with direct quotes.

After compaction, the conversation is replaced with the summary (~15K tokens) plus re-reads of recently accessed files. Then growth resumes. This creates a sawtooth pattern:

source
Tokens
(messages)
  142K ┤                    ╱│
       │                   ╱ │
       │                  ╱  │ compaction
  100K ┤                 ╱   │ fires
       │                ╱    │
       │               ╱     │
       │              ╱      │
   50K ┤             ╱       │         ╱
       │            ╱        │        ╱
       │           ╱         │       ╱
       │          ╱          ╱      ╱
   20K ┤     ╱╱╱╱           │╱╱╱╱╱╱
       │ ╱╱╱╱
    0  ┼─────┬──────┬───────┬──────┬───────
       1     5      10      15     20   Turns

Growth rate depends on tool usage:
  Chat-heavy:     ~500 tokens/turn  → ~280 turns before compact
  Some tools:    ~5,000 tokens/turn → ~28 turns before compact
  Heavy agentic: ~20,000 tokens/turn → ~7 turns before compact

The tradeoff is that compaction is lossy. Subtle details from early in the conversation can be lost. Long sessions may compact multiple times, each time losing fidelity.

Prompt caching makes it affordable

The "re-send everything" model sounds expensive, but prompt caching changes the math. The client marks cache breakpoints on the last 3 messages. On each turn, most of the payload is identical to the previous turn and gets a cache hit at about 10% of the normal input token cost.

From our real trace, the first-turn breakdown was:

source
Total input: 20,757 tokens
  Cache read:     18,077 (87%)  ← from a recent session
  Cache creation:  2,677 (13%)
  Fresh input:         3 (<1%)  ← the actual user message
Output:                1        ← "hi"

87% of the input was cached. Less than 1% was the actual new content. That's how a 20K-token-per-turn architecture stays viable.
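Back-of-the-envelope on that trace, using commonly cited cache pricing (reads at ~10% of the normal input rate, cache writes at ~125%; these rates are placeholders, not values from the bundle):

```python
# Token counts from the traced first turn
cache_read, cache_creation, fresh = 18_077, 2_677, 3

# Effective billed tokens vs. what an uncached turn would cost
billed_equivalent = cache_read * 0.10 + cache_creation * 1.25 + fresh
full_price = cache_read + cache_creation + fresh   # 20,757 if nothing cached
print(f"{billed_equivalent / full_price:.0%} of the uncached cost")
```

Under these assumed rates, the turn costs roughly a quarter of what re-sending everything uncached would.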

Patching the binary

Here's where two findings combine interestingly. The embedded JS bundle carries no integrity check (more on that below), and the context management system is controlled by hardcoded constants in that same JS.

Since the source text sits as plaintext inside the binary, you can edit it in-place with a hex editor. The constants are right there:

source
What you could change:
  mE's divisor (4)        → Change to 8: halves the token estimate,
                            doubles how long before compaction fires
  xX1 (20000)             → Change output reserve
  zSA (13000)             → Change auto-compact margin
  NSA (3000)              → Change blocking limit buffer
  PX1 (40000)             → Change micro-compaction trigger

You'd need to be careful about keeping the byte count identical (pad with spaces or adjust nearby whitespace) so the bytecode offsets don't shift. But the source section is what gets parsed at runtime on the first execution, so changing it does take effect.
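The padding trick is mechanical enough to sketch: replace a constant with a shorter value, space-pad it to the original length, and verify nothing shifted (the helper name is mine):

```python
def patch_in_place(data: bytes, old: bytes, new: bytes) -> bytes:
    # Space-pad the replacement so byte offsets after it don't move
    if len(new) > len(old):
        raise ValueError("replacement longer than original")
    patched = data.replace(old, new.ljust(len(old)))
    assert len(patched) == len(data)    # nothing shifted
    return patched
```

Since JS tolerates trailing spaces in numeric literals' positions (the parser just skips whitespace), `20000` can become `999  ` and still parse.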

A caveat on the mE divisor: its impact is limited. The client is smarter than pure estimation. The function Zw (line 167141) walks backwards through messages to find the most recent assistant response with real API usage data (the usage field returned by the API), and uses that as the base count. mE only estimates the delta: new messages added since the last API response. After the first turn, the real count dominates. The threshold constants (xX1, zSA, NSA) are more effective targets since they shift the decision boundaries directly.

This is possible because the Bun SEA format stores the source as a flat byte range appended to the ELF. There is no checksum on the source section. The bytecode section has its own structure but the runtime will fall back to re-parsing the source if the bytecode is stale or invalid. The binary doesn't verify itself.

Environment variable overrides

You don't actually need to patch the binary. Digging into the compaction logic reveals built-in environment variables that control the same thresholds at runtime:

source
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=95     Compact at 95% of effective window
                                        instead of the default ~93%
CLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE=199000  Raise the hard stop
DISABLE_AUTO_COMPACT=1                  No compaction at all (risky)
CLAUDE_CODE_MAX_OUTPUT_TOKENS=16384     Shrink the output reserve,
                                        giving more room for input

These are checked at runtime with no validation beyond basic number parsing. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE sets the auto-compact threshold as a percentage of the effective window (capped at effectiveWindow - 13000). DISABLE_AUTO_COMPACT=1 disables compaction entirely, letting the conversation grow until it hits the blocking limit, at which point you lose the session. Use with caution.


Proof of Concept: Patching the Version String

Let's patch the binary. Copy it, replace every occurrence of the version string with a different value of the same byte count (so nothing shifts), and run it:

source
$ cp claude claude_patched

$ grep -c '2\.1\.38' claude_patched
169

$ sed -i 's/2\.1\.38/louis_/g' claude_patched

$ ls -lh claude_patched
-rwxr-xr-x 223M   ← identical file size

$ grep -c 'louis_' claude_patched
169                ← all 169 occurrences replaced

$ ./claude_patched --version
louis_ (Claude Code)

The copyright header inside the binary now reads // Version: louis_, the build metadata says VERSION:"louis_", and --help works normally. The binary executes the modified JavaScript without complaining, yay!

The same technique could change any string constant in the embedded source: API endpoints, telemetry URLs, permission prompts, model names, threshold values. As long as the replacement is the same byte length, the SEA trailer stays valid and the runtime parses the modified source on startup.

Security Observations

Session transcripts log everything. Every message, tool output, file contents, and thinking block is written to ~/.claude/projects/ as plaintext JSON. File permissions are 0600, but if the model reads a file containing credentials, those credentials are now in the transcript.

Hooks run arbitrary code. Project-level .claude/settings.json can define hooks that execute shell commands on every tool invocation. These are gated behind workspace trust, but once granted, they run automatically. A malicious repo could include hook definitions alongside its CLAUDE.md.

The sandbox has escape hatches. Bash commands run in bubblewrap on Linux (network and PID namespace isolation, seccomp filters). But the model can request unsandboxed execution via a dangerouslyDisableSandbox parameter, and in bypassPermissions mode that request is auto-accepted.

The security model is fundamentally trust-based. It works well when you trust your repos and your machine. The load-bearing decision is the workspace trust prompt, and developers tend to click "yes" without much thought.


Conclusion

The methodology we used was mostly:

  1. file, ldd, readelf, nm, strings to understand the binary without running it
  2. compendium to trace syscalls at runtime and see every file open, network connection, and memory allocation
  3. grep -boa and xxd to find the embedded JS payload boundary
  4. dd to extract it, js-beautify to make it readable
  5. grep on string literals and export patterns to trace through 438K lines of mangled code

A lot of this is still somewhat speculative. Hopefully you learned something!

Uploaded 14-02-2026