Reverse Engineering Claude Code
For educational purposes only. Claude Code version 2.1.38, on x86-64 Linux (Ubuntu).
Introduction
What does Claude do on our machines? What does it do at startup? How does the agentic loop work? What does the client actually do? What can we find with standard tooling? Can we use the excellent compendium to help us?
If you've ever been curious about poking at some binary on Linux, this is a decent blueprint.
Tools used: file, ldd, readelf, nm, strings, xxd, grep, dd, and compendium (my syscall tracer).
Static Analysis: What Is This Binary?
We have an unknown binary, and we want to know its (potentially dirty) secrets. Let's try to peel back each layer.
Step 1: Find and identify it
which resolves a command name to a path.
file reads magic bytes and ELF headers to classify it.
ls -lh gives you the size.
$ which claude
/home/louis/.local/bin/claude
$ file $(which claude)
symbolic link to /home/louis/.local/share/claude/versions/2.1.38
$ file /home/louis/.local/share/claude/versions/2.1.38
ELF 64-bit LSB executable, x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
$ ls -lh /home/louis/.local/share/claude/versions/2.1.38
-rwxr-xr-x 223M louis 10 Feb 08:35 2.1.38
Right away, a few things stand out. It's a symlink into a versioned directory, which suggests an auto-update mechanism that swaps symlinks. It's a native ELF 64-bit executable, not a script or some sort of JAR. It's not stripped, meaning symbol names are still present (this will help us later). And it's 223 MB. For reference, the entire coreutils package (ls, cp, mv, cat, etc.) is about 15 MB. Something pretty dang large is embedded inside.
Step 2: Check shared library dependencies
ldd prints what shared libraries a dynamically-linked
ELF depends on. This tells you what the binary
doesn't bundle.
$ ldd /home/louis/.local/share/claude/versions/2.1.38
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
/lib64/ld-linux-x86-64.so.2
Only glibc basics. No libssl, no libcurl,
no libnode, no libstdc++. The TLS stack,
HTTP client, C++ runtime, and JavaScript engine (spoiler) are all
statically linked. This is what makes it 223 MB and
fully self-contained.
The presence of libdl is interesting: it means the
binary can dlopen() shared objects at runtime. This is
how N-API native modules work in a single-executable build.
Step 3: Read the ELF sections
readelf -SW lists section headers: named regions of the
binary with specific purposes. -S asks for section
headers; -W avoids column truncation.
$ readelf -SW /home/louis/.local/share/claude/versions/2.1.38
[13] .rodata                39.77 MB   Read-only data
[14] .text                  56.79 MB   Executable machine code
[25] __DATA,__jsc_opcodes    0.02 MB   JavaScriptCore opcode tables
[26] __DATA,__wtf_config     0.02 MB   WebKit config
The section names __DATA,__jsc_opcodes and
__DATA,__wtf_config use Mach-O naming conventions
(the __DATA,__name format comes from macOS). Finding
these in a Linux ELF is a fingerprint: they come from WebKit/JSC
source code that uses
__attribute__((section())) with macOS-style names.
Now for the key observation: all ELF sections total about 97 MB. But the file is 223 MB. That means roughly 115 MB of data is appended after the ELF structure. Hmmm... we'll come back to this.
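The same arithmetic can be automated. Here's a minimal sketch in Python (assuming a 64-bit little-endian ELF, which is what we have): parse e_shoff, e_shentsize, and e_shnum out of the header with struct, find where the last section ends, and subtract from the file size.

```python
import struct

SHT_NOBITS = 8  # .bss-style sections occupy no file space

def trailing_payload(data: bytes) -> int:
    """Bytes appended after the last ELF section (and the section
    header table itself). Assumes a 64-bit little-endian ELF."""
    assert data[:4] == b"\x7fELF", "not an ELF file"
    # ELF64 header: e_shoff (u64 @ 0x28), e_shentsize (u16 @ 0x3A), e_shnum (u16 @ 0x3C)
    e_shoff, = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, = struct.unpack_from("<H", data, 0x3A)
    e_shnum, = struct.unpack_from("<H", data, 0x3C)
    end = e_shoff + e_shnum * e_shentsize  # the table itself counts too
    for i in range(e_shnum):
        base = e_shoff + i * e_shentsize
        # Elf64_Shdr: sh_type (u32 @ +0x04), sh_offset/sh_size (u64 @ +0x18/+0x20)
        sh_type, = struct.unpack_from("<I", data, base + 0x04)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 0x18)
        if sh_type != SHT_NOBITS:
            end = max(end, sh_offset + sh_size)
    return len(data) - end
```

Run over the Claude binary, this should report the ~115 MB trailer.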
Step 4: Identify the runtime with dynamic symbols
nm -D lists dynamic symbols: functions the binary
exports or imports. For a language runtime, the exports identify
exactly what's inside.
$ nm -D /home/louis/.local/share/claude/versions/2.1.38 | grep "@@BUN" | head -3
00000000032bcc20 T napi_acquire_threadsafe_function@@BUN_1.2
000000000431d540 T napi_add_async_cleanup_hook@@BUN_1.2
000000000431cf20 T napi_add_env_cleanup_hook@@BUN_1.2
$ nm -D /home/louis/.local/share/claude/versions/2.1.38 | grep -c "@@BUN"
556
556 symbols versioned as @@BUN_1.2. These are N-API
functions provided by Bun's runtime, not Node's. That
version tag is the proof: this binary is a
Bun application. Whatever this means.
Step 5: Extract version strings
strings pulls printable sequences from a binary. It's
a blunt tool, but for identifying what a binary is and what
it talks to, it's quick and easy to use.
$ strings /home/louis/.local/share/claude/versions/2.1.38 | grep -i "bun v"
Bun v1.3.9-canary.51+d5628db23 (Linux x64 baseline)
Bun v1.3.9-canary.51, a pre-release build. The "baseline" tag means it's compiled for generic x86-64 without AVX2 or other newer CPU extensions, probably for maximum compatibility.
Step 6: Identify the compiler toolchain
The .comment ELF section is populated by compilers
with their version strings.
$ readelf -p .comment /home/louis/.local/share/claude/versions/2.1.38
Linker: Ubuntu LLD 21.1.5
GCC: (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
rustc version 1.94.0-nightly (c61a3a44d 2025-12-09)
Ubuntu clang version 21.1.5
Four compilers contributed to this binary: Clang for the JSC C++ code, GCC for vendored C libraries, Rust nightly for native components (Bun reportedly uses Rust for parts of its I/O stack, though this is somewhat speculative), and LLD as the linker.
What we know before running it
Without executing a single instruction, we've established: it's a Bun v1.3.9-canary.51 single-executable application using JavaScriptCore. The 97 MB ELF contains the native runtime, and 115 MB of data is appended after the ELF sections. It depends only on glibc and is fully self-contained.
Dynamic Analysis: Watching It Run
Now we run it and watch what happens. I used
compendium
with --verbose (raw syscalls) and -o
(save to file). Two traces:
$ compendium --verbose -o traces/claude_help.log -- claude --help
$ compendium --verbose -o traces/claude_print.log -- claude -p "say just the word hi"
The --help trace is interesting because it's a minimal
invocation with no network. Here's the summary:
FINAL: claude --help (0.25s)
Memory:  Heap: 132.0 KB   Mmap: 1.2 GB (67 regions)   Total: 1.2 GB
I/O:     Files read: 115.2 MB   Files written: 0 B
         Net sent: 0 B   Net received: 0 B
Just to print help text, it allocates 1.2 GB of virtual memory and reads 115.2 MB from disk. That 115 MB matches exactly the appended payload we found in static analysis. Yay! We just caught the runtime reading its own embedded JavaScript, which confirms what we discovered earlier.
Note that I don't show the tracer's full output, since it's extremely verbose (every syscall is intercepted). A real prompt is heavier:
FINAL: claude -p say just the word hi, nothing else (3.80s)
Memory:
Mmap: 3.3 GB (760 regions)
I/O:
Files read: 373.5 MB
Files written: 9.6 MB
Net sent: 202.0 KB
Net received: 42.7 KB
Files:
Opened for read: 202
Opened for write: 13
Network connections:
tcp4 → 160.79.104.10:443 (Anthropic API)
tcp4 → 140.82.113.3:22 (GitHub SSH)
tcp4 → 34.149.66.137:443 (Telemetry/Statsig)
Subprocesses: ["uname", "rust-analyzer", "grep", "2.1.38",
"ps", "ssh", "git", "sh", "which"]
202 file opens, 9 subprocess types, 3 network connections. Just to answer "hi." Let me break down the startup timeline from the verbose log.
The startup timeline
0 to 1ms: Dynamic linker. Loads the five shared libraries.
2 to 47ms: Bun/JSC initialization. This is where the heavy lifting happens. The runtime checks the kernel's overcommit policy, allocates a 1 GB JSC heap region, a 128 GB virtual reservation for the JIT compiler, and most importantly:
[+0.047s] read 115.1 MB from exe ← the entire JS bundle, one read
That single read takes about 43ms and is the dominant startup cost. The runtime reads its own executable to extract the embedded JavaScript. This is the Bun SEA (Single Executable Application) mechanism at work.
47 to 55ms: Thread pool. Bun spins up 8 worker threads immediately after JSC init, then 3 more later for JIT compilation and GC.
55 to 240ms: GC pressure monitoring. A striking pattern:
the main thread polls /proc/self/statm roughly 100
times while JSC parses and compiles the 115 MB bundle. This is
the garbage collector deciding whether to trigger collection during
the heavy initialization phase.
245 to 290ms: Configuration loading. Here's where "Claude Code the application" shows up rather than "Bun the runtime":
[+0.245s] open ~/.claude/settings.json (read)
[+0.249s] open .claude/settings.local.json
[+0.251s] open ~/.claude/.credentials.json
[+0.260s] open ~/.claude.json (read)
[+0.268s] open ~/.claude/plugins/known_marketplaces.json
[+0.291s] open CLAUDE.md (read)
Notice that settings.json gets opened 5 times in the
same millisecond. Multiple modules requesting the same config
without a shared mapping/cache.
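For contrast, a process-wide config cache is nearly a one-liner in most runtimes. This is a hypothetical sketch (not how Claude Code is structured), just to show what a shared read would look like:

```python
import functools
import json
import pathlib

@functools.lru_cache(maxsize=None)
def load_settings(path: str) -> dict:
    """One disk read per path, no matter how many modules ask."""
    return json.loads(pathlib.Path(path).read_text())
```

Every module calling load_settings("~/.claude/settings.json") after the first gets the memoized object back without touching the disk.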
290ms+: Network and subprocesses. Three connections go
out in parallel: the Anthropic API (for the actual prompt), GitHub
via SSH (a git fetch to check for remote changes), and
a telemetry endpoint on Google Cloud (Statsig). The binary also
spawns itself twice to compile native addon caches, launches
rust-analyzer (presumably because this trace ran
inside a Rust project), and
runs which three times to probe for available tools.
The network numbers are interesting for their asymmetry: we sent 5x more than we received (202 KB vs 42.7 KB) for a trivial "say hi" prompt. The request payload (~80 KB) dwarfs the response because it includes the full system prompt, CLAUDE.md instructions, git state, and tool schemas. For one word of output, 80 KB of context goes in.
Extracting the JavaScript Bundle
We know 115 MB is appended after the ELF sections. We watched the runtime read all of it at startup. Now we extract it.
Finding the boundary
grep -boa does a byte-offset, binary-safe search.
We look for the Bun module header signature:
$ grep -boa '// @bun @bytecode @bun-cjs' claude
78242:// @bun @bytecode @bun-cjs       ← in .rodata (Bun's own modules)
101962616:// @bun @bytecode @bun-cjs   ← START OF SEA PAYLOAD
197232544:// @bun @bytecode @bun-cjs   ← second copy (different bytecode tier)
207706904:// @bun @bytecode @bun-cjs   ← small shim modules
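grep -boa is just a binary-safe substring search that reports byte offsets. When scripting the extraction, the same thing in Python is a short loop:

```python
from typing import Iterator

def find_all(data: bytes, needle: bytes) -> Iterator[int]:
    """Yield every byte offset where needle occurs (like grep -boa)."""
    off = data.find(needle)
    while off != -1:
        yield off
        off = data.find(needle, off + 1)
```

Applied to the binary's bytes with needle b"// @bun @bytecode @bun-cjs", it should yield the same four offsets.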
The payload starts at offset 101,962,616. We can peek at it with
xxd:
$ xxd -s 0x613d330 -l 80 claude
file:///$bunfs/root/src/entrypoints/cli.js
// @bun @bytecode @bun-cjs (function(exports, require, module...
The $bunfs virtual filesystem path tells us this is
the main CLI entrypoint. The bundled code thinks it's reading from
a filesystem, but it's actually loaded from the appended blob.
How much is readable source?
The @bytecode annotation means each module contains
both JavaScript source text and pre-compiled JSC bytecode. We need
to figure out where the source ends and the bytecode begins.
The approach: scan the payload in 1 MB chunks and check what percentage of each chunk is printable ASCII. Source code is 100% printable. Bytecode is not. The transition is clear (kind of):
$ dd if=claude bs=1 skip=101962616 count=$((222697790 - 101962616)) | \
python3 -c "scan 1MB chunks, report printable %"
Offset 0MB: 100.0% printable
Offset 5MB: 100.0% printable
Offset 10MB: 17.5% printable ← transition to bytecode
First non-printable byte at offset 10,474,337 (10.0 MB into payload)
10 MB of readable source, then the remaining 80+ MB drops to 17.5% printable (that's JSC bytecode with interspersed string constants for stack traces).
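The elided scanner in the dd pipeline above can be sketched like this (the chunk size and printability threshold are my choices, not the original script's):

```python
def printable_fraction(chunk: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    ok = sum(1 for b in chunk if 32 <= b < 127 or b in (9, 10, 13))
    return ok / max(len(chunk), 1)

def find_bytecode_transition(data: bytes, chunk_size: int = 1 << 20,
                             threshold: float = 0.99):
    """Offset of the first chunk that stops looking like source text."""
    for off in range(0, len(data), chunk_size):
        if printable_fraction(data[off:off + chunk_size]) < threshold:
            return off
    return None
```

A binary search on printable_fraction within the offending chunk then narrows the boundary down to the exact byte.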
We can cross-verify this with Module 2. Remember the
grep -boa output showed a second copy at offset
197,232,544. Module 3 starts at 207,706,904, making Module 2
exactly 10,474,360 bytes. Our extraction from Module 1 is
10,474,337 bytes. The 23-byte difference is presumably some metadata header
before the source text. The sizes match, confirming we have the
source! Module 2 is an independent copy at a
different bytecode compilation tier, for optimization purposes.
People online claim Bun is very fast (for a JS "runtime");
shipping pre-compiled bytecode is partly how they do it.
Extracting and beautifying
$ dd if=claude bs=1 skip=101962616 count=10474337 > claude_bundle.js
$ wc -l claude_bundle.js
7,539 lines
$ pip3 install jsbeautifier
$ js-beautify -f claude_bundle.js -o claude_bundle.beautified.js
$ wc -l claude_bundle.beautified.js
438,162 lines
7,539 minified lines become 438,162 lines of properly
indented, readable JavaScript. Variable names are still mangled
(Au1, Hz, Px) but the
control flow, string constants, and structure are all navigable.
The file opens with a copyright notice and the exact build metadata:
// @bun @bytecode @bun-cjs (function(exports, require, module, __filename, __dirname) {
// Claude Code is a Beta product per Anthropic's Commercial Terms of Service.
// By using Claude Code, you agree that all code acceptance or rejection
// decisions you make, and the associated conversations in context,
// constitute Feedback under Anthropic's Commercial Terms, and may be used
// to improve Anthropic's products, including training models.
// You are responsible for reviewing any code suggestions before use.
// (c) Anthropic PBC. All rights reserved.
// Version: 2.1.38
var AFB=Object.create;var{getPrototypeOf:LFB,defineProperty:vHH,...
It's a single JS bundle: one giant
(function(exports, require, module...) {...}) wrapper
containing the entire application. Some quick stats on what's
inside:
Total size: 10.0 MB (18 MB beautified)
Lines: 438,162 (beautified)
Named functions: ~12,400
Arrow functions: ~21,000
String literals (>20ch): ~25,700
require() calls: 683 total, 42 unique modules
Over 33,000 functions in a 10 MB file. The high ratio of arrow functions to declarations is typical of compiled modern TypeScript. At least that's what Claude tells me.
What's Inside the Bundle
With the beautified source in hand, grep reveals a
lot about the application's internals. String literals survive
minification intact, so we can extract API endpoints, dependency
names, model IDs, and configuration values directly.
Bundled dependencies
The require() calls and package name strings reveal
the major libraries:
ink                          Terminal UI (React-for-terminal)
zod                          Schema validation
@anthropic-ai/claude-code    Core application
@azure/msal-common           Azure auth (for Azure OpenAI)
@aws-sdk/core                AWS SDK (for Bedrock)
commander                    CLI argument parsing
marked                       Markdown rendering
highlight.js                 Syntax highlighting
@grpc/proto-loader           gRPC (for Google Vertex AI)
ws                           WebSocket client
The dependency list reveals a multi-provider architecture. Claude Code supports Anthropic's API directly, AWS Bedrock (via @aws-sdk), Google Vertex AI (via gRPC), and Azure (via MSAL). Each provider has its own auth flow bundled in. The terminal UI is built on Ink, which is React for the terminal. Because ncurses and more modern alternatives like Ratatui were too simple, of course!
API endpoints
$ grep -oP 'https://api\.anthropic\.com[^"]+' claude_bundle.beautified.js | sort -u
https://api.anthropic.com
https://api.anthropic.com/api/hello                            ← auth check / ping
https://api.anthropic.com/api/claude_cli_feedback
https://api.anthropic.com/api/claude_code/link_vcs_account
https://api.anthropic.com/api/claude_code/metrics
https://api.anthropic.com/api/oauth/claude_cli/create_api_key
https://api.anthropic.com/api/oauth/claude_cli/roles

OAuth flow:
https://claude.ai/oauth/authorize
https://platform.claude.com/v1/oauth/token
Model catalog
The bundle contains identifiers for every Claude model the client knows about:
claude-3-haiku, claude-3-opus, claude-3-sonnet
claude-3-5-haiku-20241022, claude-3-5-sonnet-20241022
claude-3-7-sonnet-20250219
claude-4-opus-20250514
claude-opus-4-1-20250805
claude-sonnet-4-5-20250929
claude-opus-4-5-20251101
claude-haiku-4-5-20251001
claude-opus-4-6              ← latest at time of analysis
Plus a mysterious reference in the dynamic config:
"capableModel": "dodo-v5-prod". An internal codename.
Feature flags (Statsig cache)
The binary reads from ~/.claude/statsig/ at startup.
The cached evaluations there reveal the feature flag system. The
config names are stored as DJB2 hashes (numeric IDs like
3239527534) rather than plaintext, but the values
inside each config are readable JSON. Some interesting configs
we found:
minimumMessageTokensToInit: 140000    When to write auto-memory
toolCallsBetweenUpdates: 5            Memory update frequency
minTimeBeforeFeedbackMs: 600000       Wait 10 min before asking for feedback
start_in_plan_mode: false             Default plan mode state
capableModel: "dodo-v5-prod"          Internal model codename
tokenThreshold: 0.92                  Context window usage trigger
maxExportBatchSize: 200               Telemetry batch settings
fallback_available_warning_threshold: 0.5
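DJB2 is trivial to replicate, which means the numeric IDs can be brute-forced against guessed config names. A sketch of the classic add variant (whether Statsig uses exactly this variant, seed, and 32-bit truncation is an assumption on my part):

```python
def djb2(s: str) -> int:
    """Classic DJB2 string hash (h = h*33 + c), truncated to 32 bits."""
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

def match_config(candidates, target_id):
    """Brute-force candidate config names against a hashed numeric ID."""
    return [c for c in candidates if djb2(c) == target_id]
```

Feed it a wordlist scraped from the bundle's string literals and any hits de-anonymize the cached flag names.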
And the spinner status words are fetched from a config too:
["Accomplishing","Actioning","Actualizing","Baking",...].
Those random verbs you see while Claude is thinking are
server-configured.
Telemetry stack
Grepping for analytics URLs:
https://api.anthropic.com/api/claude_code/metrics    Anthropic's own metrics
https://api.segment.io                               Segment (event pipeline)
https://cdn.growthbook.io                            GrowthBook (feature flags)
https://http-intake.logs.us5.datadoghq.com/api/v2    Datadog (logging)
https://beacon.claude-ai.staging.ant.dev             Staging beacon (disabled)
Four analytics services in production: Statsig, Segment, Datadog, and GrowthBook. The telemetry reports session metadata, model usage, tool usage, and permission events. It fires after the response is delivered, so it doesn't affect perceived latency.
A fake User-Agent
One more thing strings turned up:
$ strings claude | grep "bun/" | head -1
bun/1.3.9 npm/? node/v24.3.0 linux x64
The binary claims to be node/v24.3.0 in its
User-Agent string. This is probably for compatibility with servers and
registries that check the Node version. We saw that it's Bun 1.3.9,
but it says Node v24.3.0 when making HTTP requests.
Minified Code
438,000 lines of mangled JavaScript. How do you actually find anything?
1. String literals survived. Error messages,
CLI flag names, API URLs, telemetry events. Searching for
"--version" takes you directly to the version handler.
Searching for "api.anthropic.com" finds the API
client.
2. Module export patterns are consistent. The bundler
registers exports with a pattern like
fA(moduleObj, { exportName: () => mangledFn }).
So grep "fA(c5B" reveals which functions module
c5B exports.
3. The last line is the entrypoint. The final line of the bundle is what runs when the file is evaluated:
Line 438162: Au1();
From there, I traced the call chain by searching for function definitions and following export registrations:
$ grep -n "function Au1" claude_bundle.beautified.js
→ line 438106 (CLI dispatcher)
$ grep -n "fA(c5B" claude_bundle.beautified.js
→ line 436477 (main → rk1)
$ grep -n "function rk1" claude_bundle.beautified.js
→ line 436731 (signal setup)
$ grep -n "function tk1" claude_bundle.beautified.js
→ line 436797 (Commander.js)
$ grep -n "async function\* Hz" claude_bundle.beautified.js
→ line 338374 (the loop)
Each search anchored on something concrete: a known string
("--version"), an export pattern (fA(c5B)),
or a structural hint (the for await...of Hz pattern
found in the REPL component). The mangled names are meaningless,
but you don't need names when you have strings and patterns!
The Architecture
The call chain we traced is:
Au1() CLI dispatcher: handles --version, --mcp-cli, --ripgrep
→ rk1() Signal handlers, client type detection
→ tk1() Commander.js option parsing, action handler
→ Hz() The core conversation loop (async generator)
The interesting part is Hz at line 338374. This is
the core agentic loop, and it's surprisingly simple:
async function* Hz({ messages, systemPrompt, tools, ... }) {
  while (true) {
    // 1. Compact context if approaching the limit
    let compacted = await CxD(messages, ...);

    // 2. Stream an API request
    for await (let event of BVH({ messages, systemPrompt, tools })) {
      yield event; // stream each token to UI
    }

    // 3. If the model used tools, execute them and loop
    let toolUses = response.content.filter(c => c.type === "tool_use");
    if (!toolUses.length) return; // no tools = done

    // 4. Execute tools locally, append results, loop back
    for await (let result of executeTools(toolUses, ...)) {
      yield result;
    }
    messages = [...messages, ...assistantMsgs, ...toolResults];
  }
}
Send messages, stream response, if tool calls then execute them locally and loop. Despite 438K lines of code, the core loop is four steps.
The client-server split
This is arguably the most important insight. The Anthropic API is stateless. It has no memory between requests. Every single API call sends the entire conversation from the beginning: every user message, every assistant response, every tool result, every thinking block.
The server doesn't know it's in a loop. It doesn't know if this is
turn 1 or turn 50. It just sees "here's a conversation, respond to
it." The while(true) loop, the decision to continue or
stop, the tool execution: all of that lives on your machine.
Here's what each side is responsible for:
CLIENT (your machine): SERVER (api.anthropic.com):
The agentic loop itself Model inference
All tool execution (Bash, Tool selection (which tools to call)
Read, Edit, Write, Glob, Grep) Streaming response
System prompt assembly (~80KB) Rate limiting & auth
Permission management That's it. Stateless.
Context compaction
Session persistence (~/.claude/)
The terminal UI (React/Ink)
Every tool runs locally. When Claude says "run
npm test", your machine runs
npm test. The output is captured and sent back in
the next API call as a tool_result. The server never
sees your files directly. It only sees what the client sends back.
What goes over the wire
Each API request contains:
System prompt: ~20-30K tokens (12+ instruction sections)
CLAUDE.md context: ~5-15K tokens (project/user instructions)
Git status: ~1-3K tokens (branch, recent commits)
Tool schemas: ~30-40K tokens (JSON Schema for ~15-25 tools)
User message: ~0.1K tokens ("fix the bug")
────────────────────────────────────────────────
First turn total: ~20K tokens (most cached from prior sessions)
Turn 10: ~50K+ tokens (growing with conversation history)
The tool schemas are the largest component. The Bash tool's schema alone includes the complete commit convention instructions as its description field. And all of this is re-sent every single turn.
How "Unlimited" Context Works
Since the full conversation is re-sent every turn, the payload grows monotonically. Eventually you hit the context window limit. Claude Code's solution is a multi-layered system.
Token estimation
The client doesn't ship a real tokenizer. It uses API usage stats from the last response when available, and falls back to a character-based estimate. The estimation formula at the bottom of the chain:
function mE(H, $ = 4) { return Math.round(H.length / $) }
Characters divided by 4. That's it: one token is approximately four characters. For JSON content, a variant uses a divisor of 2. It's rough but fast, and the real API token count corrects the drift.
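mE transcribed into Python, with the JSON divisor variant folded in as a parameter:

```python
def estimate_tokens(text: str, divisor: int = 4) -> int:
    """Character-count heuristic: ~4 chars per token (~2 for JSON)."""
    return round(len(text) / divisor)
```

One caveat if you port it exactly: Python's round uses banker's rounding on exact halves, where JS Math.round always rounds up; for an estimate this loose it doesn't matter.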
The threshold chain
The function Zd (line 332109) evaluates the token
count against four escalating levels. The constants are hardcoded:
For Opus 4.6 with a 200K context window:
Effective window:  200,000 - 20,000 (output reserve) = 180,000 tokens
Warning threshold: ~147,000 tokens → status bar shows context usage
Auto-compact:      ~167,000 tokens → compaction fires
Blocking limit:    ~197,000 tokens → conversation terminates
The specific constants from the source:
var xX1 = 20000, // output token cap for effective window
    zSA = 13000, // auto-compact margin below effective window
    SX1 = 20000, // warning threshold buffer
    jX1 = 20000, // error threshold buffer
    NSA = 3000;  // blocking limit safety buffer
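How the buffers compose is my reading of Zd, but the derived numbers line up with the thresholds listed above:

```python
# Extracted constants (mangled names kept from the bundle)
CONTEXT_WINDOW = 200_000  # Opus 4.6

xX1 = 20_000  # output token cap for effective window
zSA = 13_000  # auto-compact margin
SX1 = 20_000  # warning threshold buffer
NSA = 3_000   # blocking limit safety buffer

effective_window = CONTEXT_WINDOW - xX1    # 180,000 tokens
auto_compact_at  = effective_window - zSA  # 167,000 tokens
warn_at          = auto_compact_at - SX1   # 147,000 tokens
block_at         = CONTEXT_WINDOW - NSA    # 197,000 tokens
```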
Layer 1: Micro-compaction
Runs every turn, no API call needed. It identifies old tool results from read-only tools (Read, Grep, Glob), keeps the 3 most recent intact, and replaces the rest with placeholders. The original content is persisted to a file on disk, so it can be re-read if needed. This removes the biggest source of bloat without losing anything permanently.
The specific budget: micro-compaction only fires if total estimated tokens from compactable results exceeds 40,000 and savings would exceed 20,000 tokens. The 3 most recent results are always protected.
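A behavioral sketch of that policy (the function name, data shape, and token accounting are mine, not the bundle's):

```python
PLACEHOLDER = "[old tool result elided; original persisted to disk]"

def micro_compact(results, keep=3, trigger=40_000, min_savings=20_000):
    """Replace old read-only tool results with placeholders, keeping the
    `keep` newest intact. Each result is a dict with 'tokens'/'content'."""
    old = results[:-keep] if len(results) > keep else []
    total = sum(r["tokens"] for r in results)
    savings = sum(r["tokens"] for r in old)
    if total < trigger or savings < min_savings:
        return results  # below budget: leave everything alone
    stubs = [{"tokens": 10, "content": PLACEHOLDER} for _ in old]
    return stubs + results[-keep:]
```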
Layer 2: Auto-compaction
When the token count crosses the auto-compact threshold (~167K
tokens), the client makes a separate API call to summarize the
entire conversation. The compaction prompt (yJA, line
168135) is about 3,000 characters and asks for a structured
8-section summary: primary request, key technical concepts, files
with code snippets, errors and fixes, problem solving steps, all
user messages verbatim, pending tasks, and current work with direct
quotes.
After compaction, the conversation is replaced with the summary (~15K tokens) plus re-reads of recently accessed files. Then growth resumes. This creates a sawtooth pattern:
Tokens
(messages)
142K ┤ ╱│
│ ╱ │
│ ╱ │ compaction
100K ┤ ╱ │ fires
│ ╱ │
│ ╱ │
│ ╱ │
50K ┤ ╱ │ ╱
│ ╱ │ ╱
│ ╱ │ ╱
│ ╱ ╱ ╱
20K ┤ ╱╱╱╱ │╱╱╱╱╱╱
│ ╱╱╱╱
0 ┼─────┬──────┬───────┬──────┬───────
1 5 10 15 20 Turns
Growth rate depends on tool usage:
Chat-heavy: ~500 tokens/turn → ~280 turns before compact
Some tools: ~5,000 tokens/turn → ~28 turns before compact
Heavy agentic: ~20,000 tokens/turn → ~7 turns before compact
The tradeoff is that compaction is lossy. Subtle details from early in the conversation can be lost. Long sessions may compact multiple times, each time losing fidelity.
Prompt caching makes it affordable
The "re-send everything" model sounds expensive, but prompt caching changes the math. The client marks cache breakpoints on the last 3 messages. On each turn, most of the payload is identical to the previous turn and gets a cache hit at about 10% of the normal input token cost.
From our real trace, the first-turn breakdown was:
Total input:     20,757 tokens
Cache read:      18,077 (87%)  ← from a recent session
Cache creation:   2,677 (13%)
Fresh input:          3 (<1%)  ← the actual user message
Output:               1        ← "hi"
87% of the input was cached. Less than 1% was the actual new content. That's how a 20K-token-per-turn architecture stays viable.
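Back-of-envelope math on that trace. The ~10% cache-read figure is from above; the base price and the cache-write multiplier are illustrative assumptions, not extracted values:

```python
def blended_input_cost(cache_read, cache_create, fresh,
                       base_per_mtok=3.0, read_mult=0.10, create_mult=1.25):
    """Dollar cost of one request's input tokens under prompt caching.
    Prices are per million tokens; multipliers are assumptions."""
    billed = cache_read * read_mult + cache_create * create_mult + fresh
    return billed * base_per_mtok / 1_000_000

cached   = blended_input_cost(18_077, 2_677, 3)  # the traced first turn
uncached = 20_757 * 3.0 / 1_000_000              # same tokens, no caching
```

Under these assumptions the cached turn costs roughly a quarter of the uncached one, and the gap widens as the conversation (and the reusable prefix) grows.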
Patching the binary
Here's where two findings combine interestingly. We established earlier that the embedded JS bundle has no integrity check. And we just saw that the context management system is controlled by hardcoded constants in that same JS.
Since the source text sits as plaintext inside the binary, you can edit it in-place with a hex editor. The constants are right there:
What you could change:
mE's divisor (4) → Change to 8: halves the token estimate,
doubles how long before compaction fires
xX1 (20000) → Change output reserve
zSA (13000) → Change auto-compact margin
NSA (3000) → Change blocking limit buffer
PX1 (40000) → Change micro-compaction trigger
You'd need to be careful about keeping the byte count identical (pad with spaces or adjust nearby whitespace) so the bytecode offsets don't shift. But the source section is what gets parsed at runtime on the first execution, so changing it does take effect.
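A defensive sketch of that constraint (a hypothetical helper, not a real tool): refuse any edit that would change the byte count and shift the bytecode offsets.

```python
def patch_in_place(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of old with new, refusing any edit
    that would change the byte count and shift later offsets."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same byte length")
    return data.replace(old, new)
```

Pad a shorter replacement with spaces (legal whitespace in JS source) to hit the exact length before calling it.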
A caveat on the mE divisor: its impact is limited.
The client is smarter than pure estimation. The function
Zw (line 167141) walks backwards through messages to
find the most recent assistant response with real API usage
data (the usage field returned by the API), and
uses that as the base count. mE only estimates the
delta: new messages added since the last API response. After the
first turn, the real count dominates. The threshold constants
(xX1, zSA, NSA) are more
effective targets since they shift the decision boundaries
directly.
This is possible because the Bun SEA format stores the source as a flat byte range appended to the ELF. There is no checksum on the source section. The bytecode section has its own structure but the runtime will fall back to re-parsing the source if the bytecode is stale or invalid. The binary doesn't verify itself.
Environment variable overrides
You don't actually need to patch the binary. Digging into the compaction logic reveals built-in environment variables that control the same thresholds at runtime:
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=95 Compact at 95% of effective window
instead of the default ~93%
CLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE=199000 Raise the hard stop
DISABLE_AUTO_COMPACT=1 No compaction at all (risky)
CLAUDE_CODE_MAX_OUTPUT_TOKENS=16384 Shrink the output reserve,
giving more room for input
These are checked at runtime with no validation beyond basic
number parsing. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
sets the auto-compact threshold as a percentage of the effective
window (capped at effectiveWindow - 13000).
DISABLE_AUTO_COMPACT=1 disables compaction entirely,
letting the conversation grow until it hits the blocking limit,
at which point you lose the session. Use with caution.
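My reconstruction of how the percentage override resolves (the function name is mine; the cap at effectiveWindow - 13000 is as described above):

```python
import os

def autocompact_threshold(effective_window: int, margin: int = 13_000) -> int:
    """Auto-compact trigger, honoring CLAUDE_AUTOCOMPACT_PCT_OVERRIDE.
    The override is a percentage of the effective window, capped so it
    can never exceed the default (effective_window - margin)."""
    default = effective_window - margin
    pct = os.environ.get("CLAUDE_AUTOCOMPACT_PCT_OVERRIDE")
    if pct is None:
        return default
    return min(effective_window * int(pct) // 100, default)
```

Note the consequence of the cap: the override can only compact earlier than the default (~93%), never later.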
Let's patch the binary. Copy it, replace every occurrence of the version string with a different value (same byte count so nothing shifts), and run it:
$ cp claude claude_patched
$ grep -c '2\.1\.38' claude_patched
169
$ sed -i 's/2\.1\.38/louis_/g' claude_patched
$ ls -lh claude_patched
-rwxr-xr-x 223M  ← identical file size
$ grep -c 'louis_' claude_patched
169  ← all 169 occurrences replaced
$ ./claude_patched --version
louis_ (Claude Code)
The copyright header
inside the binary now reads // Version: louis_,
the build metadata says VERSION:"louis_", and
--help works normally. The binary executes the
modified JavaScript without complaining, yay!
The same technique could change any string constant in the embedded source: API endpoints, telemetry URLs, permission prompts, model names, threshold values. As long as the replacement is the same byte length, the SEA trailer stays valid and the runtime parses the modified source on startup.
Security observations
Session transcripts log everything. Every message, tool
output, file contents, and thinking block is written to
~/.claude/projects/ as plaintext JSON. File
permissions are 0600, but if the model reads a file containing
credentials, those credentials are now in the transcript.
Hooks run arbitrary code. Project-level
.claude/settings.json can define hooks that execute
shell commands on every tool invocation. These are gated behind
workspace trust, but once granted, they run automatically. A
malicious repo could include hook definitions alongside its
CLAUDE.md.
The sandbox has escape hatches. Bash commands run in
bubblewrap on Linux (network and PID namespace isolation, seccomp
filters). But the model can request unsandboxed execution via a
dangerouslyDisableSandbox parameter, and in
bypassPermissions mode that request is auto-accepted.
The security model is fundamentally trust-based. It works well when you trust your repos and your machine. The load-bearing decision is the workspace trust prompt, and developers tend to click "yes" without much thought.
Conclusion
The methodology we used was mostly:
file, ldd, readelf, nm, strings to understand the binary without running it
compendium to trace syscalls at runtime and see every file open, network connection, and memory allocation
grep -boa and xxd to find the embedded JS payload boundary
dd to extract it, js-beautify to make it readable
grep on string literals and export patterns to trace through 438K lines of mangled code
A lot of this is still somewhat speculative. Hopefully you learned something!