Pker.xyz

Autom8: A Claude Orchestrator

rust, tools

Dive into building a robust orchestrator for Claude Code. github.com/louisboilard/autom8


Introduction: Run of the mill LLM Orchestration

Current LLM orchestrators (at least the ones I've been exposed to at the time of writing, i.e. the public and somewhat mainstream ones (yes, you, Ralph)) do more or less the following:

  1. Initial prompt to make a "structured" (markdown/text) plan
  2. Iterate on said prompt until happy with plan
  3. Main loop consists of piping into claude via stdin or asking claude to re-eval what's been implemented vs the plan
  4. Repeat until claude says it's done (or until some max iteration is hit)
  5. Shell scripts of anywhere from 30 to a few thousand lines

The concept is sane, but the implementation lacks the things we aim to bring with autom8: type safety; determinism where possible (autom8 does as much work as possible itself and only leverages Claude when needed); safe re-runs and resuming of interrupted work; a ratatui-powered TUI to monitor ongoing runs and review previous ones; proper error and state types; sane TOML-based config files; intelligent context injection between agents via git diffs and shared context; concrete agent types; commits and PR creation if desired; and context size optimization.

Let's look at how it works.


Architecture Overview

autom8 is structured around three core concepts: specs, state, and knowledge.

A spec defines what needs to be built. It's a JSON file listing user stories with acceptance criteria, priorities, and a pass/fail status for each. The spec is the source of truth for "what work remains."

source
{
  "project": "my-feature",
  "branchName": "feature/add-auth",
  "userStories": [
    {
      "id": "US-001",
      "title": "Add login endpoint",
      "acceptanceCriteria": ["POST /login returns JWT", "Invalid creds return 401"],
      "priority": 1,
      "passes": false
    }
  ]
}
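
Because the spec is the source of truth for what work remains, picking the next story can be a pure function over it. Here is a minimal sketch of that idea. The `UserStory` struct and `next_story` function are illustrative names, not autom8's actual types, and the rule that a lower priority number means "do this first" is an assumption.

```rust
/// Illustrative mirror of one entry in `userStories` (not autom8's actual type).
#[derive(Debug, Clone, PartialEq)]
pub struct UserStory {
    pub id: String,
    pub title: String,
    pub priority: u32,
    pub passes: bool,
}

/// Returns the incomplete story with the smallest priority number, if any.
/// Assumption: priority 1 is more urgent than priority 2.
pub fn next_story(stories: &[UserStory]) -> Option<&UserStory> {
    stories
        .iter()
        .filter(|s| !s.passes)
        .min_by_key(|s| s.priority)
}
```

When every story has `passes: true`, the function returns `None`, which is the signal that implementation is finished.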

State tracks execution progress. Every transition gets persisted to disk, so the orchestrator can resume from any point. The state includes the current machine state, which story is being worked on, iteration counts, and timestamps.

Knowledge accumulates across the run. When a story completes, autom8 captures what files were modified, what architectural decisions were made, and what patterns were established. This knowledge gets injected into subsequent prompts, giving later agents context about earlier work.

The component relationships look like this:

source
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│      Spec       │     │   RunState      │     │ ProjectKnowledge│
│  (what to do)   │────▶│ (where we are)  │────▶│ (what we know)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                      │                        │
         │                      ▼                        │
         │              ┌─────────────────┐              │
         └─────────────▶│   StateManager  │◀─────────────┘
                        │   (persistence) │
                        └─────────────────┘
                                │
                        ┌───────┴───────┐
                        ▼               ▼
                  state.json      runs/*.json
                  (current)       (archived)

The StateManager handles all persistence. Current state lives in a single file that gets updated on every transition. When a run completes, it's archived with a timestamp for later reference.
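
To make the persist-then-archive idea concrete, here is a rough sketch using only the standard library. The function names are hypothetical, and writing through a temporary file followed by a rename is my assumption about how one would keep `state.json` from being left half-written; the post does not say how autom8 does it. The `label` parameter stands in for the timestamp used to name archived runs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes `contents` to `<dir>/state.json` via a temporary file and a rename,
/// so readers never observe a partially written state file.
pub fn save_state(dir: &Path, contents: &str) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let tmp = dir.join("state.json.tmp");
    let target = dir.join("state.json");
    fs::write(&tmp, contents)?;
    // Replaces any previous state.json with the fully written file.
    fs::rename(&tmp, &target)?;
    Ok(target)
}

/// Moves the current state file to `<dir>/runs/<label>.json`.
pub fn archive_state(dir: &Path, label: &str) -> io::Result<PathBuf> {
    let runs = dir.join("runs");
    fs::create_dir_all(&runs)?;
    let archived = runs.join(format!("{label}.json"));
    fs::rename(dir.join("state.json"), &archived)?;
    Ok(archived)
}
```

Calling `save_state` on every transition and `archive_state` once at the end gives the `state.json` / `runs/*.json` layout from the diagram.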


The State Machine

The current implementation is a 12-state machine with explicit transitions. Every state is a discrete phase of execution.

rust
#[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum MachineState {
    Idle,
    LoadingSpec,
    GeneratingSpec,
    Initializing,
    PickingStory,
    RunningClaude,
    Reviewing,
    Correcting,
    Committing,
    #[serde(rename = "creating-pr")]
    CreatingPR,
    Completed,
    Failed,
}

At a higher level there are 4 main steps; the user only runs `autom8` to go through all 4.

First is planning/spec creation: launching autom8 spawns claude with an injected prompt, in which we define what we want to work on. Once we're done and exit, autom8 picks up the new spec and converts it to JSON.

Second phase is implementation: agents will do what the plan told them to.

Thirdly we have a review-correction loop.

Lastly (and optionally): commits and pr creation.

The state flow for a typical run:

source
                         ┌──────────────────┐
                         │       Idle       │
                         └────────┬─────────┘
                                  │ start
                                  ▼
                         ┌──────────────────┐
                         │   LoadingSpec    │
                         └────────┬─────────┘
                                  │
                                  ▼
                         ┌──────────────────┐
                         │  Initializing    │
                         └────────┬─────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   ▼                   │
              │          ┌──────────────────┐         │
              │          │  PickingStory    │◀────────┤
              │          └────────┬─────────┘         │
              │                   │                   │
              │    ┌──────────────┼──────────────┐    │
              │    │ incomplete   │    all done  │    │
              │    ▼              │              │    │
              │ ┌──────────────┐  │              │    │
              │ │ RunningClaude│  │              │    │
              │ └──────┬───────┘  │              │    │
              │        │          │              │    │
              │        ▼          │              │    │
              │ ┌──────────────┐  │              │    │
   issues ──▶ │ │  Reviewing   │  │              │    │
   found      │ └──────┬───────┘  │              │    │
     │        │        │ pass     │              │    │
     │        │        ▼          │              │    │
     │        │ ┌──────────────┐  │              │    │
     │        │ │  Committing  │  │              │    │
     │        │ └──────┬───────┘  │              │    │
     │        │        │          │              │    │
     │        └────────┴──────────┘              │    │
     │                                           │    │
     ▼                                           ▼    │
┌──────────────┐                        ┌──────────────┐
│  Correcting  │                        │  CreatingPR  │
└──────┬───────┘                        └──────┬───────┘
       │                                       │
       └──────────▶ back to Reviewing          ▼
                                        ┌──────────────┐
                                        │  Completed   │
                                        └──────────────┘

State is persisted after every transition.
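
One way to keep transitions explicit is a single function that says which edges are legal. The sketch below encodes only the arrows drawn in the diagram above; `is_valid_transition` is a hypothetical helper, the enum is repeated without its serde attributes so the snippet stands alone, and the rule that any non-terminal state may move to `Failed` is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MachineState {
    Idle, LoadingSpec, GeneratingSpec, Initializing, PickingStory, RunningClaude,
    Reviewing, Correcting, Committing, CreatingPR, Completed, Failed,
}

/// Returns true if `from -> to` is one of the edges in the diagram.
pub fn is_valid_transition(from: MachineState, to: MachineState) -> bool {
    use MachineState::*;
    match (from, to) {
        // Terminal states never transition.
        (Completed | Failed, _) => false,
        // Assumption: any non-terminal state may fail.
        (_, Failed) => true,
        (Idle, LoadingSpec)
        | (LoadingSpec, Initializing)
        | (Initializing, PickingStory)
        | (PickingStory, RunningClaude) // incomplete stories remain
        | (PickingStory, CreatingPR)    // all done
        | (RunningClaude, Reviewing)
        | (Reviewing, Committing)       // pass
        | (Reviewing, Correcting)       // issues found
        | (Correcting, Reviewing)
        | (Committing, PickingStory)
        | (CreatingPR, Completed) => true,
        _ => false,
    }
}
```

A check like this could run right before the state is persisted, so an illegal jump never reaches disk.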


Cumulative Context Architecture

Each Claude invocation starts fresh with zero knowledge of what happened before. For single-shot tasks this is fine. For complex workflows and implementations, we want to keep each agent's context relatively small while sharing enough of it that agents don't redo work, so we need something a little more sophisticated.

Consider implementing three related features: authentication, user profiles, and settings. Story 1 creates the auth module. Story 2 needs to know about that module to build profiles. Story 3 needs to know about both to build settings that integrate with them.

autom8 solves this with the ProjectKnowledge struct:

rust
pub struct ProjectKnowledge {
    /// Known files and their metadata, keyed by path
    pub files: HashMap<PathBuf, FileInfo>,

    /// Architectural decisions made during the run
    pub decisions: Vec<Decision>,

    /// Code patterns established during the run
    pub patterns: Vec<Pattern>,

    /// Changes made for each completed story
    pub story_changes: Vec<StoryChanges>,

    /// Baseline commit hash when the run started
    pub baseline_commit: Option<String>,
}

After each agent completes, we capture knowledge from two sources:

  1. Git diffs: What files were created, modified, or deleted
  2. Agent output: Semantic information the LLM provides about decisions and patterns

The agent is prompted to output structured context blocks:

source
<files-context>
src/auth.rs | JWT authentication module | [authenticate, verify_token]
src/main.rs | Application entry | [main]
</files-context>

<decisions>
Auth method | JWT | Stateless, scalable, well-supported
</decisions>

<patterns>
Use Result<T, AuthError> for auth operations
</patterns>

This gets parsed and merged into the knowledge base:

rust
pub struct StoryChanges {
    pub story_id: String,
    pub files_created: Vec<FileChange>,
    pub files_modified: Vec<FileChange>,
    pub files_deleted: Vec<PathBuf>,
    pub commit_hash: Option<String>,
}

pub struct FileChange {
    pub path: PathBuf,
    pub additions: u32,
    pub deletions: u32,
    pub purpose: Option<String>,
    pub key_symbols: Vec<String>,
}
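
Parsing the `<files-context>` block shown earlier might look something like the following. This is a sketch under assumptions: `FileContext`, `extract_block`, and `parse_files_context` are made-up names, and skipping malformed lines silently is my choice rather than documented autom8 behavior.

```rust
/// One parsed line of a `<files-context>` block (illustrative type).
#[derive(Debug, PartialEq)]
pub struct FileContext {
    pub path: String,
    pub purpose: String,
    pub key_symbols: Vec<String>,
}

/// Returns the text between `<tag>` and `</tag>`, if both are present.
fn extract_block<'a>(output: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = output.find(&open)? + open.len();
    let end = output[start..].find(&close)? + start;
    Some(&output[start..end])
}

/// Parses `path | purpose | [sym_a, sym_b]` lines; malformed lines are skipped.
pub fn parse_files_context(output: &str) -> Vec<FileContext> {
    let Some(block) = extract_block(output, "files-context") else {
        return Vec::new();
    };
    block
        .lines()
        .filter_map(|line| {
            let mut parts = line.splitn(3, '|').map(str::trim);
            let path = parts.next().filter(|p| !p.is_empty())?;
            let purpose = parts.next()?;
            let symbols = parts.next()?;
            let key_symbols = symbols
                .trim_start_matches('[')
                .trim_end_matches(']')
                .split(',')
                .map(str::trim)
                .filter(|s| !s.is_empty())
                .map(String::from)
                .collect();
            Some(FileContext {
                path: path.to_string(),
                purpose: purpose.to_string(),
                key_symbols,
            })
        })
        .collect()
}
```

An agent that emits no block at all simply yields an empty list, which is where a git-based fallback would take over.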

When the next story starts, autom8 injects this knowledge into the prompt. The agent sees a table of files touched so far, architectural decisions that were made, and patterns it should follow. This context injection is automatic: stories don't need to explicitly reference each other. We have git-based, non-LLM fallbacks for when agents misbehave and don't output file context or decisions. Note that passing context from one agent to the next is a sub-millisecond operation: there's no LLM invocation involved.
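
Since the injection step is plain string building, it can be sketched in a few lines. Everything here is illustrative: `render_context` and the simplified `Decision` struct are hypothetical, and the section headings and bullet layout are an assumption about what the prompt could look like, not autom8's real format.

```rust
use std::fmt::Write;

/// Illustrative stand-in for an architectural decision.
pub struct Decision {
    pub topic: String,
    pub choice: String,
    pub rationale: String,
}

/// Renders accumulated knowledge as a prompt section. Empty inputs produce
/// an empty string, so the first story gets no context header at all.
pub fn render_context(
    files: &[(String, String)], // (path, purpose)
    decisions: &[Decision],
    patterns: &[String],
) -> String {
    let mut out = String::new();
    if !files.is_empty() {
        out.push_str("## Files touched so far\n");
        for (path, purpose) in files {
            // Writing to a String cannot fail, so the result is ignored.
            let _ = writeln!(out, "- {path}: {purpose}");
        }
    }
    if !decisions.is_empty() {
        out.push_str("## Decisions\n");
        for d in decisions {
            let _ = writeln!(out, "- {}: {} ({})", d.topic, d.choice, d.rationale);
        }
    }
    if !patterns.is_empty() {
        out.push_str("## Patterns to follow\n");
        for p in patterns {
            let _ = writeln!(out, "- {p}");
        }
    }
    out
}
```

The result would be prepended to the story prompt; no model call is involved at any point.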

Between stories, the user might make unrelated changes—updating dependencies, editing configs, etc. autom8 tracks which files it touched via the baseline_commit and filters subsequent diffs to only include "our" changes:

rust
pub fn filter_our_changes(&self, all_changes: &[DiffEntry]) -> Vec<DiffEntry> {
    let our_files = self.our_files();

    all_changes
        .iter()
        .filter(|entry| {
            // Include if it's a new file (we created it)
            if entry.status == DiffStatus::Added {
                return true;
            }
            // Include if we've touched it before
            our_files.contains(&entry.path)
        })
        .cloned()
        .collect()
}

Each story in a run has access to accumulated context from all previous stories, without the noise of unrelated changes. Later agents build on earlier work rather than rediscovering it.


Type-Safe Error Handling

Orchestration code is inherently complex. There are dozens of failure modes: file not found, invalid JSON, git errors, Claude crashes, timeouts, network issues. Handling these with stringly-typed errors or catch-all exceptions leads to bugs.

We use exhaustive enums for all error types, so the compiler enforces that every variant is handled:

rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum Autom8Error {
    #[error("Spec file not found: {0}")]
    SpecNotFound(PathBuf),

    #[error("Invalid spec format: {0}")]
    InvalidSpec(String),

    #[error("No incomplete stories found in spec")]
    NoIncompleteStories,

    #[error("Claude process failed: {0}")]
    ClaudeError(String),

    #[error("Claude process timed out after {0} seconds")]
    ClaudeTimeout(u64),

    #[error("State file error: {0}")]
    StateError(String),

    #[error("No active run to resume")]
    NoActiveRun,

    #[error("Run already in progress: {0}")]
    RunInProgress(String),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),

    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),

    #[error("Review failed after 3 iterations")]
    MaxReviewIterationsReached,
    // ... more variants
}
        

Each variant has a human-readable message, and the #[from] attribute enables automatic conversion from underlying error types.
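
For readers who haven't used thiserror, here is roughly what `#[from]` saves you from writing, spelled out by hand with only the standard library. `SketchError` and `read_spec` are hypothetical names for illustration. The derive also wires up `Error::source`, which this sketch leaves out.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "IO error: {e}"),
        }
    }
}

impl std::error::Error for SketchError {}

// Hand-written equivalent of the conversion `#[from]` provides.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// `?` calls `From::from` on the error, so no explicit `map_err` is needed.
pub fn read_spec(path: &Path) -> Result<String, SketchError> {
    let text = fs::read_to_string(path)?;
    Ok(text)
}
```

With the conversion in place, any function returning the crate's error type can use `?` directly on `std::io` calls.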

Outcome enums follow the same pattern. Rather than returning booleans or magic strings, operations return typed results:

rust
#[derive(Debug, Clone, PartialEq)]
pub enum ReviewResult {
    Pass,
    IssuesFound,
    Error(ClaudeErrorInfo),
}

#[derive(Debug, Clone, PartialEq)]
pub enum CorrectorResult {
    Complete,
    Error(ClaudeErrorInfo),
}

When handling review results, the match is exhaustive:

rust
match review_result {
    ReviewResult::Pass => {
        // All good, proceed to commit
        state.transition_to(MachineState::Committing);
    }
    ReviewResult::IssuesFound => {
        // Need correction
        if state.review_iteration < 3 {
            state.transition_to(MachineState::Correcting);
        } else {
            return Err(Autom8Error::MaxReviewIterationsReached);
        }
    }
    ReviewResult::Error(info) => {
        // Claude failed, decide what to do
        log::error!("Review failed: {}", info.message);
        state.transition_to(MachineState::Failed);
    }
}

The benefit of this approach compounds over time: adding a new error type or state variant causes compile errors everywhere that needs updating.


Self-Correcting Review Loops

LLMs make mistakes. Sometimes the code doesn't compile. Sometimes tests fail. Sometimes the implementation misses an acceptance criterion.

We use a simple bounded review loop. After all stories are marked complete, a reviewer agent checks the work against the spec. If issues are found, a corrector agent attempts fixes. This cycles up to n times before giving up gracefully.

The flow looks like this:

source
PickingStory (all complete)
        │
        ▼
    Reviewing ◀───────────┐
        │                 │
   ┌────┴────┐            │
   │         │            │
Pass    IssuesFound       │
   │         │            │
   ▼         ▼            │
Committing  Correcting ───┘
                (if iteration < n)

The reviewer agent gets the full spec context and writes issues to a file if any are found:

rust
pub fn run_reviewer<F>(
    spec: &Spec,
    iteration: u32,
    max_iterations: u32,
    mut on_output: F,
) -> Result<ReviewResult>
where
    F: FnMut(&str),
{
    let prompt = build_reviewer_prompt(spec, iteration, max_iterations);
    // ... run Claude ...

    // Check if review file exists and has content
    let review_path = Path::new(REVIEW);
    if review_path.exists() {
        match std::fs::read_to_string(review_path) {
            Ok(content) if !content.trim().is_empty() => Ok(ReviewResult::IssuesFound),
            Ok(_) => Ok(ReviewResult::Pass),
            Err(e) => Ok(ReviewResult::Error(/* ... */)),
        }
    } else {
        Ok(ReviewResult::Pass)
    }
}

If issues are found, the corrector agent reads that file and attempts fixes. The iteration count prevents infinite loops:

rust
// In the main loop
state.review_iteration += 1;

match run_reviewer(&spec, state.review_iteration, 3, on_output)? {
    ReviewResult::Pass => {
        state.transition_to(MachineState::Committing);
    }
    ReviewResult::IssuesFound => {
        if state.review_iteration >= 3 {
            return Err(Autom8Error::MaxReviewIterationsReached);
        }
        state.transition_to(MachineState::Correcting);
    }
    ReviewResult::Error(e) => { /* handle */ }
}

When the limit is hit, autom8 leaves the review file in place. The human can review the remaining issues, make manual fixes, and re-run if needed.
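
The bounded loop can be exercised in isolation by stubbing out the two agents. In this sketch the reviewer and corrector are plain closures, and `ReviewOutcome`, `LoopResult`, and `review_loop` are hypothetical names; it mirrors the iteration check from the snippet above rather than reproducing autom8's main loop.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ReviewOutcome {
    Pass,
    IssuesFound,
}

#[derive(Debug, PartialEq)]
pub enum LoopResult {
    Passed { iterations: u32 },
    GaveUp,
}

/// Runs review/correct cycles until the review passes or the limit is hit.
pub fn review_loop(
    max_iterations: u32,
    mut review: impl FnMut(u32) -> ReviewOutcome,
    mut correct: impl FnMut(u32),
) -> LoopResult {
    for iteration in 1..=max_iterations {
        match review(iteration) {
            ReviewOutcome::Pass => return LoopResult::Passed { iterations: iteration },
            // Only correct if another review is still allowed afterwards.
            ReviewOutcome::IssuesFound if iteration < max_iterations => correct(iteration),
            ReviewOutcome::IssuesFound => return LoopResult::GaveUp,
        }
    }
    LoopResult::GaveUp
}
```

With a limit of 3, the corrector runs at most twice: a third failed review ends the loop without another correction attempt.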


UI/UX Design

autom8 has two display modes: a streaming CLI for running tasks and an interactive TUI for monitoring. Both prioritize simplicity and consistency over showing everything.

CLI Output: Filtering the Noise

Claude produces lots of output: tool calls, file reads, thinking blocks, status messages... We filter this down to what is actually happening, and we always display the same number of characters at the same position for consistency.

rust
fn should_display(event: &StreamEvent) -> bool {
    match event {
        StreamEvent::Text(text) => !text.trim().is_empty(),
        StreamEvent::ToolUse { name, .. } => {
            // Show edits, writes, and shell commands; skip reads and searches
            matches!(name.as_str(), "Edit" | "Write" | "Bash")
        }
        StreamEvent::Thinking(_) => false,
        _ => false,
    }
}

We get a clean stream that shows: state transitions, which story is being worked on, edits being made, and commands being run. You can follow progress without scrolling through hundreds of lines of file contents and tool metadata.

Full Claude output is still available in log files if you need to debug.

Phase banners mark transitions between stages:

source
━━━━━ RUNNING CLAUDE ━━━━━
Working on: US-001 Add login endpoint

━━━━━ REVIEWING ━━━━━
Iteration 1/3

━━━━━ COMMITTING ━━━━━
feat(auth): implement login endpoint

These banners adapt to terminal width and provide consistent visual anchors as the run progresses.
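
A width-aware banner is simple to sketch. The `banner` function below is hypothetical; the width is supplied by the caller (how autom8 queries the terminal is not covered here), and counting characters instead of display columns is an assumption that holds for the single-width glyphs used in these banners.

```rust
/// Builds a banner such as `━━━━━ REVIEWING ━━━━━`, sized to `width` characters.
pub fn banner(title: &str, width: usize) -> String {
    let label = format!(" {} ", title.to_uppercase());
    let label_len = label.chars().count();
    // Always keep at least one rule character on each side,
    // even if that overflows a very narrow terminal.
    let remaining = width.saturating_sub(label_len).max(2);
    let left = remaining / 2;
    let right = remaining - left;
    format!("{}{}{}", "━".repeat(left), label, "━".repeat(right))
}
```

For example, `banner("reviewing", 21)` yields `━━━━━ REVIEWING ━━━━━`.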

CLI output: filtered to show only relevant state transitions and actions

Monitor TUI: Real-Time Dashboard

The monitor TUI uses ratatui for an interactive dashboard. It polls state files from disk and renders a live view of all running projects.

The main view is a 2x2 quadrant grid showing active runs.

Navigation is arrows or vim-style: hjkl to move between quadrants, n/p to paginate when there are more than four runs, and Tab to switch views. Three views are available: active runs, project list, and run history.
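
The n/p pagination boils down to a little index arithmetic over the list of runs. The helper names below are made up for illustration, and clamping an out-of-range page to the last page is an assumption about the desired behavior.

```rust
/// Four runs fit on one page of the 2x2 grid.
const RUNS_PER_PAGE: usize = 4;

/// Number of pages needed; an empty list still has one (empty) page.
pub fn page_count(total_runs: usize) -> usize {
    ((total_runs + RUNS_PER_PAGE - 1) / RUNS_PER_PAGE).max(1)
}

/// Index range of the runs shown on `page` (0-based), clamped to the last page.
pub fn page_range(total_runs: usize, page: usize) -> std::ops::Range<usize> {
    let page = page.min(page_count(total_runs) - 1);
    let start = page * RUNS_PER_PAGE;
    start..(start + RUNS_PER_PAGE).min(total_runs)
}
```

With five active runs, page 0 shows runs 0 to 3 and page 1 shows only run 4, leaving three quadrants empty.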

Monitor TUI: real-time dashboard with state and progress at a glance


Config

Global config lives at ~/.config/autom8/config.toml, and project-specific config at ~/.config/autom8/"project_xyz"/config.toml. Each is created automatically the first time autom8 runs, or the first time it runs in a new directory where we want to do work. Project-specific config takes precedence over the global config.

The config struct captures what features are enabled:

rust
pub struct Config {
    pub review: bool,      // Run reviewer/corrector loop?
    pub commit: bool,      // Create git commits?
    pub pull_request: bool, // Create GitHub PR?
}
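
The precedence rule (project config over global config) can be sketched as a per-key merge. This is only one possible reading: `PartialConfig`, `ResolvedConfig`, and `resolve` are hypothetical, merging key by key (rather than letting the project file replace the global one wholesale) is an assumption, and so is defaulting every feature to enabled when neither file sets it.

```rust
/// Config as it might be read from one TOML file: `None` means the key
/// was absent from that file.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct PartialConfig {
    pub review: Option<bool>,
    pub commit: Option<bool>,
    pub pull_request: Option<bool>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ResolvedConfig {
    pub review: bool,
    pub commit: bool,
    pub pull_request: bool,
}

/// Project values win over global values; unset keys fall back to a default.
pub fn resolve(global: PartialConfig, project: PartialConfig) -> ResolvedConfig {
    // Assumption: every feature is enabled unless a file says otherwise.
    ResolvedConfig {
        review: project.review.or(global.review).unwrap_or(true),
        commit: project.commit.or(global.commit).unwrap_or(true),
        pull_request: project.pull_request.or(global.pull_request).unwrap_or(true),
    }
}
```

A project that only sets `pull_request = false` would still inherit `review` and `commit` from the global file under this scheme.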

Config gets snapshotted at run start and stored in the state. This ensures resumed runs use the same settings they started with, even if the config file changes (this might change in the near future, but it has proven useful so far):

rust
pub struct RunState {
    // ...
    /// Configuration snapshot taken at run start
    #[serde(default)]
    pub config: Option<Config>,
    // ...
}

impl RunState {
    /// Get the effective config for this run, falling back to the default
    /// config when the state has no snapshot
    pub fn effective_config(&self) -> Config {
        self.config.clone().unwrap_or_default()
    }
}

Conclusion

Our goal with autom8 is to have a simple, fast, somewhat sophisticated and robust tool that facilitates using Claude for complex tasks through sane context window optimization, knowledge sharing between agents, and baked-in features (commits, PRs, reviews), without user hassle and without noise.

It's still very much a WIP, but it definitely works much better than vanilla claude for complex/long tasks where a single claude instance's context window would grow quite large.

github.com/louisboilard/autom8

Uploaded 31-01-2026