Stuart Axelbrooke d8af3c8174 update docs describing architectural pattern (orchestrated state machines)

2025-11-21 17:08:51 +08:00

5.5 KiB

Raw Blame History

Orchestrated State Machines: A Theory of Application Architecture

Overview

DataBuild's core architecture exemplifies a pattern we call Dependency-Aware State Machine Orchestration or Stateful Dataflow Architecture. This document crystallizes the theory behind this approach and its applications.

The Pattern

At its essence, the pattern is:

Application = State Machines + Dependency Graph + Orchestration Logic

Where:

State Machines: Individual entities (Want, JobRun, Partition) with well-defined, type-safe states
Dependency Graph: Relationships between entities (wants depend on partitions, partitions depend on job runs)
Orchestration Logic: Coordination rules that trigger state transitions when dependencies are satisfied

Key Components

1. State Machines

Each domain entity is modeled as an explicit state machine with:

Well-defined states (NotStarted, Running, Completed, Failed, etc.)
Type-safe transitions enforced at compile time
Immutable state progression via consuming methods

Example from DataBuild:

pub enum JobRun<B: JobRunBackend> {
    NotStarted(JobRunWithState<B, B::NotStartedState>),
    Running(JobRunWithState<B, B::RunningState>),
    Completed(JobRunWithState<B, B::CompletedState>),
    Failed(JobRunWithState<B, B::FailedState>),
    // ...
}

// Can ONLY call run() on NotStarted jobs - compiler enforces this!
impl<B: JobRunBackend> JobRunWithState<B, B::NotStartedState> {
    pub fn run(self, env) -> Result<JobRunWithState<B, B::RunningState>, Error>
}

2. Dependency Graph

Entities are connected through explicit dependencies:

Wants → Partitions (wants request specific partitions)
Partitions → JobRuns (jobs build partitions)
JobRuns → Partitions (jobs declare what they built)

3. Orchestration Logic

A central orchestrator:

Observes all entity states
Evaluates dependency conditions
Triggers state transitions when conditions are met
Maintains global consistency invariants

Core Principles

Model domain entities as explicit state machines - Don't hide state in boolean flags
Express dependencies as a graph - Make relationships first-class
Centralize coordination logic - Separate entity behavior from system coordination
Make state transitions event-sourced - Append-only log enables time-travel and auditability
Use types to enforce valid transitions - Catch errors at compile time, not runtime

Advantages

Type Safety

Compile-time guarantees prevent invalid state transitions:

// This will not compile:
let job = JobRun::Running(running_job);
job.run(); // ERROR: no method `run` found for `JobRun<Running>`

Observability

Event-sourced state transitions provide complete audit trail:

What's running? Query running jobs
What failed? Filter by failed state
When did it transition? Check BEL timestamps

Testability

State machines can be tested in isolation
Orchestration logic can be tested with mock state machines
Dependency resolution can be tested independently

Incremental Progress

System can be stopped and restarted:

State is persisted in BEL
Resume from last known state
No need to restart from beginning

Correctness

Type system prevents impossible states
Event log provides ground truth
Dependency graph ensures proper ordering

Real-World Applications

This pattern is the fundamental architecture of:

Build Systems

Bazel, Buck, Pants - artifacts depend on other artifacts
Your "builds" are literally builds

Workflow Engines

Temporal, Prefect, Airflow - DAG of tasks with state
Each task is a state machine, orchestrator schedules based on dependencies

Data Orchestration

Dagster, Kedro - data assets with lineage
Partitions are data assets, jobs are transformations

Game Engines

Entity Component Systems - entities have state
Game loop orchestrates entity state transitions

Business Process Management

BPMN engines - business processes as state machines
Workflow engine coordinates process instances

When to Use This Pattern

This architecture is particularly powerful for systems where:

Eventual consistency is acceptable (not strict ACID transactions)
Incremental progress is important (can checkpoint and resume)
Observability is critical (need to know what's happening)
Correctness matters (type-safe transitions prevent bugs)
Concurrency is inherent (multiple things happening simultaneously)
Dependencies are complex (can't just process sequentially)

Implementation Lessons

Use Drain for State Transitions

Clean pattern for moving entities through states:

fn schedule_queued_jobs(&mut self) -> Result<()> {
    let mut new_jobs = Vec::new();
    for job in self.job_runs.drain(..) {
        let transitioned = match job {
            JobRun::NotStarted(ns) => JobRun::Running(ns.run(None)?),
            other => other, // Pass through unchanged
        };
        new_jobs.push(transitioned);
    }
    self.job_runs = new_jobs;
    Ok(())
}

Parameterize State for Type Safety

pub struct JobRunWithState<Backend, State> {
    job_run_id: Uuid,
    state: State,  // Type parameter enforces valid operations
}

Event Sourcing for Auditability

All state changes emit events to append-only log:

self.bel.append_event(&JobRunSuccessEvent {
    job_run_id,
    timestamp
})?;

Separate Entity Logic from Coordination

Entity state machines: "What transitions are valid for me?"
Orchestrator: "Given all entity states, what should happen next?"

5.5 KiB Raw Blame History