# Orchestrated State Machines: A Theory of Application Architecture ## Overview DataBuild's core architecture exemplifies a pattern we call **Dependency-Aware State Machine Orchestration** or **Stateful Dataflow Architecture**. This document crystallizes the theory behind this approach and its applications. ## The Pattern At its essence, the pattern is: ``` Application = State Machines + Dependency Graph + Orchestration Logic ``` Where: - **State Machines**: Individual entities (Want, JobRun, Partition) with well-defined, type-safe states - **Dependency Graph**: Relationships between entities (wants depend on partitions, partitions depend on job runs) - **Orchestration Logic**: Coordination rules that trigger state transitions when dependencies are satisfied ## Key Components ### 1. State Machines Each domain entity is modeled as an explicit state machine with: - **Well-defined states** (NotStarted, Running, Completed, Failed, etc.) - **Type-safe transitions** enforced at compile time - **Immutable state progression** via consuming methods Example from DataBuild: ```rust pub enum JobRun { NotStarted(JobRunWithState), Running(JobRunWithState), Completed(JobRunWithState), Failed(JobRunWithState), // ... } // Can ONLY call run() on NotStarted jobs - compiler enforces this! impl JobRunWithState { pub fn run(self, env) -> Result, Error> } ``` ### 2. Dependency Graph Entities are connected through explicit dependencies: - Wants → Partitions (wants request specific partitions) - Partitions → JobRuns (jobs build partitions) - JobRuns → Partitions (jobs declare what they built) ### 3. Orchestration Logic A central orchestrator: - Observes all entity states - Evaluates dependency conditions - Triggers state transitions when conditions are met - Maintains global consistency invariants ## Core Principles 1. **Model domain entities as explicit state machines** - Don't hide state in boolean flags 2. **Express dependencies as a graph** - Make relationships first-class 3. **Centralize coordination logic** - Separate entity behavior from system coordination 4. **Make state transitions event-sourced** - Append-only log enables time-travel and auditability 5. **Use types to enforce valid transitions** - Catch errors at compile time, not runtime ## Advantages ### Type Safety Compile-time guarantees prevent invalid state transitions: ```rust // This will not compile: let job = JobRun::Running(running_job); job.run(); // ERROR: no method `run` found for `JobRun` ``` ### Observability Event-sourced state transitions provide complete audit trail: - What's running? Query running jobs - What failed? Filter by failed state - When did it transition? Check BEL timestamps ### Testability - State machines can be tested in isolation - Orchestration logic can be tested with mock state machines - Dependency resolution can be tested independently ### Incremental Progress System can be stopped and restarted: - State is persisted in BEL - Resume from last known state - No need to restart from beginning ### Correctness - Type system prevents impossible states - Event log provides ground truth - Dependency graph ensures proper ordering ## Real-World Applications This pattern is the **fundamental architecture** of: **Build Systems** - Bazel, Buck, Pants - artifacts depend on other artifacts - Your "builds" are literally builds **Workflow Engines** - Temporal, Prefect, Airflow - DAG of tasks with state - Each task is a state machine, orchestrator schedules based on dependencies **Data Orchestration** - Dagster, Kedro - data assets with lineage - Partitions are data assets, jobs are transformations **Game Engines** - Entity Component Systems - entities have state - Game loop orchestrates entity state transitions **Business Process Management** - BPMN engines - business processes as state machines - Workflow engine coordinates process instances ## When to Use This Pattern This architecture is particularly powerful for systems where: - **Eventual consistency** is acceptable (not strict ACID transactions) - **Incremental progress** is important (can checkpoint and resume) - **Observability** is critical (need to know what's happening) - **Correctness** matters (type-safe transitions prevent bugs) - **Concurrency** is inherent (multiple things happening simultaneously) - **Dependencies** are complex (can't just process sequentially) ## Implementation Lessons ### Use Drain for State Transitions Clean pattern for moving entities through states: ```rust fn schedule_queued_jobs(&mut self) -> Result<()> { let mut new_jobs = Vec::new(); for job in self.job_runs.drain(..) { let transitioned = match job { JobRun::NotStarted(ns) => JobRun::Running(ns.run(None)?), other => other, // Pass through unchanged }; new_jobs.push(transitioned); } self.job_runs = new_jobs; Ok(()) } ``` ### Parameterize State for Type Safety ```rust pub struct JobRunWithState { job_run_id: Uuid, state: State, // Type parameter enforces valid operations } ``` ### Event Sourcing for Auditability All state changes emit events to append-only log: ```rust self.bel.append_event(&JobRunSuccessEvent { job_run_id, timestamp })?; ``` ### Separate Entity Logic from Coordination - Entity state machines: "What transitions are valid for me?" - Orchestrator: "Given all entity states, what should happen next?"