Orchestrated State Machines: A Theory of Application Architecture
Overview
DataBuild's core architecture exemplifies a pattern we call Dependency-Aware State Machine Orchestration or Stateful Dataflow Architecture. This document crystallizes the theory behind this approach and its applications.
The Pattern
At its essence, the pattern is:
Application = State Machines + Dependency Graph + Orchestration Logic
Where:
- State Machines: Individual entities (Want, JobRun, Partition) with well-defined, type-safe states
- Dependency Graph: Relationships between entities (wants depend on partitions, partitions depend on job runs)
- Orchestration Logic: Coordination rules that trigger state transitions when dependencies are satisfied
Key Components
1. State Machines
Each domain entity is modeled as an explicit state machine with:
- Well-defined states (NotStarted, Running, Completed, Failed, etc.)
- Type-safe transitions enforced at compile time
- Immutable state progression via consuming methods
Example from DataBuild:
pub enum JobRun<B: JobRunBackend> {
    NotStarted(JobRunWithState<B, B::NotStartedState>),
    Running(JobRunWithState<B, B::RunningState>),
    Completed(JobRunWithState<B, B::CompletedState>),
    Failed(JobRunWithState<B, B::FailedState>),
    // ...
}

// run() is defined only for the NotStarted state, so the compiler rejects it in every other state.
impl<B: JobRunBackend> JobRunWithState<B, B::NotStartedState> {
    // Signature only: the type of `env` and the method body are elided here.
    pub fn run(self, env) -> Result<JobRunWithState<B, B::RunningState>, Error>
}
2. Dependency Graph
Entities are connected through explicit dependencies:
- Wants → Partitions (wants request specific partitions)
- Partitions → JobRuns (jobs build partitions)
- JobRuns → Partitions (jobs declare what they built)
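The relationships above can be sketched as a pair of lookup tables. This is a minimal illustration, not DataBuild's actual data model: `DependencyGraph`, its fields, and its methods are hypothetical names, and wants, partitions, and job runs are reduced to strings and integers.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified graph: names and ids stand in for real entities.
#[derive(Default)]
pub struct DependencyGraph {
    /// Want id -> partitions the want requests.
    wants: HashMap<String, Vec<String>>,
    /// Partition -> id of the job run that builds it.
    builders: HashMap<String, u32>,
}

impl DependencyGraph {
    pub fn add_want(&mut self, want: &str, partitions: &[&str]) {
        let parts = partitions.iter().map(|p| p.to_string()).collect();
        self.wants.insert(want.to_string(), parts);
    }

    pub fn add_builder(&mut self, partition: &str, job_run: u32) {
        self.builders.insert(partition.to_string(), job_run);
    }

    /// Follow Want -> Partition -> JobRun edges to find the job runs a want relies on.
    pub fn job_runs_for(&self, want: &str) -> BTreeSet<u32> {
        self.wants
            .get(want)
            .into_iter()
            .flatten()
            .filter_map(|p| self.builders.get(p).copied())
            .collect()
    }

    /// Partitions a want requests that no job run builds yet.
    pub fn unbuilt_partitions(&self, want: &str) -> Vec<String> {
        self.wants
            .get(want)
            .into_iter()
            .flatten()
            .filter(|p| !self.builders.contains_key(*p))
            .cloned()
            .collect()
    }
}
```

Keeping the edges in explicit maps means questions such as "which partitions does this want still lack a builder for?" become plain lookups rather than logic scattered across entities.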
3. Orchestration Logic
A central orchestrator:
- Observes all entity states
- Evaluates dependency conditions
- Triggers state transitions when conditions are met
- Maintains global consistency invariants
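A single orchestration pass might look like the sketch below. It uses a plain runtime enum rather than type-parameterized states to stay short, and `Job`, `Orchestrator`, and `tick` are hypothetical names chosen for illustration, not DataBuild's API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    NotStarted,
    Running,
    Completed,
    Failed,
}

/// Hypothetical job record: its state plus the names of jobs it depends on.
pub struct Job {
    pub state: JobState,
    pub deps: Vec<&'static str>,
}

pub struct Orchestrator {
    pub jobs: HashMap<&'static str, Job>,
}

impl Orchestrator {
    /// One pass: observe all states, evaluate dependencies, then start every
    /// NotStarted job whose dependencies are all Completed.
    /// Returns the names of the jobs started, sorted for determinism.
    pub fn tick(&mut self) -> Vec<&'static str> {
        let mut ready: Vec<&'static str> = self
            .jobs
            .iter()
            .filter(|(_, job)| job.state == JobState::NotStarted)
            .filter(|(_, job)| {
                job.deps.iter().all(|dep| {
                    // An unknown dependency counts as unsatisfied.
                    self.jobs
                        .get(dep)
                        .map_or(false, |d| d.state == JobState::Completed)
                })
            })
            .map(|(name, _)| *name)
            .collect();
        ready.sort();

        // Apply transitions only after the read-only evaluation is finished.
        for name in &ready {
            if let Some(job) = self.jobs.get_mut(name) {
                job.state = JobState::Running;
            }
        }
        ready
    }
}
```

Splitting the pass into a read-only evaluation followed by the transitions keeps each tick's decisions based on one consistent snapshot of entity states.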
Core Principles
- Model domain entities as explicit state machines - Don't hide state in boolean flags
- Express dependencies as a graph - Make relationships first-class
- Centralize coordination logic - Separate entity behavior from system coordination
- Make state transitions event-sourced - Append-only log enables time-travel and auditability
- Use types to enforce valid transitions - Catch errors at compile time, not runtime
Advantages
Type Safety
Compile-time guarantees prevent invalid state transitions:
// This will not compile: `running_job` is a JobRunWithState<B, B::RunningState>,
// and run() is defined only for the NotStarted state.
running_job.run(None); // ERROR: no method named `run` found for this type
Observability
Event-sourced state transitions provide a complete audit trail:
- What's running? Query running jobs
- What failed? Filter by failed state
- When did it transition? Check the timestamps in the BEL (the append-only event log)
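The three questions above can each be answered by a scan of the log. The sketch below assumes a simplified in-memory log; `EventLog`, `Kind`, and the method names are hypothetical and do not reflect the real BEL interface.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Started,
    Succeeded,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event {
    pub job_run_id: u32,
    pub kind: Kind,
    pub timestamp: u64,
}

/// Hypothetical append-only log held in memory.
#[derive(Default)]
pub struct EventLog {
    events: Vec<Event>,
}

impl EventLog {
    pub fn append(&mut self, job_run_id: u32, kind: Kind, timestamp: u64) {
        self.events.push(Event { job_run_id, kind, timestamp });
    }

    /// "What's running?": started but not yet succeeded or failed.
    pub fn running(&self) -> Vec<u32> {
        let mut live = BTreeSet::new();
        for e in &self.events {
            match e.kind {
                Kind::Started => {
                    live.insert(e.job_run_id);
                }
                Kind::Succeeded | Kind::Failed => {
                    live.remove(&e.job_run_id);
                }
            }
        }
        live.into_iter().collect()
    }

    /// "What failed?": ids of every job run with a Failed event.
    pub fn failed(&self) -> Vec<u32> {
        self.events
            .iter()
            .filter(|e| e.kind == Kind::Failed)
            .map(|e| e.job_run_id)
            .collect()
    }

    /// "When did it transition?": each transition of one job run with its timestamp.
    pub fn transitions(&self, job_run_id: u32) -> Vec<(Kind, u64)> {
        self.events
            .iter()
            .filter(|e| e.job_run_id == job_run_id)
            .map(|e| (e.kind, e.timestamp))
            .collect()
    }
}
```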
Testability
- State machines can be tested in isolation
- Orchestration logic can be tested with mock state machines
- Dependency resolution can be tested independently
Incremental Progress
The system can be stopped and restarted without losing work:
- State is persisted in the BEL
- Execution resumes from the last known state
- There is no need to restart from the beginning
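Resuming amounts to folding the persisted events back into current state. The following is a minimal sketch under simplifying assumptions: the `Event` variants and the `replay` and `remaining` functions are hypothetical, and events are assumed to arrive in the order they were appended.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    NotStarted,
    Running,
    Completed,
    Failed,
}

/// Hypothetical persisted events, each carrying a job run id.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    Queued(u32),
    Started(u32),
    Succeeded(u32),
    Failed(u32),
}

/// Rebuild the latest state of every job by applying events in log order.
pub fn replay(events: &[Event]) -> BTreeMap<u32, JobState> {
    let mut states = BTreeMap::new();
    for event in events {
        let (id, state) = match *event {
            Event::Queued(id) => (id, JobState::NotStarted),
            Event::Started(id) => (id, JobState::Running),
            Event::Succeeded(id) => (id, JobState::Completed),
            Event::Failed(id) => (id, JobState::Failed),
        };
        // Later events overwrite earlier ones, so the map ends at the last known state.
        states.insert(id, state);
    }
    states
}

/// Work still outstanding after a restart: every job that is not Completed.
pub fn remaining(states: &BTreeMap<u32, JobState>) -> Vec<u32> {
    states
        .iter()
        .filter(|(_, s)| **s != JobState::Completed)
        .map(|(id, _)| *id)
        .collect()
}
```

What to do with a job that was Running when the system stopped (re-attach, retry, or mark failed) is a policy decision this sketch leaves open.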
Correctness
- Type system prevents impossible states
- Event log provides ground truth
- Dependency graph ensures proper ordering
Real-World Applications
The same pattern underlies several familiar classes of systems:
Build Systems
- Bazel, Buck, Pants - artifacts depend on other artifacts
- Here the units being orchestrated are literally builds
Workflow Engines
- Temporal, Prefect, Airflow - DAG of tasks with state
- Each task is a state machine, orchestrator schedules based on dependencies
Data Orchestration
- Dagster, Kedro - data assets with lineage
- Partitions are data assets, jobs are transformations
Game Engines
- Entity Component Systems - entities have state
- Game loop orchestrates entity state transitions
Business Process Management
- BPMN engines - business processes as state machines
- Workflow engine coordinates process instances
When to Use This Pattern
This architecture is particularly powerful for systems where:
- Eventual consistency is acceptable (not strict ACID transactions)
- Incremental progress is important (can checkpoint and resume)
- Observability is critical (need to know what's happening)
- Correctness matters (type-safe transitions prevent bugs)
- Concurrency is inherent (multiple things happening simultaneously)
- Dependencies are complex (can't just process sequentially)
Implementation Lessons
Use Drain for State Transitions
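Draining the collection hands ownership of each entity to the loop, so consuming transitions can move it into its next state:
fn schedule_queued_jobs(&mut self) -> Result<()> {
    let mut new_jobs = Vec::new();
    // drain(..) yields owned JobRun values, which the consuming run(self) requires.
    // Caveat: if `?` returns early, the jobs already moved into new_jobs and the
    // rest of the drain are dropped, leaving self.job_runs empty.
    for job in self.job_runs.drain(..) {
        let transitioned = match job {
            JobRun::NotStarted(ns) => JobRun::Running(ns.run(None)?),
            other => other, // Pass through unchanged
        };
        new_jobs.push(transitioned);
    }
    self.job_runs = new_jobs;
    Ok(())
}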
Parameterize State for Type Safety
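use std::marker::PhantomData;

pub struct JobRunWithState<Backend, State> {
    job_run_id: Uuid,
    state: State, // Type parameter enforces valid operations
    // Marker so the snippet compiles when no other field uses Backend.
    _backend: PhantomData<Backend>,
}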
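To show how the state parameter gates operations end to end, here is a self-contained sketch. It is a simplification under stated assumptions: the `Backend` parameter is dropped, the id is a plain `u64`, `run` is infallible, and the state structs, `state_name`, and `start_all` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical per-state data; each state is its own type.
pub struct NotStarted;
pub struct Running {
    pub pid: u32,
}
pub struct Completed {
    pub exit_code: i32,
}

pub struct JobRunWithState<State> {
    pub job_run_id: u64,
    pub state: State,
}

impl JobRunWithState<NotStarted> {
    pub fn new(job_run_id: u64) -> Self {
        JobRunWithState { job_run_id, state: NotStarted }
    }

    /// Consumes the NotStarted job, so the old value cannot be reused.
    pub fn run(self, pid: u32) -> JobRunWithState<Running> {
        JobRunWithState { job_run_id: self.job_run_id, state: Running { pid } }
    }
}

impl JobRunWithState<Running> {
    /// Only a Running job can complete.
    pub fn complete(self, exit_code: i32) -> JobRunWithState<Completed> {
        JobRunWithState { job_run_id: self.job_run_id, state: Completed { exit_code } }
    }
}

/// Enum wrapper so jobs in different states can share one collection.
pub enum JobRun {
    NotStarted(JobRunWithState<NotStarted>),
    Running(JobRunWithState<Running>),
    Completed(JobRunWithState<Completed>),
}

impl JobRun {
    pub fn state_name(&self) -> &'static str {
        match self {
            JobRun::NotStarted(_) => "not_started",
            JobRun::Running(_) => "running",
            JobRun::Completed(_) => "completed",
        }
    }
}

/// Start every NotStarted job; all other jobs pass through unchanged.
/// Pids are made up (first_pid plus the job's index) purely for the example.
pub fn start_all(jobs: Vec<JobRun>, first_pid: u32) -> Vec<JobRun> {
    jobs.into_iter()
        .enumerate()
        .map(|(i, job)| match job {
            JobRun::NotStarted(ns) => JobRun::Running(ns.run(first_pid + i as u32)),
            other => other,
        })
        .collect()
}
```

In this sketch, calling `complete` on a `JobRunWithState<NotStarted>` or `run` on a `JobRunWithState<Running>` is rejected by the compiler because no such method exists for that type.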
Event Sourcing for Auditability
All state changes emit events to an append-only log:
self.bel.append_event(&JobRunSuccessEvent {
    job_run_id,
    timestamp,
})?;
Separate Entity Logic from Coordination
- Entity state machines: "What transitions are valid for me?"
- Orchestrator: "Given all entity states, what should happen next?"