update docs describing architectural pattern (orchestrated state machines)
parent 55f51125c3
commit d8af3c8174
3 changed files with 189 additions and 0 deletions
1
.gitignore
vendored
@@ -19,3 +19,4 @@ logs/databuild/

# DSL generated code
**/generated/
!/databuild/databuild.rs

15
AGENTS.md
@@ -16,6 +16,21 @@ DataBuild is a bazel-based data build system. Key files:

Please reference these for any related work, as they indicate key technical bias/direction of the project.

## Architecture Pattern

DataBuild implements **Orchestrated State Machines** - a pattern where the application core is composed of:
- **Type-safe state machines** for domain entities (Want, JobRun, Partition)
- **Dependency graphs** expressing relationships between entities
- **Orchestration logic** that coordinates state transitions based on dependencies

This architecture provides compile-time checking of state transitions, observability through event sourcing, and clean separation between entity behavior and coordination logic. See [`docs/design/orchestrated-state-machines.md`](docs/design/orchestrated-state-machines.md) for the full theory and implementation patterns.

**Key implications for development:**
- Model entities as explicit state machines with type-parameterized states
- Use consuming methods for state transitions (the old state is moved and cannot be reused)
- Emit events to BEL for all state changes (observability)
- Centralize coordination logic in the Orchestrator (separation of concerns)

## Tenets

- Declarative over imperative wherever possible/reasonable.

173
docs/design/orchestrated-state-machines.md
Normal file
@@ -0,0 +1,173 @@
# Orchestrated State Machines: A Theory of Application Architecture

## Overview

DataBuild's core architecture exemplifies a pattern we call **Orchestrated State Machines**, which could also be described as **Dependency-Aware State Machine Orchestration** or **Stateful Dataflow Architecture**. This document lays out the theory behind this approach and where it applies.

## The Pattern

At its essence, the pattern is:

```
Application = State Machines + Dependency Graph + Orchestration Logic
```

Where:
- **State Machines**: Individual entities (Want, JobRun, Partition) with well-defined, type-safe states
- **Dependency Graph**: Relationships between entities (wants depend on partitions, partitions depend on job runs)
- **Orchestration Logic**: Coordination rules that trigger state transitions when dependencies are satisfied

## Key Components

### 1. State Machines

Each domain entity is modeled as an explicit state machine with:
- **Well-defined states** (NotStarted, Running, Completed, Failed, etc.)
- **Type-safe transitions** enforced at compile time
- **Immutable state progression** via consuming methods

Example from DataBuild (excerpt):
```rust
use std::collections::HashMap;

pub enum JobRun<B: JobRunBackend> {
    NotStarted(JobRunWithState<B, B::NotStartedState>),
    Running(JobRunWithState<B, B::RunningState>),
    Completed(JobRunWithState<B, B::CompletedState>),
    Failed(JobRunWithState<B, B::FailedState>),
    // ...
}

// Can ONLY call run() on NotStarted jobs - compiler enforces this!
impl<B: JobRunBackend> JobRunWithState<B, B::NotStartedState> {
    // The type of `env` is an assumption; the original excerpt leaves it out.
    pub fn run(self, env: Option<HashMap<String, String>>) -> Result<JobRunWithState<B, B::RunningState>, Error> {
        todo!() // backend-specific start logic elided
    }
}
```

### 2. Dependency Graph

Entities are connected through explicit dependencies:
- Wants → Partitions (wants request specific partitions)
- Partitions → JobRuns (jobs build partitions)
- JobRuns → Partitions (jobs declare what they built)

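The relationships above can be sketched as a small in-memory graph. This is only an illustration under stated assumptions: `DependencyGraph`, its fields, and the string-based identifiers are hypothetical and are not DataBuild's actual types.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identifiers; the real types are likely richer than plain strings.
type WantId = String;
type PartitionRef = String;
type JobRunId = String;

#[derive(Default)]
pub struct DependencyGraph {
    // Wants → Partitions: which partitions each want requests.
    wanted: HashMap<WantId, HashSet<PartitionRef>>,
    // Partitions → JobRuns: which job run is building each partition.
    built_by: HashMap<PartitionRef, JobRunId>,
}

impl DependencyGraph {
    pub fn add_want(&mut self, want: &str, partitions: &[&str]) {
        let entry = self.wanted.entry(want.to_string()).or_default();
        entry.extend(partitions.iter().map(|p| p.to_string()));
    }

    pub fn assign_job(&mut self, partition: &str, job: &str) {
        self.built_by.insert(partition.to_string(), job.to_string());
    }

    // Partitions requested by `want` that no job run is building yet (sorted).
    pub fn unassigned(&self, want: &str) -> Vec<PartitionRef> {
        let mut missing: Vec<PartitionRef> = self
            .wanted
            .get(want)
            .into_iter()
            .flatten()
            .filter(|p| !self.built_by.contains_key(*p))
            .cloned()
            .collect();
        missing.sort();
        missing
    }
}
```

Keeping the edges in explicit maps makes a question like "which requested partitions still need a job?" a plain lookup, rather than something inferred from scattered entity fields.
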
### 3. Orchestration Logic

A central orchestrator:
- Observes all entity states
- Evaluates dependency conditions
- Triggers state transitions when conditions are met
- Maintains global consistency invariants

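A single orchestration pass over those responsibilities might look like the sketch below. All names here (`Orchestrator`, `Job`, `JobState`, `step`, `complete`) are hypothetical simplifications; the real orchestrator works with typed state machines and the BEL rather than a plain enum and a set.

```rust
use std::collections::HashSet;

// Hypothetical, simplified job states; the real JobRun enum carries typed state data.
#[derive(Debug, Clone, PartialEq)]
pub enum JobState {
    NotStarted,
    Running,
    Completed,
}

pub struct Job {
    pub name: String,
    pub needs: Vec<String>, // partitions this job reads
    pub builds: String,     // partition this job produces
    pub state: JobState,
}

pub struct Orchestrator {
    pub jobs: Vec<Job>,
    pub live_partitions: HashSet<String>,
}

impl Orchestrator {
    // One pass: observe states, evaluate dependencies, trigger transitions.
    // Returns the names of the jobs started in this pass.
    pub fn step(&mut self) -> Vec<String> {
        let live = &self.live_partitions;
        let mut started = Vec::new();
        for job in self.jobs.iter_mut() {
            let ready = job.needs.iter().all(|p| live.contains(p));
            if job.state == JobState::NotStarted && ready {
                job.state = JobState::Running;
                started.push(job.name.clone());
            }
        }
        started
    }

    // Record a job's completion and publish its output partition.
    pub fn complete(&mut self, name: &str) {
        for job in self.jobs.iter_mut().filter(|j| j.name == name) {
            job.state = JobState::Completed;
            self.live_partitions.insert(job.builds.clone());
        }
    }
}
```

Note that the jobs never inspect each other. Only `step` looks across entities, which is the separation the list above describes.
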
## Core Principles

1. **Model domain entities as explicit state machines** - Don't hide state in boolean flags
2. **Express dependencies as a graph** - Make relationships first-class
3. **Centralize coordination logic** - Separate entity behavior from system coordination
4. **Make state transitions event-sourced** - Append-only log enables time-travel and auditability
5. **Use types to enforce valid transitions** - Catch errors at compile time, not runtime

## Advantages

### Type Safety
Compile-time guarantees prevent invalid state transitions:
```rust
// This will not compile: `run` is only defined for the NotStarted state.
let running_job = not_started_job.run(None)?; // JobRunWithState<B, B::RunningState>
running_job.run(None); // ERROR: no method named `run` found for the Running state
```

### Observability
Event-sourced state transitions provide a complete audit trail:
- What's running? Query running jobs
- What failed? Filter by failed state
- When did it transition? Check BEL timestamps

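As a rough sketch, the three questions above reduce to simple scans over an append-ordered log. The `JobEvent` shape and the helper functions are assumptions made for illustration; the real BEL schema and query API are not shown in this document.

```rust
// Hypothetical event shape; the real BEL schema is not shown here.
#[derive(Debug, Clone, PartialEq)]
pub enum JobEventKind {
    Started,
    Succeeded,
    Failed,
}

#[derive(Debug, Clone)]
pub struct JobEvent {
    pub job_run_id: u64,
    pub kind: JobEventKind,
    pub timestamp_ms: u64,
}

// Most recent event for a job, assuming the log is in append order.
fn latest(log: &[JobEvent], job_run_id: u64) -> Option<&JobEvent> {
    log.iter().rev().find(|e| e.job_run_id == job_run_id)
}

// "What's running?": jobs whose most recent event is Started.
pub fn running_jobs(log: &[JobEvent]) -> Vec<u64> {
    let mut ids: Vec<u64> = log.iter().map(|e| e.job_run_id).collect();
    ids.sort();
    ids.dedup();
    ids.into_iter()
        .filter(|id| latest(log, *id).map_or(false, |e| e.kind == JobEventKind::Started))
        .collect()
}

// "What failed?": every job with a Failed event.
pub fn failed_jobs(log: &[JobEvent]) -> Vec<u64> {
    log.iter()
        .filter(|e| e.kind == JobEventKind::Failed)
        .map(|e| e.job_run_id)
        .collect()
}

// "When did it transition?": timestamp of the first event of the given kind.
pub fn transitioned_at(log: &[JobEvent], job_run_id: u64, kind: JobEventKind) -> Option<u64> {
    log.iter()
        .find(|e| e.job_run_id == job_run_id && e.kind == kind)
        .map(|e| e.timestamp_ms)
}
```
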
### Testability
- State machines can be tested in isolation
- Orchestration logic can be tested with mock state machines
- Dependency resolution can be tested independently

### Incremental Progress
The system can be stopped and restarted:
- State is persisted in BEL
- Resume from last known state
- No need to restart from beginning

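Resuming from the log can be sketched as a fold over events. The `Event` and `Status` enums and the `replay` function below are hypothetical stand-ins, assuming each event names the job it applies to. They are not the BEL's actual types.

```rust
use std::collections::HashMap;

// Hypothetical events and statuses, simplified for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    JobQueued(u64),
    JobStarted(u64),
    JobSucceeded(u64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    NotStarted,
    Running,
    Completed,
}

// Fold the append-only log into current state; later events overwrite earlier ones.
pub fn replay(log: &[Event]) -> HashMap<u64, Status> {
    let mut state = HashMap::new();
    for event in log {
        match event {
            Event::JobQueued(id) => state.insert(*id, Status::NotStarted),
            Event::JobStarted(id) => state.insert(*id, Status::Running),
            Event::JobSucceeded(id) => state.insert(*id, Status::Completed),
        };
    }
    state
}
```

Because the state is derived entirely from the log, replaying a prefix of the log reproduces the state at that earlier point. The same property gives both restart and "time-travel" inspection.
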
### Correctness
- Type system prevents impossible states
- Event log provides ground truth
- Dependency graph ensures proper ordering

## Real-World Applications

This pattern appears, in varying forms, at the core of:

**Build Systems**
- Bazel, Buck, Pants - artifacts depend on other artifacts
- Your "builds" are literally builds

**Workflow Engines**
- Temporal, Prefect, Airflow - tasks with state and dependencies between them
- Each task is a state machine; the orchestrator schedules based on dependencies

**Data Orchestration**
- Dagster, Kedro - data assets with lineage
- Partitions are data assets; jobs are transformations

**Game Engines**
- Entity Component Systems - entities have state
- Game loop orchestrates entity state transitions

**Business Process Management**
- BPMN engines - business processes as state machines
- Workflow engine coordinates process instances

## When to Use This Pattern

This architecture is particularly powerful for systems where:

- **Eventual consistency** is acceptable (not strict ACID transactions)
- **Incremental progress** is important (can checkpoint and resume)
- **Observability** is critical (need to know what's happening)
- **Correctness** matters (type-safe transitions prevent bugs)
- **Concurrency** is inherent (multiple things happening simultaneously)
- **Dependencies** are complex (can't just process sequentially)

## Implementation Lessons

### Use Drain for State Transitions
`Vec::drain` gives a clean pattern for moving owned entities through states:
```rust
fn schedule_queued_jobs(&mut self) -> Result<()> {
    let mut new_jobs = Vec::new();
    // Caveat: if `run` fails, `?` returns early and every job drained so far,
    // plus any still in the drain, is dropped, leaving `self.job_runs` empty.
    for job in self.job_runs.drain(..) {
        let transitioned = match job {
            JobRun::NotStarted(ns) => JobRun::Running(ns.run(None)?),
            other => other, // Pass through unchanged
        };
        new_jobs.push(transitioned);
    }
    self.job_runs = new_jobs;
    Ok(())
}
```

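### Parameterize State for Type Safety
Carrying the state as a type parameter means each operation can be implemented only for the states where it is valid:
```rust
pub struct JobRunWithState<Backend, State> {
    job_run_id: Uuid,
    state: State, // Type parameter enforces valid operations
    // Note: as shown, `Backend` is unused; a compiling definition needs a field
    // that uses it (or a `PhantomData<Backend>` marker).
}
```
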
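The same idea in a minimal, self-contained form is sketched below. The `Job`, `NotStarted`, and `Running` types are hypothetical and drop the backend parameter for brevity, so this shows the technique rather than DataBuild's actual definitions.

```rust
// Hypothetical state types; each state can carry its own data.
pub struct NotStarted;

pub struct Running {
    pub pid: u32,
}

pub struct Job<S> {
    pub id: u64,
    pub state: S,
}

impl Job<NotStarted> {
    pub fn new(id: u64) -> Self {
        Job { id, state: NotStarted }
    }

    // Consumes the NotStarted job; the old value cannot be used afterwards.
    pub fn run(self, pid: u32) -> Job<Running> {
        Job { id: self.id, state: Running { pid } }
    }
}

impl Job<Running> {
    // Only a running job has a pid, so this method exists only for that state.
    pub fn pid(&self) -> u32 {
        self.state.pid
    }
}
```

Calling `run` on a `Job<Running>`, or `pid` on a `Job<NotStarted>`, is rejected by the compiler because no such method is defined for that state.
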
### Event Sourcing for Auditability
All state changes emit events to an append-only log:
```rust
self.bel.append_event(&JobRunSuccessEvent {
    job_run_id,
    timestamp
})?;
```

### Separate Entity Logic from Coordination
- Entity state machines: "What transitions are valid for me?"
- Orchestrator: "Given all entity states, what should happen next?"