diff --git a/.gitignore b/.gitignore
index 7a487fd..0af2eb6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -19,3 +19,4 @@ logs/databuild/
 
 # DSL generated code
 **/generated/
+!/databuild/databuild.rs
diff --git a/AGENTS.md b/AGENTS.md
index d8a2c96..5cb7ee8 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -16,6 +16,21 @@ DataBuild is a bazel-based data build system. Key files:
 
 Please reference these for any related work, as they indicate key technical bias/direction of the project.
 
+## Architecture Pattern
+
+DataBuild implements **Orchestrated State Machines** - a pattern where the application core is composed of:
+- **Type-safe state machines** for domain entities (Want, JobRun, Partition)
+- **Dependency graphs** expressing relationships between entities
+- **Orchestration logic** that coordinates state transitions based on dependencies
+
+This architecture provides compile-time correctness, observability through event sourcing, and clean separation between entity behavior and coordination logic. See [`docs/orchestrated-state-machines.md`](docs/orchestrated-state-machines.md) for the full theory and implementation patterns.
+
+**Key implications for development:**
+- Model entities as explicit state machines with type-parameterized states
+- Use consuming methods for state transitions (enforces immutability)
+- Emit events to BEL for all state changes (observability)
+- Centralize coordination logic in the Orchestrator (separation of concerns)
+
 ## Tenets
 
 - Declarative over imperative wherever possible/reasonable.
diff --git a/docs/design/orchestrated-state-machines.md b/docs/design/orchestrated-state-machines.md
new file mode 100644
index 0000000..f5f6a5b
--- /dev/null
+++ b/docs/design/orchestrated-state-machines.md
@@ -0,0 +1,173 @@
+# Orchestrated State Machines: A Theory of Application Architecture
+
+## Overview
+
+DataBuild's core architecture exemplifies a pattern we call **Dependency-Aware State Machine Orchestration** or **Stateful Dataflow Architecture**. This document crystallizes the theory behind this approach and its applications.
+
+## The Pattern
+
+At its essence, the pattern is:
+
+```
+Application = State Machines + Dependency Graph + Orchestration Logic
+```
+
+Where:
+- **State Machines**: Individual entities (Want, JobRun, Partition) with well-defined, type-safe states
+- **Dependency Graph**: Relationships between entities (wants depend on partitions, partitions depend on job runs)
+- **Orchestration Logic**: Coordination rules that trigger state transitions when dependencies are satisfied
+
+## Key Components
+
+### 1. State Machines
+
+Each domain entity is modeled as an explicit state machine with:
+- **Well-defined states** (NotStarted, Running, Completed, Failed, etc.)
+- **Type-safe transitions** enforced at compile time
+- **Immutable state progression** via consuming methods
+
+Example from DataBuild:
+```rust
+pub enum JobRun {
+    NotStarted(JobRunWithState<NotStarted>),
+    Running(JobRunWithState<Running>),
+    Completed(JobRunWithState<Completed>),
+    Failed(JobRunWithState<Failed>),
+    // ...
+}
+
+// Can ONLY call run() on NotStarted jobs - compiler enforces this!
+impl JobRunWithState<NotStarted> {
+    pub fn run(self, env) -> Result<JobRunWithState<Running>, Error>
+}
+```
+
+### 2. Dependency Graph
+
+Entities are connected through explicit dependencies:
+- Wants → Partitions (wants request specific partitions)
+- Partitions → JobRuns (jobs build partitions)
+- JobRuns → Partitions (jobs declare what they built)
+
+### 3. Orchestration Logic
+
+A central orchestrator:
+- Observes all entity states
+- Evaluates dependency conditions
+- Triggers state transitions when conditions are met
+- Maintains global consistency invariants
+
+## Core Principles
+
+1. **Model domain entities as explicit state machines** - Don't hide state in boolean flags
+2. **Express dependencies as a graph** - Make relationships first-class
+3. **Centralize coordination logic** - Separate entity behavior from system coordination
+4. **Make state transitions event-sourced** - Append-only log enables time-travel and auditability
+5. **Use types to enforce valid transitions** - Catch errors at compile time, not runtime
+
+## Advantages
+
+### Type Safety
+Compile-time guarantees prevent invalid state transitions:
+```rust
+// This will not compile:
+let job = JobRun::Running(running_job);
+job.run(); // ERROR: no method `run` found for `JobRun`
+```
+
+### Observability
+Event-sourced state transitions provide complete audit trail:
+- What's running? Query running jobs
+- What failed? Filter by failed state
+- When did it transition? Check BEL timestamps
+
+### Testability
+- State machines can be tested in isolation
+- Orchestration logic can be tested with mock state machines
+- Dependency resolution can be tested independently
+
+### Incremental Progress
+System can be stopped and restarted:
+- State is persisted in BEL
+- Resume from last known state
+- No need to restart from beginning
+
+### Correctness
+- Type system prevents impossible states
+- Event log provides ground truth
+- Dependency graph ensures proper ordering
+
+## Real-World Applications
+
+This pattern is the **fundamental architecture** of:
+
+**Build Systems**
+- Bazel, Buck, Pants - artifacts depend on other artifacts
+- Your "builds" are literally builds
+
+**Workflow Engines**
+- Temporal, Prefect, Airflow - DAG of tasks with state
+- Each task is a state machine, orchestrator schedules based on dependencies
+
+**Data Orchestration**
+- Dagster, Kedro - data assets with lineage
+- Partitions are data assets, jobs are transformations
+
+**Game Engines**
+- Entity Component Systems - entities have state
+- Game loop orchestrates entity state transitions
+
+**Business Process Management**
+- BPMN engines - business processes as state machines
+- Workflow engine coordinates process instances
+
+## When to Use This Pattern
+
+This architecture is particularly powerful for systems where:
+
+- **Eventual consistency** is acceptable (not strict ACID transactions)
+- **Incremental progress** is important (can checkpoint and resume)
+- **Observability** is critical (need to know what's happening)
+- **Correctness** matters (type-safe transitions prevent bugs)
+- **Concurrency** is inherent (multiple things happening simultaneously)
+- **Dependencies** are complex (can't just process sequentially)
+
+## Implementation Lessons
+
+### Use Drain for State Transitions
+Clean pattern for moving entities through states:
+```rust
+fn schedule_queued_jobs(&mut self) -> Result<()> {
+    let mut new_jobs = Vec::new();
+    for job in self.job_runs.drain(..) {
+        let transitioned = match job {
+            JobRun::NotStarted(ns) => JobRun::Running(ns.run(None)?),
+            other => other, // Pass through unchanged
+        };
+        new_jobs.push(transitioned);
+    }
+    self.job_runs = new_jobs;
+    Ok(())
+}
+```
+
+### Parameterize State for Type Safety
+```rust
+pub struct JobRunWithState<State> {
+    job_run_id: Uuid,
+    state: State, // Type parameter enforces valid operations
+}
+```
+
+### Event Sourcing for Auditability
+All state changes emit events to append-only log:
+```rust
+self.bel.append_event(&JobRunSuccessEvent {
+    job_run_id,
+    timestamp
+})?;
+```
+
+### Separate Entity Logic from Coordination
+- Entity state machines: "What transitions are valid for me?"
+- Orchestrator: "Given all entity states, what should happen next?"
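
The snippets in the new design doc are fragments, so here is a minimal self-contained sketch of how the three "Implementation Lessons" might fit together: state-parameterized entities, consuming transitions, a drain-based orchestration pass, and an append-only event log. It is an illustration under stated assumptions, not DataBuild's actual code. The `u64` id (instead of `Uuid`), the `Event` enum, the `complete` method, the `step` method, and the `Vec<Event>` standing in for the BEL are all hypothetical, and dependency checks are omitted for brevity.

```rust
// Hypothetical marker types, one per state.
#[derive(Debug)]
pub struct NotStarted;
#[derive(Debug)]
pub struct Running;
#[derive(Debug)]
pub struct Completed;

// The state type parameter decides which methods exist on a value.
#[derive(Debug)]
pub struct JobRunWithState<S> {
    pub job_run_id: u64, // assumption: the doc uses Uuid; u64 keeps this std-only
    pub state: S,
}

impl JobRunWithState<NotStarted> {
    pub fn new(job_run_id: u64) -> Self {
        JobRunWithState { job_run_id, state: NotStarted }
    }

    // Consuming `self` means the NotStarted value cannot be reused afterwards.
    pub fn run(self) -> JobRunWithState<Running> {
        JobRunWithState { job_run_id: self.job_run_id, state: Running }
    }
}

impl JobRunWithState<Running> {
    // Only a Running job can complete; hypothetical method name.
    pub fn complete(self) -> JobRunWithState<Completed> {
        JobRunWithState { job_run_id: self.job_run_id, state: Completed }
    }
}

// Enum wrapper so jobs in different states can share one collection.
#[derive(Debug)]
pub enum JobRun {
    NotStarted(JobRunWithState<NotStarted>),
    Running(JobRunWithState<Running>),
    Completed(JobRunWithState<Completed>),
}

// Hypothetical event type recorded for every transition.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    JobRunStarted { job_run_id: u64 },
    JobRunSucceeded { job_run_id: u64 },
}

#[derive(Default)]
pub struct Orchestrator {
    pub job_runs: Vec<JobRun>,
    pub bel: Vec<Event>, // in-memory stand-in for the append-only build event log
}

impl Orchestrator {
    // One coordination pass: drain the jobs, advance each by at most one
    // state, append an event per transition, and store the results back.
    pub fn step(&mut self) {
        let mut next = Vec::with_capacity(self.job_runs.len());
        for job in self.job_runs.drain(..) {
            let transitioned = match job {
                JobRun::NotStarted(ns) => {
                    self.bel.push(Event::JobRunStarted { job_run_id: ns.job_run_id });
                    JobRun::Running(ns.run())
                }
                JobRun::Running(r) => {
                    self.bel.push(Event::JobRunSucceeded { job_run_id: r.job_run_id });
                    JobRun::Completed(r.complete())
                }
                other => other, // terminal states pass through unchanged
            };
            next.push(transitioned);
        }
        self.job_runs = next;
    }
}
```

Calling `step` twice on an orchestrator holding one `NotStarted` job moves it to `Running` and then `Completed`, leaving two events in the log. A third call changes nothing, since terminal states are passed through. Draining and rebuilding the vector is what lets each transition take ownership of the job value, which is the point the "Use Drain" section makes.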