update skill for build state semantics

Stuart Axelbrooke 2025-11-25 11:15:04 +08:00
parent 5ac51934ea
commit 0d1cac6406


@@ -4,4 +4,577 @@ description: How the core semantics of databuild's build state works; what its c
version: 1.0.0
---
# Build State Semantics
To achieve its goal of declarative, partitioned data builds (via explicitly stated data dependencies between jobs), databuild employs a "build state" concept that, together with the orchestrator, provides all of the data catalog and job run scheduling logic needed to produce data based on user wants.
## Core Mental Model
DataBuild's BuildState implements **event sourcing** combined with an **Entity Component System (ECS)** pattern:
- **Immutable event log**: All state changes recorded as events (WantCreateEventV1, JobRunBufferEventV1, etc.)
- **Derived mutable state**: BuildState reconstructed by replaying events through state machines
- **ECS pattern**: Entities stored in flat collections, relationships via inverted indexes (not nested objects)
- **Type-state machines**: Compile-time enforcement of valid state transitions
**Public interface**:
- Query: `get_want()`, `list_partitions()`, `get_partition()`, etc.
- Mutation: `handle_event()` processes events and transitions states
- No direct state manipulation outside event handling
**Separation of concerns**:
- **BuildState**: Maintains entity state, processes events, provides queries
- **Orchestrator**: Polls BuildState, makes scheduling decisions, emits job events
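A minimal sketch of this shape, using hypothetical, pared-down event and entity types (the real `WantCreateEventV1`, `JobRunBufferEventV1`, and `BuildState` definitions live in the codebase): every mutation flows through `handle_event()`, and a fresh state is rebuilt by replaying the log from the start.
```rust
use std::collections::BTreeMap;

// Hypothetical, pared-down shapes for illustration only; the real event and
// entity types (WantCreateEventV1, JobRunBufferEventV1, ...) live in databuild.
enum Event {
    WantCreate { want_id: String, partition_refs: Vec<String> },
    JobRunBuffer { job_run_id: String, partition_refs: Vec<String> },
}

#[derive(Default)]
struct BuildState {
    // Derived, mutable state (only one index shown; see the full struct below).
    wants_for_partition: BTreeMap<String, Vec<String>>,
}

impl BuildState {
    /// Single mutation entry point: apply one event, transitioning entity state machines.
    fn handle_event(&mut self, event: &Event) {
        match event {
            Event::WantCreate { want_id, partition_refs } => {
                for partition_ref in partition_refs {
                    self.wants_for_partition
                        .entry(partition_ref.clone())
                        .or_default()
                        .push(want_id.clone());
                }
            }
            Event::JobRunBuffer { .. } => { /* create Building partitions, update indexes */ }
        }
    }

    /// Derived state: rebuild from the immutable log by replaying events in order.
    fn replay(events: &[Event]) -> BuildState {
        let mut state = BuildState::default();
        for event in events {
            state.handle_event(event);
        }
        state
    }
}
```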
## Compile-Time Correctness Strategy
The primary defense against bugs is making invalid states **unrepresentable** at compile time.
**Type-state pattern**: States encoded in type system, transitions consume self
```rust
// Can only call .complete() on BuildingState
impl PartitionWithState<BuildingState> {
pub fn complete(self, job_run_id: String, timestamp: u64) -> PartitionWithState<LiveState> {
PartitionWithState {
partition_ref: self.partition_ref,
state: LiveState { built_at: timestamp, built_by: job_run_id },
}
}
}
// Cannot call .complete() on LiveState - method doesn't exist
impl PartitionWithState<LiveState> {
pub fn taint(self, taint_id: String, timestamp: u64) -> PartitionWithState<TaintedState> { ... }
}
```
**Benefits**:
- Invalid transitions caught at compile time: `live_partition.complete()` → compile error
- Refactoring safety: compiler guides you through state machine changes
- Self-documenting: `fn schedule(want: WantWithState<IdleState>)` encodes precondition
- Fast feedback loop: seconds (compile error) vs minutes (runtime panic) vs days (production bug)
**Runtime panics reserved for invariant violations** (bugs in BuildState implementation):
- Missing references: `partitions_by_uuid[uuid]` doesn't exist → panic with context
- Index inconsistencies: `canonical_partitions[ref]` points to invalid UUID → panic
- These should never happen in correct implementation
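A sketch of what "panic with context" looks like in practice, using a hypothetical lookup helper over the indexes described later in this document:
```rust
use std::collections::BTreeMap;
use uuid::Uuid;

struct Partition; // stand-in for the real partition type

/// Hypothetical helper: resolve a canonical ref to its partition. A missing ref is a
/// normal `None`; a dangling UUID is an invariant violation (a BuildState bug) → panic.
fn canonical_partition<'a>(
    partition_ref: &str,
    canonical_partitions: &BTreeMap<String, Uuid>,
    partitions_by_uuid: &'a BTreeMap<Uuid, Partition>,
) -> Option<&'a Partition> {
    let uuid = canonical_partitions.get(partition_ref)?;
    Some(partitions_by_uuid.get(uuid).unwrap_or_else(|| {
        panic!(
            "Partition UUID {uuid} referenced by canonical_partitions[{partition_ref:?}] \
             but not present in partitions_by_uuid"
        )
    }))
}
```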
## Architectural Layering
Three entity types with pragmatic data flow:
```
Wants (user requests for data)
↓ references partition refs (Vec<PartitionRef>)
Partitions (data artifacts being built)
↓ building_by/built_by job_run_ids (tracking)
↑ wants_for_partition inverted index
JobRuns (execution processes)
```
**Direct references**:
- Wants → Partitions: wants store `partitions: Vec<PartitionRef>`
- JobRuns → Partitions: jobs store `building_partition_uuids: Vec<Uuid>`
- Partitions → JobRuns: partitions store `building_by: Vec<String>` (job_run_ids)
**Inverted index**:
- Partitions → Wants: `wants_for_partition: BTreeMap<String, Vec<String>>`
- Maps partition_ref → want_ids waiting for it
- Why not direct? Partitions keyed by UUID, but wants use partition_ref for mapping
- Efficient lookup: "which wants are waiting for partition ref X?"
**Intentional separation**:
- JobRuns don't know about Wants (jobs build partitions, agnostic to requesters)
- Wants don't know about JobRuns (users care about data availability, not execution)
## Entity State Machines
### Want States
```
New → {Idle, Building, UpstreamBuilding, Successful, Failed, UpstreamFailed, Canceled}
```
**State semantics**: "What is the current status of my requested partitions?"
- **New**: Just created, state not yet determined (ephemeral, transitions immediately)
- **Idle**: Partitions don't exist or are ready to retry (UpForRetry) → schedulable
- **Building**: Canonical partitions currently being built by jobs
- **UpstreamBuilding**: Canonical partitions waiting for upstream dependencies
- **Successful**: All canonical partitions are Live
- **Failed**: Canonical partition hard failure (shouldn't retry)
- **UpstreamFailed**: Canonical partition's upstream failed (can't succeed)
- **Canceled**: Explicitly canceled by user/system
**Key insight**: Want state reflects canonical partition state, not bound to specific partition UUIDs.
Example:
```rust
// Want created for "data/beta"
want.partitions = ["data/beta"]
// Determine state by checking canonical partition
if let Some(uuid) = canonical_partitions.get("data/beta") {
let partition = partitions_by_uuid[uuid];
match partition.state {
Building => want.state = Building,
Live => want.state = Successful,
Failed => want.state = Failed,
// ...
}
} else {
want.state = Idle // No canonical partition exists
}
```
### Partition States
```
Building → {UpstreamBuilding, UpForRetry, Live, Failed, UpstreamFailed, Tainted}
```
**State semantics**: "What is the current build status? Is this partition leasable?"
- **Building**: Job actively building, lease held (prevent concurrent builds)
- **UpstreamBuilding**: Dep miss occurred, waiting for upstreams, lease held
- **UpForRetry**: Upstreams satisfied, ready to retry, lease released
- **Live**: Successfully built (terminal)
- **Failed**: Hard failure, shouldn't retry (terminal, lease released)
- **UpstreamFailed**: Upstream deps failed, can't succeed (terminal, lease released)
- **Tainted**: Marked invalid by taint event (terminal)
**No Missing state**: A partition only exists once a job has started building it (or has finished); a ref with no build attempt simply has no canonical partition.
**State as lease mechanism**:
- Building/UpstreamBuilding: Lease held → orchestrator will NOT schedule new jobs
- UpForRetry/Failed/UpstreamFailed: Lease released → safe to schedule (though Failed/UpstreamFailed block wants)
- Live/Tainted: Not lease states
Example lease behavior:
```
Partition uuid-1 ("data/beta"): Building
Want W1 arrives for "data/beta" → New → Building (sees canonical is Building)
Want W2 arrives for "data/beta" → New → Building (sees canonical is Building)
Orchestrator polls: both wants Building, canonical partition Building → NOT schedulable (lease held)
```
### JobRun States
```
Queued → Running → {Successful, Failed, DepMissed}
```
- **Queued**: Job buffered, not yet started
- **Running**: Process executing
- **Successful**: Completed successfully, partitions built
- **Failed**: Process failed
- **DepMissed**: Job discovered missing dependencies, created derivative wants
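A type-state sketch of these transitions, in the same style as the partition example above (state names here are illustrative; the real types may differ):
```rust
struct QueuedState;
struct RunningState { started_at: u64 }
struct SuccessfulState { finished_at: u64 }
struct DepMissedState { missing_deps: Vec<String> }
// FailedState omitted for brevity.

struct JobRunWithState<S> {
    job_run_id: String,
    state: S,
}

impl JobRunWithState<QueuedState> {
    fn start(self, timestamp: u64) -> JobRunWithState<RunningState> {
        JobRunWithState { job_run_id: self.job_run_id, state: RunningState { started_at: timestamp } }
    }
}

impl JobRunWithState<RunningState> {
    fn succeed(self, timestamp: u64) -> JobRunWithState<SuccessfulState> {
        JobRunWithState { job_run_id: self.job_run_id, state: SuccessfulState { finished_at: timestamp } }
    }
    fn dep_miss(self, missing_deps: Vec<String>) -> JobRunWithState<DepMissedState> {
        JobRunWithState { job_run_id: self.job_run_id, state: DepMissedState { missing_deps } }
    }
}
// No succeed() on Queued and no start() on terminal states: invalid transitions
// are compile errors, exactly as with partitions and wants.
```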
## Temporal Identity & References
**Problem**: How do we distinguish "the partition being built now" from "the partition built yesterday"?
**Solution**: Partition UUIDs for temporal identity, separate from user-facing refs.
### Partition UUIDs (Immutable Identity)
Each partition build attempt gets unique UUID:
```rust
use sha2::{Digest, Sha256};
use uuid::Uuid;

fn derive_partition_uuid(job_run_id: &str, partition_ref: &str) -> Uuid {
    let mut hasher = Sha256::new();
    hasher.update(job_run_id.as_bytes());
    hasher.update(partition_ref.as_bytes());
    let hash = hasher.finalize();
    // The first 16 bytes of the SHA-256 digest form the UUID
    Uuid::from_slice(&hash[0..16]).unwrap()
}
```
**Properties**:
- Deterministic: Same job + ref → same UUID (enables event replay)
- Immutable: Partition(uuid-1) represents specific historical build
- Jobs reference UUIDs: "Job J built Partition uuid-1 at time T"
### Partition Refs (Canonical Names)
User-facing identifier like `"data/category=tech/date=2024-01-15"`:
- Wants reference refs: "I want data/beta to be Live"
- Canonical partitions: `canonical_partitions["data/beta"] → uuid-3`
- One canonical UUID per ref at any time
### Dual Indexing
```rust
// All partition instances (historical + current)
partitions_by_uuid: BTreeMap<Uuid, Partition>
// Current/canonical partition for each ref
canonical_partitions: BTreeMap<String, Uuid>
```
**Lifecycle example**:
```
1. Job J1 starts → uuid-1 generated for "data/beta"
2. Partition(uuid-1, "data/beta", Building) created
3. canonical_partitions["data/beta"] = uuid-1
4. Job completes → Partition(uuid-1, Live)
5. Partition tainted → Partition(uuid-1, Tainted), still canonical
6. New job J2 starts → uuid-2 generated
7. Partition(uuid-2, "data/beta", Building) created
8. canonical_partitions["data/beta"] = uuid-2 (updated)
9. Partition(uuid-1) remains in partitions_by_uuid for history
```
**Query semantics**:
- "What's the current state of data/beta?" → lookup canonical_partitions["data/beta"], then partitions_by_uuid[uuid]
- "What partition did job J build?" → job.building_partition_uuids → partitions_by_uuid[uuid]
- "What was the state at time T?" → replay events up to T, query canonical_partitions
## BuildState Data Structure (ECS Pattern)
Flat collections, not nested objects:
```rust
pub struct BuildState {
// Entity collections
wants: BTreeMap<String, Want>,
partitions_by_uuid: BTreeMap<Uuid, Partition>,
canonical_partitions: BTreeMap<String, Uuid>,
job_runs: BTreeMap<String, JobRun>,
// Inverted indexes
wants_for_partition: BTreeMap<String, Vec<String>>, // partition_ref → want_ids
downstream_waiting: BTreeMap<String, Vec<Uuid>>, // partition_ref → waiting_partition_uuids
}
```
**Why ECS over OOP**:
- Avoids deep object hierarchies (`Want { partitions: Vec<Partition { job_runs: Vec<JobRun> }>}`)
- Flexible querying without coupling
- Inverted indexes provide O(1) reverse lookups
- State rebuilds from events without complex object reconstruction
- Access patterns drive data structure (not inheritance)
**Inverted index example**:
```rust
// Traditional OOP (tight coupling)
partition.wants.iter().for_each(|want| transition_want(want));
// ECS with inverted index (decoupled)
if let Some(want_ids) = wants_for_partition.get(&partition_ref) {
for want_id in want_ids {
let want = wants.get_mut(want_id).unwrap();
transition_want(want);
}
}
```
## Inverted Indexes
### wants_for_partition
```rust
BTreeMap<String, Vec<String>> // partition_ref → want_ids
```
**Purpose**: Find all wants waiting for a partition ref
**Maintenance**:
- Updated on want creation: add want_id to each partition_ref in want
- NOT cleaned up on want completion (acceptable, bounded growth)
- Replaces `partition.wants: Vec<String>` that would exist in OOP
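A sketch of that maintenance step during want creation (names follow this document; the real handler may differ):
```rust
use std::collections::BTreeMap;

// Inside handle_event() for WantCreateEventV1: register the new want under every
// partition ref it requests. Entries are never removed on completion (bounded growth).
fn index_want(
    wants_for_partition: &mut BTreeMap<String, Vec<String>>,
    want_id: &str,
    partition_refs: &[String],
) {
    for partition_ref in partition_refs {
        wants_for_partition
            .entry(partition_ref.clone())
            .or_default()
            .push(want_id.to_string());
    }
}
```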
**Usage**:
```rust
// When partition transitions Building → Live
let partition_ref = &partition.partition_ref.r#ref;
if let Some(want_ids) = wants_for_partition.get(partition_ref) {
for want_id in want_ids {
// Check if all partitions for this want are Live
// If yes, transition want Idle/Building → Successful
}
}
```
### downstream_waiting
```rust
BTreeMap<String, Vec<Uuid>> // partition_ref → waiting_partition_uuids
```
**Purpose**: O(1) lookup of partitions waiting for an upstream when it completes/fails
**Maintenance**:
- Updated when partition transitions Building → UpstreamBuilding
- For each missing upstream ref, add partition UUID to `downstream_waiting[upstream_ref]`
- Cleaned up when partition transitions UpstreamBuilding → UpForRetry/UpstreamFailed
- Remove partition UUID from all `downstream_waiting` entries
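Both maintenance steps as a sketch (assumed helper names; the registration side also appears in the dep-miss flow below):
```rust
use std::collections::BTreeMap;
use uuid::Uuid;

// Building → UpstreamBuilding: record this partition under each missing upstream ref.
fn register_waiting(
    downstream_waiting: &mut BTreeMap<String, Vec<Uuid>>,
    partition_uuid: Uuid,
    missing_upstream_refs: &[String],
) {
    for upstream_ref in missing_upstream_refs {
        downstream_waiting
            .entry(upstream_ref.clone())
            .or_default()
            .push(partition_uuid);
    }
}

// UpstreamBuilding → UpForRetry / UpstreamFailed: drop this partition from every entry.
fn clear_waiting(downstream_waiting: &mut BTreeMap<String, Vec<Uuid>>, partition_uuid: Uuid) {
    for waiting in downstream_waiting.values_mut() {
        waiting.retain(|uuid| *uuid != partition_uuid);
    }
}
```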
**Usage**:
```rust
// When upstream partition "data/alpha" becomes Live
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
for uuid in waiting_uuids {
let partition = partitions_by_uuid.get_mut(uuid).unwrap();
// Check if ALL this partition's MissingDeps are now satisfied
if all_deps_satisfied(partition) {
partition = partition.transition_to_up_for_retry();
}
}
}
```
**Why needed**: Avoids scanning all UpstreamBuilding partitions when upstreams complete.
## BuildState Responsibilities
What BuildState does:
- Maintain entity state machines (process events, transition states)
- Provide query interfaces (`get_want`, `list_partitions`, etc.)
- Maintain inverted indexes for efficient lookups
- Enforce invariants (panic on reference errors with context)
- Rebuild state from event log (replay)
What BuildState does NOT do:
- Make scheduling decisions (that's Orchestrator)
- Execute jobs (that's external processes)
- Generate UUIDs (done deterministically during event handling from job_run_id)
**Key insight**: BuildState is a pure state container. All coordination logic lives in Orchestrator.
## Want State Determination (Sensing)
When a want is created, it observes canonical partition states and transitions accordingly.
**Priority order** (first match wins):
1. If ANY canonical partition is Failed → New → Failed
2. If ANY canonical partition is UpstreamFailed → New → UpstreamFailed
3. If ALL canonical partitions exist AND are Live → New → Successful
4. If ANY canonical partition is Building → New → Building
5. If ANY canonical partition is UpstreamBuilding → New → UpstreamBuilding
6. If ANY canonical partition is UpForRetry → New → Idle (deps satisfied, ready to schedule)
7. Otherwise (partitions don't exist) → New → Idle
**Example**:
```rust
// Want W1 created for ["data/alpha", "data/beta"]
// canonical_partitions["data/alpha"] = uuid-1 (Building)
// canonical_partitions["data/beta"] = uuid-2 (Live)
// Result: W1 goes New → Building (rule 4: ANY partition Building)
// Want W2 created for ["data/gamma"]
// canonical_partitions["data/gamma"] doesn't exist
// Result: W2 goes New → Idle (rule 7: partition doesn't exist)
```
**Key insight**: Most wants go New → Idle because canonical partitions only exist when jobs are running or completed. This is correct: "nothing is building yet, ready to schedule."
## Schedulability vs Want State
**Want State**: Reflects current reality of canonical partitions
**Schedulability**: Orchestrator's decision logic for queuing jobs
**Not the same thing**:
```
Want W1: Idle → orchestrator schedules job → canonical partition becomes Building
Want W1: Idle → Building (event handling transitions it)
Want W2 arrives → sees canonical partition Building → New → Building
Orchestrator polls: both W1 and W2 are Building
Should orchestrator schedule another job? NO (lease held)
```
**Schedulability check**: A want is schedulable if canonical partition is:
- Doesn't exist (no lease), OR
- Tainted (invalid, needs rebuild), OR
- UpForRetry (lease released, deps satisfied)
**Not schedulable** if canonical partition is:
- Building (lease held, job running)
- UpstreamBuilding (lease held, waiting for deps)
**Implementation**:
```rust
fn is_schedulable(
    want: &Want,
    canonical_partitions: &BTreeMap<String, Uuid>,
    partitions_by_uuid: &BTreeMap<Uuid, Partition>,
) -> bool {
    for partition_ref in &want.partitions {
        if let Some(uuid) = canonical_partitions.get(partition_ref) {
            let partition = &partitions_by_uuid[uuid];
            match partition.state {
                Building | UpstreamBuilding => return false, // Lease held
                Tainted | UpForRetry => continue,            // Lease released → schedulable
                _ => continue,
            }
        }
        // Partition doesn't exist → schedulable
    }
    true
}
```
## Dependency Miss & Resolution Flow
The "dep miss" is the key mechanism for achieving multi-hop and complex data builds (traditionally solved via DAGs). When a job run fails due to missing upstream data, it generates a list of `MissingDeps`, which map the specific individual missing deps to the output partitions that needed them. This information enables databuild to create derivative wants, that will result in it scheduling jobs to build those partitions.
Complete flow when job encounters missing dependencies:
### 1. Job Reports Dep Miss
```
Job J1 building partition uuid-1 ("data/beta")
Discovers missing upstream: "data/alpha" not Live
Emits JobRunDepMissEventV1 {
missing_deps: [
MissingDeps {
missing: [ PartitionRef { ref: "data/alpha" } ],
impacted: PartitionRef { ref: "data/beta" }
}, ...
], ...
}
```
### 2. Partition Transitions to UpstreamBuilding
```rust
// handle_job_run_dep_miss_event()
partition = partition.transition_building_to_upstream_building(missing_deps);
partition.state.missing_deps = ["data/alpha"];
// Update inverted index
for upstream_ref in missing_deps {
downstream_waiting.entry(upstream_ref).or_default().push(uuid-1);
}
// downstream_waiting["data/alpha"] = [uuid-1]
// Partition remains canonical (lease still held)
// Job run transitions to DepMissed state
```
### 3. Want Transitions
```rust
// All wants waiting for "data/beta" transition Building → UpstreamBuilding
for want_id in wants_for_partition["data/beta"] {
want = want.transition_building_to_upstream_building(derivative_want_ids);
}
```
### 4. Derivative Wants Created
```rust
// System creates derivative want for missing upstream
derivative_want = Want::new(["data/alpha"]);
// This want goes New → Idle (alpha doesn't exist) → schedulable
```
### 5. Upstream Builds Complete or Fail
**Success case**:
```rust
// Derivative want builds "data/alpha" → partition becomes Live
// Look up downstream partitions waiting for "data/alpha"
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
for uuid in waiting_uuids {
let partition = partitions_by_uuid.get_mut(uuid).unwrap();
// Check if ALL missing deps now satisfied
let all_satisfied = partition.state.missing_deps.iter().all(|dep_ref| {
canonical_partitions.get(dep_ref)
.and_then(|uuid| partitions_by_uuid.get(uuid))
.map(|p| p.is_live())
.unwrap_or(false)
});
if all_satisfied {
partition = partition.transition_to_up_for_retry();
// Transition wants: UpstreamBuilding → Idle
}
}
}
```
**Failure case**:
```rust
// Upstream partition "data/alpha" transitions to Failed
if let Some(waiting_uuids) = downstream_waiting.get("data/alpha") {
for uuid in waiting_uuids {
let partition = partitions_by_uuid.get_mut(uuid).unwrap();
if matches!(partition, Partition::UpstreamBuilding(_)) {
partition = partition.transition_to_upstream_failed();
// Transition wants: UpstreamBuilding → UpstreamFailed
}
}
}
```
### 6. Want Becomes Schedulable
```rust
// Partition uuid-1 now in UpForRetry state
// Wants transition UpstreamBuilding → Idle
// Orchestrator polls, sees Idle wants with UpForRetry canonical partition → schedulable
// New job J2 queued → fresh uuid-2 generated for "data/beta"
// Partition uuid-2 created in Building state, replaces uuid-1 in canonical_partitions
// Partition uuid-1 remains in partitions_by_uuid (historical record)
```
**Key properties**:
- `downstream_waiting` enables O(1) lookup (no scanning all partitions)
- Failure propagates down dependency chain automatically
- Lease mechanism prevents concurrent retry attempts
- Historical partition instances preserved for lineage
## Orchestrator Responsibilities
The Orchestrator coordinates execution but maintains no state:
**Core loop** (sketched in code after this list):
1. Poll BuildState for schedulable wants: `build_state.list_wants()` filtered by schedulability
2. Make scheduling decisions (respect leases, check resources, etc.)
3. Derive partition UUIDs for job: `derive_partition_uuid(job_run_id, partition_ref)`
4. Emit JobRunBufferEventV1 with job_run_id and partition_refs
5. BuildState processes event → creates partitions in Building state → updates canonical pointers → transitions wants
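A minimal sketch of one iteration of that loop (`new_job_run_id()` and the event shape are hypothetical; `is_schedulable` and `derive_partition_uuid` are sketched earlier in this document):
```rust
// One orchestrator tick: read BuildState, decide, emit events. No state lives here.
fn orchestrate_once(build_state: &mut BuildState) {
    // 1–2. Poll for wants whose canonical partitions hold no lease.
    let schedulable: Vec<Want> = build_state
        .list_wants()
        .into_iter()
        .filter(|w| build_state.is_schedulable(w))
        .collect();

    for want in schedulable {
        // 3. Deterministically derive the partition UUIDs this job will build.
        let job_run_id = new_job_run_id(); // hypothetical ID generator
        let _uuids: Vec<Uuid> = want
            .partitions
            .iter()
            .map(|partition_ref| derive_partition_uuid(&job_run_id, partition_ref))
            .collect();

        // 4–5. Emit the buffer event; BuildState creates Building partitions,
        // updates canonical pointers, and transitions the affected wants.
        build_state.handle_event(&Event::JobRunBuffer {
            job_run_id,
            partition_refs: want.partitions.clone(),
        });
    }
}
```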
**Does NOT**:
- Maintain its own state (always queries BuildState)
- Know about partition UUIDs before emitting event (derives deterministically)
- Track want-partition relationships (uses inverted index)
**Separation rationale**:
- BuildState: source of truth for state
- Orchestrator: coordination logic
- Clear boundary enables testing, reasoning, replay
## Design Principles & Invariants
### 1. Compile-Time Correctness First
Invalid states should be unrepresentable. Type-state pattern enforces valid transitions at compile time.
Example: Cannot call `complete()` on a partition that isn't Building.
### 2. Runtime Panics for Invariant Violations
Reference errors and index inconsistencies represent BuildState bugs, not invalid input. Panic with context.
Example: `partitions_by_uuid[uuid]` missing → panic with "Partition UUID {uuid} referenced by canonical_partitions but not in partitions_by_uuid"
### 3. ECS Over OOP
Flat collections with inverted indexes beat nested object hierarchies for flexibility and query performance.
### 4. Data Structure Follows Access Patterns
Use inverted indexes where efficient reverse lookup is needed (`wants_for_partition`, `downstream_waiting`).
### 5. Events Represent Reality
Events encode real things: job processes started, dependency misses occurred, user requests received. Not speculative.
### 6. No Backwards Compatibility Hacks
Clean breaks preferred over technical debt. Code should be honest about state.
### 7. Fail Fast with Context
Better to panic immediately with rich context than silently corrupt state or fail later mysteriously.
### 8. Type-State for Self-Documentation
Function signatures encode preconditions: `fn schedule(want: WantWithState<IdleState>)` vs `fn schedule(want: Want)`.
## Summary
BuildState is a type-safe, event-sourced state machine using ECS patterns:
- **Compile-time correctness**: Invalid states unrepresentable
- **Flat data structures**: Collections + inverted indexes, not nested objects
- **Temporal identity**: UUID-based partition instances + canonical refs
- **Lease mechanism**: State encodes schedulability (Building/UpstreamBuilding hold lease)
- **Efficient lookups**: O(1) reverse queries via inverted indexes
- **Clear separation**: BuildState maintains state, Orchestrator coordinates
The architecture prioritizes fast feedback during development (compile errors), clear semantics (explicit states), and correctness (type-safe transitions).