# Partition Identity Refactor: Adding UUIDs for Temporal Consistency
## Problem Statement

### Current Architecture

Partitions are currently keyed only by their reference string (e.g., "data/beta"):

```rust
partitions: HashMap<String, Partition> // ref → partition
```

When a partition transitions through states (Missing → Building → Live → Tainted), it is the same object mutating in place.
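To make the failure mode concrete, here is a minimal, self-contained sketch of this single-map model (the `PartitionState` enum and `Partition` struct here are illustrative stand-ins, not the actual codebase types):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PartitionState { Missing, Building, Live, Tainted }

#[derive(Debug)]
struct Partition {
    partition_ref: String,
    state: PartitionState,
    want_ids: Vec<String>,
}

fn main() {
    let mut partitions: HashMap<String, Partition> = HashMap::new();
    partitions.insert("data/beta".into(), Partition {
        partition_ref: "data/beta".into(),
        state: PartitionState::Live,
        want_ids: vec!["w1".into()],
    });

    // A rebuild mutates the same entry in place: the fact that "data/beta"
    // was ever Live, and which job produced it, is overwritten.
    let p = partitions.get_mut("data/beta").unwrap();
    p.state = PartitionState::Missing;
    p.want_ids.clear();
}
```

This in-place mutation creates several architectural problems: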
### Core Issue: Lack of Temporal Identity

The fundamental problem: we cannot distinguish "the partition being built now" from "the partition built yesterday" or "the partition that will be built tomorrow."

This manifests in several ways:

1. **Ambiguous Job-Partition Relationships**
   - When job J completes, which partition instance did it build?
   - If a partition is rebuilt, we lose information about previous builds
   - Can't answer: "What was the state of data/beta when job Y ran?"

2. **State Mutation Loss**
   - Once a partition transitions Live → Tainted → Missing, the Live state information is lost
   - Can't track "Partition P was built successfully by job J at time T"
   - Lineage and provenance information disappears on each rebuild

3. **Redundant Data Structures (Symptoms)**
   - `WantAttributedPartitions` in `JobRunDetail` exists to snapshot want-partition relationships
   - Partitions carry `want_ids: Vec<String>` that get cleared/modified as partitions transition
   - Jobs need to capture relationships at creation time because they can't be reliably reconstructed later
### Concrete Bug Example

The bug that led to this design discussion illustrates the problem:

1. Want 1 created for "data/beta" → partition becomes Building
2. Want 2 created for "data/beta" → but the partition is ALREADY Building
3. Job hits a dep miss → creates a derivative want
4. The system expects all wants to be Building/UpstreamBuilding, but Want 2 is Idle → panic

Root cause: all wants reference the same mutable partition object. We can't distinguish between:

- "The partition instance Want 1 triggered"
- "The partition instance Want 2 is waiting for"

They're the same object, but semantically they represent different temporal relationships.
## Proposed Solution: Partition UUIDs

### Architecture Changes

Two-level indexing:

```rust
// All partition instances, keyed by UUID
partitions_by_uuid: HashMap<Uuid, Partition>

// Current/canonical partition for each ref
canonical_partitions: HashMap<String, Uuid>
```
### Key Properties

1. **Immutable Identity**: Each partition build gets a unique UUID.
   - `Partition(uuid-1, ref="data/beta", state=Building)` is a distinct entity
   - When rebuilt, create `Partition(uuid-2, ref="data/beta", state=Missing)`
   - Both can coexist; uuid-1 represents historical fact, uuid-2 is current state

2. **Stable Job References**: Jobs reference the specific partition UUIDs they built.

   ```rust
   JobRunBufferEventV1 {
       building_partition_uuids: [uuid-1, uuid-2] // Specific instances being built
   }
   ```

3. **Wants Reference Refs**: Wants continue to reference partition refs, not UUIDs.

   ```rust
   WantCreateEventV1 {
       partitions: ["data/beta"] // User-facing reference
   }
   // The want's state is determined by the canonical partition for "data/beta"
   ```

4. **Temporal Queries**: State can be reconstructed at any point (see the sketch after this list).
   - "What was partition uuid-1's state when job J ran?" → look up uuid-1; it's immutable
   - "Which wants were waiting for data/beta at time T?" → check the canonical partition at T
   - "What's the current state of data/beta?" → `canonical_partitions["data/beta"]` → uuid-2
## Benefits

### 1. Removes WantAttributedPartitions Redundancy

Before:

```rust
JobRunBufferEventV1 {
    building_partitions: [PartitionRef("data/beta")],
    // Redundant: snapshots the want-partition relationship
    servicing_wants: [WantAttributedPartitions {
        want_id: "w1",
        partitions: ["data/beta"]
    }]
}
```

After:

```rust
JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2]
}

// To find serviced wants:
for uuid in &job.building_partition_uuids {
    let partition = &partitions_by_uuid[uuid];
    for want_id in &partition.want_ids {
        // transition the want
    }
}
```

The relationship is discoverable via the stable partition UUID rather than baked in at event creation.
### 2. Proper State Semantics for Wants

Current (problematic):

```text
Want 1 → triggers build → Building (owns the job somehow?)
Want 2 → sees partition Building → stays Idle (different from Want 1?)
Want 3 → same partition → also Idle
```

With UUIDs:

```text
Partition(uuid-1, "data/beta") created as Missing
Want 1 arrives → checks canonical["data/beta"] = uuid-1 (Missing) → Idle → schedules job
Job starts → uuid-1 becomes Building; canonical still points to uuid-1
Want 2 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 3 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 4 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
```

All four wants have the identical relationship to the canonical partition. The state reflects reality: "is the canonical partition for my ref being built?"

Key insight: wants don't bind to UUIDs. They look up the canonical partition for their ref and base their state on that.
### 3. Historical Lineage

```rust
// Track partition lineage over time
Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2), // Link to the previous instance
    created_at: 1234567890,
    state: Live,
    produced_by_job: Some("job-xyz"),
}
```

This can answer (see the sketch below):

- "What partitions existed for this ref over time?"
- "Which job produced this specific partition instance?"
- "What was the dependency chain when this partition was built?"
## Implementation Plan

### Phase 1: Add UUID Infrastructure (Non-Breaking)

Goals:

- Add a UUID field to Partition
- Create dual indexing (by UUID and by ref)
- Maintain backward compatibility

Changes:

1. Update the Partition struct (`databuild/partition_state.rs`):

   ```rust
   pub struct PartitionWithState<S> {
       pub uuid: Uuid, // NEW
       pub partition_ref: PartitionRef,
       pub want_ids: Vec<String>,
       pub state: S,
   }
   ```

2. Add dual indexing (`databuild/build_state.rs`):

   ```rust
   pub struct BuildState {
       partitions_by_uuid: BTreeMap<Uuid, Partition>, // NEW
       canonical_partitions: BTreeMap<String, Uuid>,  // NEW
       partitions: BTreeMap<String, Partition>,       // DEPRECATED, keep for now
       // ...
   }
   ```

3. Update partition creation (see the sketch after this list):
   - When creating a partition (Missing state), generate a UUID
   - Store it in both maps: `partitions_by_uuid[uuid]` and `canonical_partitions[ref] = uuid`
   - Keep `partitions[ref]` updated for backward compatibility

4. Add helper methods:

   ```rust
   impl BuildState {
       fn get_canonical_partition(&self, r#ref: &str) -> Option<&Partition> {
           self.canonical_partitions
               .get(r#ref)
               .and_then(|uuid| self.partitions_by_uuid.get(uuid))
       }

       fn get_canonical_partition_uuid(&self, r#ref: &str) -> Option<Uuid> {
           self.canonical_partitions.get(r#ref).copied()
       }

       fn get_partition_by_uuid(&self, uuid: &Uuid) -> Option<&Partition> {
           self.partitions_by_uuid.get(uuid)
       }
   }
   ```
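A sketch of the creation path from step 3 above, assuming `Partition: Clone` and the `new_missing_with_uuid` constructor that Phase 4 uses:

```rust
impl BuildState {
    /// Create a partition in Missing state and register it in all three maps.
    /// Sketch of the Phase 1 dual-write; the legacy `partitions` map stays in
    /// sync until Phase 5 removes it.
    fn create_partition(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let uuid = Uuid::new_v4();
        let partition = Partition::new_missing_with_uuid(uuid, partition_ref.clone());

        self.partitions_by_uuid.insert(uuid, partition.clone());
        self.canonical_partitions.insert(partition_ref.r#ref.clone(), uuid);

        // Backward compatibility: old code paths still read the ref-keyed map.
        self.partitions.insert(partition_ref.r#ref.clone(), partition);

        uuid
    }
}
```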
### Phase 2: Update Want State Logic

Goals:

- Wants determine their state based on the canonical partition
- Remove the schedulability check for building partitions (no longer needed)

Changes:

1. Update `handle_want_create()` (`databuild/build_state.rs`):

   ```rust
   fn handle_want_create(&mut self, event: &WantCreateEventV1) -> Vec<Event> {
       // Create the want in Idle state initially
       let want_idle: WantWithState<IdleState> = event.clone().into();

       // Check canonical partition states to determine the want's actual initial state
       let has_building_partitions = event.partitions.iter().any(|pref| {
           matches!(
               self.get_canonical_partition(&pref.r#ref),
               Some(Partition::Building(_))
           )
       });

       let want = if has_building_partitions {
           // Canonical partition is Building → want starts in Building
           tracing::info!(
               want_id = %event.want_id,
               "Want created in Building state (canonical partition is building)"
           );
           Want::Building(want_idle.start_building(current_timestamp()))
       } else {
           // Canonical partition not Building → want starts in Idle
           tracing::info!(
               want_id = %event.want_id,
               "Want created in Idle state"
           );
           Want::Idle(want_idle)
       };

       self.wants.insert(event.want_id.clone(), want);

       // Register the want with its partitions
       for pref in &event.partitions {
           self.add_want_to_partition(pref, &event.want_id);
       }

       // Handle derivative wants if applicable
       if let Some(source) = &event.source {
           if let Some(EventSourceVariant::JobTriggered(job_triggered)) = &source.source {
               self.handle_derivative_want_creation(
                   &event.want_id,
                   &event.partitions,
                   &job_triggered.job_run_id,
               );
           }
       }

       vec![]
   }
   ```

2. Simplify `WantSchedulability` (`databuild/build_state.rs`):

   ```rust
   // Remove the `building` field from WantUpstreamStatus
   pub struct WantUpstreamStatus {
       pub live: Vec<LivePartitionRef>,
       pub tainted: Vec<TaintedPartitionRef>,
       pub missing: Vec<MissingPartitionRef>,
       // REMOVED: pub building: Vec<BuildingPartitionRef>,
   }

   impl WantSchedulability {
       pub fn is_schedulable(&self) -> bool {
           // Simplified: only check upstreams.
           // Building partitions are now handled at want creation.
           self.status.missing.is_empty() && self.status.tainted.is_empty()
       }
   }
   ```

3. Update derivative want handling (`databuild/build_state.rs`):

   ```rust
   fn handle_derivative_want_creation(...) {
       // ...existing logic...
       for want_id in impacted_want_ids {
           let want = self.wants.remove(&want_id).expect(...);
           let transitioned = match want {
               // Idle wants can exist if they arrived after the job started
               // but before the dep miss
               Want::Idle(idle) => {
                   tracing::info!(
                       want_id = %want_id,
                       derivative_want_id = %derivative_want_id,
                       "Want: Idle → UpstreamBuilding (partition dep miss detected)"
                   );
                   Want::UpstreamBuilding(
                       idle.detect_missing_deps(vec![derivative_want_id.to_string()])
                   )
               }
               Want::Building(building) => {
                   // Building → UpstreamBuilding
                   // ... existing logic ...
               }
               Want::UpstreamBuilding(upstream) => {
                   // UpstreamBuilding → UpstreamBuilding (add another upstream)
                   // ... existing logic ...
               }
               other => {
                   panic!(
                       "BUG: Want {} in invalid state {:?}. Should be Idle, Building, or UpstreamBuilding.",
                       want_id, other
                   );
               }
           };
           self.wants.insert(want_id, transitioned);
       }
   }
   ```

4. Add the Idle → UpstreamBuilding transition (`databuild/want_state.rs`):

   ```rust
   impl WantWithState<IdleState> {
       // ... existing methods ...

       /// Transition from Idle to UpstreamBuilding when dependencies are missing.
       /// This can happen if a want arrives while the partition is building
       /// and the job then hits a dep miss.
       pub fn detect_missing_deps(
           self,
           upstream_want_ids: Vec<String>,
       ) -> WantWithState<UpstreamBuildingState> {
           WantWithState {
               want: self.want.updated_timestamp(),
               state: UpstreamBuildingState { upstream_want_ids },
           }
       }
   }
   ```
### Phase 3: Update Job Events

Goals:

- Jobs reference partition UUIDs, not just refs
- Remove the WantAttributedPartitions redundancy

Changes:

1. Update `JobRunBufferEventV1` (`databuild/databuild.proto`):

   ```protobuf
   message JobRunBufferEventV1 {
     string job_run_id = 1;
     string job_label = 2;
     repeated string building_partition_uuids = 3;          // NEW: UUIDs instead of refs
     repeated PartitionRef building_partitions = 4;         // DEPRECATED: keep for migration
     repeated WantAttributedPartitions servicing_wants = 5; // DEPRECATED: remove later
   }
   ```

2. Update `handle_job_run_buffer()` (`databuild/build_state.rs`):

   ```rust
   fn handle_job_run_buffer(&mut self, event: &JobRunBufferEventV1) -> Vec<Event> {
       // Parse UUIDs from the event
       let building_uuids: Vec<Uuid> = event.building_partition_uuids
           .iter()
           .map(|s| Uuid::parse_str(s).expect("Valid UUID"))
           .collect();

       // Find all wants for these partition UUIDs
       let mut impacted_want_ids: HashSet<String> = HashSet::new();
       for uuid in &building_uuids {
           if let Some(partition) = self.partitions_by_uuid.get(uuid) {
               for want_id in partition.want_ids() {
                   impacted_want_ids.insert(want_id.clone());
               }
           }
       }

       // Transition wants to Building
       for want_id in impacted_want_ids {
           let want = self.wants.remove(&want_id).expect("Want must exist");
           let transitioned = match want {
               Want::Idle(idle) => Want::Building(idle.start_building(current_timestamp())),
               Want::Building(building) => Want::Building(building), // Already building
               other => panic!("Invalid state for job buffer: {:?}", other),
           };
           self.wants.insert(want_id, transitioned);
       }

       // Transition partitions to Building by UUID
       for uuid in building_uuids {
           if let Some(partition) = self.partitions_by_uuid.remove(&uuid) {
               let building = match partition {
                   Partition::Missing(missing) => {
                       Partition::Building(missing.start_building(event.job_run_id.clone()))
                   }
                   _ => panic!("Partition {:?} not in Missing state", uuid),
               };
               self.partitions_by_uuid.insert(uuid, building);
           }
       }

       // Create the job run
       let queued: JobRunWithState<JobQueuedState> = event.clone().into();
       self.job_runs.insert(event.job_run_id.clone(), JobRun::Queued(queued));

       vec![]
   }
   ```

3. Update the Orchestrator (`databuild/orchestrator.rs`):

   ```rust
   fn queue_job(&mut self, wg: WantGroup) -> Result<(), DatabuildError> {
       // Get partition refs from the wants
       let wanted_refs: Vec<PartitionRef> = wg.wants
           .iter()
           .flat_map(|want| want.partitions.clone())
           .collect();

       // Resolve refs to canonical UUIDs
       let building_partition_uuids: Vec<String> = wanted_refs
           .iter()
           .filter_map(|pref| {
               self.bel.state.get_canonical_partition_uuid(&pref.r#ref)
                   .map(|uuid| uuid.to_string())
           })
           .collect();

       let job_buffer_event = Event::JobRunBufferV1(JobRunBufferEventV1 {
           job_run_id: job_run_id.to_string(),
           job_label: wg.job.label,
           building_partition_uuids,    // Use canonical UUIDs
           building_partitions: vec![], // Deprecated
           servicing_wants: vec![],     // Deprecated
       });

       self.append_and_broadcast(&job_buffer_event)?;
       self.job_runs.push(job_run);

       Ok(())
   }
   ```
### Phase 4: Partition Lifecycle Management

Goals:

- Define when new partition UUIDs are created
- Handle canonical partition transitions
- Implement cleanup/GC

Canonical partition transitions. A new partition UUID is created when:

- First build: the partition doesn't exist → create `Partition(uuid, Missing)`
- Taint: the partition is tainted → create a new `Partition(uuid-new, Missing)`, update canonical
- Expiration: TTL exceeded → create a new `Partition(uuid-new, Missing)`, update canonical
- Manual rebuild: explicit rebuild request → create a new `Partition(uuid-new, Missing)`, update canonical

Implementation:

```rust
impl BuildState {
    /// Create a new partition instance for a ref, updating the canonical pointer
    fn create_new_partition_instance(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let new_uuid = Uuid::new_v4();
        let new_partition = Partition::new_missing_with_uuid(
            new_uuid,
            partition_ref.clone(),
        );

        // Update the canonical pointer (the old UUID becomes historical)
        self.canonical_partitions.insert(
            partition_ref.r#ref.clone(),
            new_uuid,
        );

        // Store the new partition; the old one remains in partitions_by_uuid
        // for historical queries
        self.partitions_by_uuid.insert(new_uuid, new_partition);

        new_uuid
    }

    /// Handle a partition taint - creates a new instance
    fn taint_partition(&mut self, partition_ref: &str) -> Uuid {
        // Mark the current partition as Tainted (it keeps its UUID)
        if let Some(current_uuid) = self.canonical_partitions.get(partition_ref) {
            if let Some(partition) = self.partitions_by_uuid.get_mut(current_uuid) {
                *partition = match partition {
                    Partition::Live(live) => {
                        Partition::Tainted(live.clone().mark_tainted())
                    }
                    _ => panic!("Can only taint Live partitions"),
                };
            }
        }

        // Create a new partition instance for rebuilding
        self.create_new_partition_instance(&PartitionRef {
            r#ref: partition_ref.to_string(),
        })
    }
}
```
GC Strategy:

Time-based retention (recommended):

- Keep partition UUIDs for N days (default 30)
- Enables historical queries within the retention window
- Predictable storage growth

```rust
impl BuildState {
    /// Remove partition UUIDs older than the retention window
    fn gc_old_partitions(&mut self, retention_days: u64) {
        // Timestamps are in nanoseconds
        let cutoff = current_timestamp() - (retention_days * 86400 * 1_000_000_000);

        // A partition is removable if it is not canonical and older than the cutoff
        let canonical_uuids: HashSet<Uuid> = self.canonical_partitions
            .values()
            .copied()
            .collect();

        let to_remove: Vec<Uuid> = self.partitions_by_uuid
            .iter()
            .filter_map(|(uuid, partition)| {
                if !canonical_uuids.contains(uuid) && partition.created_at() < cutoff {
                    Some(*uuid)
                } else {
                    None
                }
            })
            .collect();

        for uuid in to_remove {
            self.partitions_by_uuid.remove(&uuid);
        }
    }
}
```
### Phase 5: Migration and Cleanup

Goals:

- Remove deprecated fields
- Update API responses
- Complete the migration

Changes:

1. Remove deprecated fields from the protobuf:
   - `building_partitions` from `JobRunBufferEventV1`
   - `servicing_wants` from `JobRunBufferEventV1`
   - the `WantAttributedPartitions` message

2. Remove backward-compatibility code:
   - `partitions: BTreeMap<String, Partition>` from `BuildState`
   - the dual writes/reads

3. Update API responses to include UUIDs where relevant:
   - `JobRunDetail` can include the partition UUIDs built
   - `PartitionDetail` can include its UUID for debugging

4. Update tests to use UUID-based assertions
## Design Decisions & Trade-offs

### 1. Wants Reference Refs, Not UUIDs

Decision: wants always reference partition refs (e.g., "data/beta"), not UUIDs.

Rationale:

- The user requests "data/beta", meaning the current/canonical partition for that ref
- Want state is based on the canonical partition: "is the current partition for my ref being built?"
- If the partition gets tainted/rebuilt, wants see the new canonical partition automatically
- Simpler mental model: a want doesn't care about historical instances
How it works:

```rust
// Want creation: the want stores refs, not UUIDs
want.partitions = vec!["data/beta".to_string()];

// Want state determination: resolve the ref through the canonical pointer
let canonical_uuid = canonical_partitions["data/beta"];
match &partitions_by_uuid[&canonical_uuid] {
    Partition::Building(_) => { /* want transitions to Building */ }
    Partition::Live(_) => { /* want can complete */ }
    _ => { /* Missing/Tainted: want stays Idle and may schedule a job */ }
}
```
### 2. Jobs Reference UUIDs, Not Refs

Decision: jobs reference the specific partition UUIDs they built.

Rationale:

- Jobs build specific partition instances
- Historical record: "Job J built Partition(uuid-1)"
- Even if the partition is later tainted/rebuilt, the job's record is immutable
- Enables provenance: "Which job built this specific partition?"

How it works:

```rust
JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2] // Specific instances
}
```
### 3. UUID Generation: When?

Decision: generate the UUID during event processing (in `handle_want_create`, when the partition is created).

Rationale:

- Events remain deterministic
- UUID generation during replay works correctly
- Maintains event-sourcing principles

Not in the event itself: that would require client-side UUID generation and break deterministic replay.
### 4. Canonical Partition: One at a Time

Decision: only one canonical partition per ref at a time.

Scenario handling:

- Partition(uuid-1, "data/beta") is Building
- User requests a rebuild → the new want sees uuid-1 is Building → the want becomes Building
- The want waits for uuid-1 to complete
- If uuid-1 completes successfully → the want completes
- If uuid-1 fails or is tainted → a new partition instance is created (uuid-2) and canonical is updated

Alternative considered: multiple concurrent builds with versioning.

- Significantly more complex
- Deferred to future work
### 5. Event Format: UUID as String

Decision: store UUIDs as strings in protobuf events.

Rationale:

- Human-readable in logs and debugging
- Standard UUID string format (36 chars)
- Protobuf has no native UUID type

Trade-off: larger event size (36 bytes vs 16 bytes) - acceptable for debuggability.
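For reference, the string round-trip with the `uuid` crate (hyphenated form, 36 characters):

```rust
use uuid::Uuid;

fn main() {
    let id = Uuid::new_v4();
    let s = id.to_string(); // e.g. "550e8400-e29b-41d4-a716-446655440000"
    assert_eq!(s.len(), 36); // 32 hex digits + 4 hyphens; 16 bytes in memory
    let parsed = Uuid::parse_str(&s).expect("valid UUID string");
    assert_eq!(parsed, id);
}
```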
## Testing Strategy

### Unit Tests

1. Partition UUID uniqueness (see the test sketch after this list)
   - Creating partitions generates unique UUIDs
   - The same ref at different times gets different UUIDs

2. Canonical partition tracking
   - `canonical_partitions` always points to the current instance
   - Old instances remain in `partitions_by_uuid`

3. Want state determination
   - A want checks the canonical partition's state
   - Multiple wants see the same canonical partition
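A sketch of the first two checks, assuming `BuildState: Default` and the `create_new_partition_instance` helper from Phase 4:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rebuild_gets_fresh_uuid_and_moves_canonical() {
        let mut state = BuildState::default();
        let pref = PartitionRef { r#ref: "data/beta".to_string() };

        let first = state.create_new_partition_instance(&pref);
        let second = state.create_new_partition_instance(&pref);

        // Same ref at different times gets different UUIDs.
        assert_ne!(first, second);
        // The canonical pointer follows the newest instance...
        assert_eq!(state.get_canonical_partition_uuid("data/beta"), Some(second));
        // ...while the old instance remains queryable for history.
        assert!(state.get_partition_by_uuid(&first).is_some());
    }
}
```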
### Integration Tests

1. Multi-want scenario (reproduces the original bug)
   - Want 1 created → partition Missing → Idle
   - Job scheduled → partition Building (uuid-1)
   - Wants 2-4 created → see the partition Building → go directly to Building
   - All four wants reference the same canonical partition uuid-1
   - Job dep miss → all transition to UpstreamBuilding correctly

2. Rebuild scenario
   - Partition built → Live (uuid-1)
   - Partition tainted → new instance created (uuid-2), canonical updated
   - New wants reference uuid-2
   - Old partition uuid-1 remains queryable for history
### End-to-End Tests

Full lifecycle:

- Want created → canonical partition determined
- Job runs → partition transitions through states
- Want completes → partition remains in history
- Partition expires → new UUID for the rebuild, canonical updated
## Future Work

### 1. Partition Lineage Graph

Build explicit lineage tracking:

```rust
Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2),
    derived_from: vec![uuid-4, uuid-5], // Upstream dependencies
}
```

This enables questions like:

- "What was the full dependency graph when this partition was built?"
- "How did data propagate through the system over time?"
### 2. Partition Provenance

Track the complete build history:

```rust
Partition {
    uuid: uuid-1,
    provenance: Provenance {
        built_by_job: "job-123",
        source_code_version: "abc123",
        build_timestamp: 1234567890,
        input_partitions: vec![uuid-2, uuid-3],
    }
}
```
### 3. Multi-Generation Partitions

Support concurrent builds of different generations:

```rust
canonical_partitions: HashMap<String, Vec<(Generation, Uuid)>>
// "data/beta" → [(v1, uuid-1), (v2, uuid-2)]
```

Users can request specific generations or "latest".
## Summary

Adding partition UUIDs solves fundamental architectural problems:

- Temporal identity: distinguish partition instances over time
- Stable job references: jobs reference the immutable partition UUIDs they built
- Wants reference refs: want state is based on the canonical partition for their ref
- Discoverable relationships: removes redundant snapshot data (WantAttributedPartitions)
- Proper semantics: want state reflects the actual canonical partition state
- Historical queries: past partition states can be queried via UUID

Key principle: wants care about "what's the current state of data/beta?" (refs), while jobs and historical queries care about "what happened to this specific partition instance?" (UUIDs).

This refactor enables cleaner code, better observability, and proper event-sourcing semantics throughout the system.