Partition Identity Refactor: Adding UUIDs for Temporal Consistency

Problem Statement

Current Architecture

Partitions are currently keyed only by their reference string (e.g., "data/beta"):

partitions: HashMap<String, Partition>  // ref → partition

When a partition transitions through states (Missing → Building → Live → Tainted), it's the same object mutating. This creates several architectural problems:
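
A minimal illustration of the problem shape, using a deliberately simplified stand-in for the real typestate Partition enum:

use std::collections::HashMap;

enum PartitionState { Missing, Building, Live, Tainted }

fn main() {
    let mut partitions: HashMap<String, PartitionState> = HashMap::new();

    partitions.insert("data/beta".into(), PartitionState::Missing);
    // A job starts: the same slot is overwritten in place...
    partitions.insert("data/beta".into(), PartitionState::Building);
    // ...and overwritten again when it completes. Nothing records which build
    // produced the Live data, so past instances are unrecoverable.
    partitions.insert("data/beta".into(), PartitionState::Live);
}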

Core Issue: Lack of Temporal Identity

The fundamental problem: We cannot distinguish between "the partition being built now" and "the partition built yesterday" or "the partition that will be built tomorrow."

This manifests in several ways:

  1. Ambiguous Job-Partition Relationships

    • When job J completes, which partition instance did it build?
    • If partition is rebuilt, we lose information about previous builds
    • Can't answer: "What was the state of data/beta when job Y ran?"
  2. State Mutation Loss

    • Once a partition transitions Live → Tainted → Missing, the Live state information is lost
    • Can't track "Partition P was built successfully by job J at time T"
    • Lineage and provenance information disappears on each rebuild
  3. Redundant Data Structures (Symptoms)

    • WantAttributedPartitions in JobRunDetail exists to snapshot want-partition relationships
    • Partitions carry want_ids: Vec<String> that get cleared/modified as partitions transition
    • Jobs need to capture relationships at creation time because they can't be reliably reconstructed later

Concrete Bug Example

The bug that led to this design discussion illustrates the problem:

1. Want 1 created for "data/beta" → partition becomes Building
2. Want 2 created for "data/beta" → but partition is ALREADY Building
3. Job has dep miss → creates derivative want
4. System expects all wants to be Building/UpstreamBuilding, but Want 2 is Idle → panic

Root cause: All wants reference the same mutable partition object. We can't distinguish:

  • "The partition instance Want 1 triggered"
  • "The partition instance Want 2 is waiting for"
  • They're the same object, but semantically they represent different temporal relationships

Proposed Solution: Partition UUIDs

Architecture Changes

Two-level indexing:

// All partition instances, keyed by UUID
partitions_by_uuid: HashMap<Uuid, Partition>

// Current/canonical partition for each ref
canonical_partitions: HashMap<String, Uuid>

Key Properties

  1. Immutable Identity: Each partition build gets a unique UUID

    • Partition(uuid-1, ref="data/beta", state=Building) is a distinct entity
    • When rebuilt, create Partition(uuid-2, ref="data/beta", state=Missing)
    • Both can coexist; uuid-1 represents historical fact, uuid-2 is current state
  2. Stable Job References: Jobs reference the specific partition UUIDs they built

    JobRunBufferEventV1 {
        building_partition_uuids: [uuid-1, uuid-2]  // Specific instances being built
    }
    
  3. Wants Reference Refs: Wants continue to reference partition refs, not UUIDs

    WantCreateEventV1 {
        partitions: ["data/beta"]  // User-facing reference
    }
    // Want's state determined by canonical partition for "data/beta"
    
  4. Temporal Queries: Can reconstruct state at any point

    • "What was partition uuid-1's state when job J ran?" → Look up uuid-1, it's immutable
    • "Which wants were waiting for data/beta at time T?" → Check canonical partition at T
    • "What's the current state of data/beta?" → canonical_partitions["data/beta"] → uuid-2

Benefits

1. Removes WantAttributedPartitions Redundancy

Before:

JobRunBufferEventV1 {
    building_partitions: [PartitionRef("data/beta")],
    // Redundant: snapshot want-partition relationship
    servicing_wants: [WantAttributedPartitions {
        want_id: "w1",
        partitions: ["data/beta"]
    }]
}

After:

JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2]
}

// To find serviced wants:
for uuid in job.building_partition_uuids {
    let partition = partitions_by_uuid[uuid];
    for want_id in partition.want_ids {
        // transition want
    }
}

The relationship is discoverable via stable partition UUID, not baked-in at event creation.

2. Proper State Semantics for Wants

Current (problematic):

Want 1 → triggers build → Building (owns the job somehow?)
Want 2 → sees partition Building → stays Idle (different from Want 1?)
Want 3 → same partition → also Idle

With UUIDs:

Partition(uuid-1, "data/beta") created as Missing
Want 1 arrives → checks canonical["data/beta"] = uuid-1 (Missing) → Idle → schedules job
Job starts → uuid-1 becomes Building, canonical still points to uuid-1
Want 2 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 3 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 4 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building

Once the job starts, all four wants have an identical relationship to the canonical partition. Each want's state reflects reality: "is the canonical partition for my ref currently being built?"

Key insight: Wants don't bind to UUIDs. They look up the canonical partition for their ref and base their state on that.

3. Historical Lineage

// Track partition lineage over time
Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2),  // Link to previous instance
    created_at: 1234567890,
    state: Live,
    produced_by_job: Some("job-xyz"),
}

Can answer:

  • "What partitions existed for this ref over time?"
  • "Which job produced this specific partition instance?"
  • "What was the dependency chain when this partition was built?"

Implementation Plan

Phase 1: Add UUID Infrastructure (Non-Breaking)

Goals:

  • Add UUID field to Partition
  • Create dual indexing (by UUID and by ref)
  • Maintain backward compatibility

Changes:

  1. Update Partition struct (databuild/partition_state.rs)

    pub struct PartitionWithState<S> {
        pub uuid: Uuid,  // NEW
        pub partition_ref: PartitionRef,
        pub want_ids: Vec<String>,
        pub state: S,
    }
    
  2. Add dual indexing (databuild/build_state.rs)

    pub struct BuildState {
        partitions_by_uuid: BTreeMap<Uuid, Partition>,      // NEW
        canonical_partitions: BTreeMap<String, Uuid>,        // NEW
        partitions: BTreeMap<String, Partition>,             // DEPRECATED, keep for now
        // ...
    }
    
  3. Update partition creation (a minimal creation sketch follows this list)

    • When creating partition (Missing state), generate UUID
    • Store in both maps: partitions_by_uuid[uuid] and canonical_partitions[ref] = uuid
    • Keep partitions[ref] updated for backward compatibility
  4. Add helper methods

    impl BuildState {
        fn get_canonical_partition(&self, partition_ref: &str) -> Option<&Partition> {
            self.canonical_partitions
                .get(partition_ref)
                .and_then(|uuid| self.partitions_by_uuid.get(uuid))
        }
    
        fn get_canonical_partition_uuid(&self, partition_ref: &str) -> Option<Uuid> {
            self.canonical_partitions.get(partition_ref).copied()
        }
    
        fn get_partition_by_uuid(&self, uuid: &Uuid) -> Option<&Partition> {
            self.partitions_by_uuid.get(uuid)
        }
    }
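
The creation step referenced in item 3 above, as a minimal sketch (assumes the new_missing_with_uuid constructor from Phase 4 and that Partition derives Clone; the method name is illustrative):

impl BuildState {
    fn create_missing_partition(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let uuid = Uuid::new_v4();
        let partition = Partition::new_missing_with_uuid(uuid, partition_ref.clone());

        // New indexes
        self.partitions_by_uuid.insert(uuid, partition.clone());
        self.canonical_partitions.insert(partition_ref.r#ref.clone(), uuid);
        // Backward compatibility: keep the deprecated by-ref map in sync until Phase 5
        self.partitions.insert(partition_ref.r#ref.clone(), partition);

        uuid
    }
}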
    

Phase 2: Update Want State Logic

Goals:

  • Wants determine state based on canonical partition
  • Remove schedulability check for building partitions (no longer needed)

Changes:

  1. Update handle_want_create() (databuild/build_state.rs)

    fn handle_want_create(&mut self, event: &WantCreateEventV1) -> Vec<Event> {
        // Create want in Idle state initially
        let want_idle: WantWithState<IdleState> = event.clone().into();
    
        // Check canonical partition states to determine want's actual initial state
        let has_building_partitions = event.partitions.iter().any(|pref| {
            matches!(
                self.get_canonical_partition(&pref.r#ref),
                Some(Partition::Building(_))
            )
        });
    
        let want = if has_building_partitions {
            // Canonical partition is Building → Want starts in Building
            tracing::info!(
                want_id = %event.want_id,
                "Want created in Building state (canonical partition is building)"
            );
            Want::Building(want_idle.start_building(current_timestamp()))
        } else {
            // Canonical partition not Building → Want starts in Idle
            tracing::info!(
                want_id = %event.want_id,
                "Want created in Idle state"
            );
            Want::Idle(want_idle)
        };
    
        self.wants.insert(event.want_id.clone(), want);
    
        // Register want with partitions
        for pref in &event.partitions {
            self.add_want_to_partition(pref, &event.want_id);
        }
    
        // Handle derivative wants if applicable
        if let Some(source) = &event.source {
            if let Some(EventSourceVariant::JobTriggered(job_triggered)) = &source.source {
                self.handle_derivative_want_creation(
                    &event.want_id,
                    &event.partitions,
                    &job_triggered.job_run_id,
                );
            }
        }
    
        vec![]
    }
    
  2. Simplify WantSchedulability (databuild/build_state.rs)

    // Remove `building` field from WantUpstreamStatus
    pub struct WantUpstreamStatus {
        pub live: Vec<LivePartitionRef>,
        pub tainted: Vec<TaintedPartitionRef>,
        pub missing: Vec<MissingPartitionRef>,
        // REMOVED: pub building: Vec<BuildingPartitionRef>,
    }
    
    impl WantSchedulability {
        pub fn is_schedulable(&self) -> bool {
            // Simplified: only check upstreams
            // Building partitions now handled at want creation
            self.status.missing.is_empty() && self.status.tainted.is_empty()
        }
    }
    
  3. Update derivative want handling (databuild/build_state.rs)

    fn handle_derivative_want_creation(...) {
        // ...existing logic...
    
        for want_id in impacted_want_ids {
            let want = self.wants.remove(&want_id).expect(...);
            let transitioned = match want {
                // Idle wants can exist if they arrived after job started but before dep miss
                Want::Idle(idle) => {
                    tracing::info!(
                        want_id = %want_id,
                        derivative_want_id = %derivative_want_id,
                        "Want: Idle → UpstreamBuilding (partition dep miss detected)"
                    );
                    Want::UpstreamBuilding(
                        idle.detect_missing_deps(vec![derivative_want_id.to_string()])
                    )
                }
                Want::Building(building) => {
                    // Building → UpstreamBuilding
                    // ... existing logic ...
                }
                Want::UpstreamBuilding(upstream) => {
                    // UpstreamBuilding → UpstreamBuilding (add another upstream)
                    // ... existing logic ...
                }
                _ => {
                    panic!(
                        "BUG: Want {} in invalid state {:?}. Should be Idle, Building, or UpstreamBuilding.",
                        want_id, want
                    );
                }
            };
            self.wants.insert(want_id, transitioned);
        }
    }
    
  4. Add Idle → UpstreamBuilding transition (databuild/want_state.rs)

    impl WantWithState<IdleState> {
        // ... existing methods ...
    
        /// Transition from Idle to UpstreamBuilding when dependencies are missing
        /// This can happen if want arrives while partition is building, then job has dep miss
        pub fn detect_missing_deps(
            self,
            upstream_want_ids: Vec<String>,
        ) -> WantWithState<UpstreamBuildingState> {
            WantWithState {
                want: self.want.updated_timestamp(),
                state: UpstreamBuildingState { upstream_want_ids },
            }
        }
    }
    

Phase 3: Update Job Events

Goals:

  • Jobs reference partition UUIDs, not just refs
  • Remove WantAttributedPartitions redundancy

Changes:

  1. Update JobRunBufferEventV1 (databuild/databuild.proto)

    message JobRunBufferEventV1 {
        string job_run_id = 1;
        string job_label = 2;
        repeated string building_partition_uuids = 3;  // NEW: UUIDs instead of refs
        repeated PartitionRef building_partitions = 4; // DEPRECATED: keep for migration
        repeated WantAttributedPartitions servicing_wants = 5; // DEPRECATED: remove later
    }
    
  2. Update handle_job_run_buffer() (databuild/build_state.rs)

    fn handle_job_run_buffer(&mut self, event: &JobRunBufferEventV1) -> Vec<Event> {
        // Parse UUIDs from event
        let building_uuids: Vec<Uuid> = event.building_partition_uuids
            .iter()
            .map(|s| Uuid::parse_str(s).expect("Valid UUID"))
            .collect();
    
        // Find all wants for these partition UUIDs
        let mut impacted_want_ids: HashSet<String> = HashSet::new();
        for uuid in &building_uuids {
            if let Some(partition) = self.partitions_by_uuid.get(uuid) {
                for want_id in partition.want_ids() {
                    impacted_want_ids.insert(want_id.clone());
                }
            }
        }
    
        // Transition wants to Building
        for want_id in impacted_want_ids {
            let want = self.wants.remove(&want_id).expect("Want must exist");
            let transitioned = match want {
                Want::Idle(idle) => Want::Building(idle.start_building(current_timestamp())),
                Want::Building(building) => Want::Building(building), // Already building
                _ => panic!("Invalid state for job buffer: {:?}", want),
            };
            self.wants.insert(want_id, transitioned);
        }
    
        // Transition partitions to Building by UUID
        for uuid in building_uuids {
            if let Some(partition) = self.partitions_by_uuid.remove(&uuid) {
                let building = match partition {
                    Partition::Missing(missing) => {
                        Partition::Building(missing.start_building(event.job_run_id.clone()))
                    }
                    _ => panic!("Partition {:?} not in Missing state", uuid),
                };
                self.partitions_by_uuid.insert(uuid, building);
            }
        }
    
        // Create job run
        let queued: JobRunWithState<JobQueuedState> = event.clone().into();
        self.job_runs.insert(event.job_run_id.clone(), JobRun::Queued(queued));
    
        vec![]
    }
    
  3. Update Orchestrator (databuild/orchestrator.rs)

    fn queue_job(&mut self, wg: WantGroup) -> Result<(), DatabuildError> {
        // Get partition refs from wants
        let wanted_refs: Vec<PartitionRef> = wg.wants
            .iter()
            .flat_map(|want| want.partitions.clone())
            .collect();
    
        // Resolve refs to canonical UUIDs
        let building_partition_uuids: Vec<String> = wanted_refs
            .iter()
            .filter_map(|pref| {
                self.bel.state.get_canonical_partition_uuid(&pref.r#ref)
                    .map(|uuid| uuid.to_string())
            })
            .collect();
    
        let job_buffer_event = Event::JobRunBufferV1(JobRunBufferEventV1 {
            job_run_id: job_run_id.to_string(),
            job_label: wg.job.label,
            building_partition_uuids,  // Use canonical UUIDs
            building_partitions: vec![], // Deprecated
            servicing_wants: vec![],     // Deprecated
        });
    
        self.append_and_broadcast(&job_buffer_event)?;
        self.job_runs.push(job_run);
        Ok(())
    }
    

Phase 4: Partition Lifecycle Management

Goals:

  • Define when new partition UUIDs are created
  • Handle canonical partition transitions
  • Implement cleanup/GC

Canonical Partition Transitions:

New partition UUID created when:

  1. First build: Partition doesn't exist → create Partition(uuid, Missing)
  2. Taint: Partition tainted → create new Partition(uuid-new, Missing), update canonical
  3. Expiration: TTL exceeded → create new Partition(uuid-new, Missing), update canonical
  4. Manual rebuild: Explicit rebuild request → create new Partition(uuid-new, Missing), update canonical

Implementation:

impl BuildState {
    /// Create a new partition instance for a ref, updating canonical pointer
    fn create_new_partition_instance(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let new_uuid = Uuid::new_v4();
        let new_partition = Partition::new_missing_with_uuid(
            new_uuid,
            partition_ref.clone()
        );

        // Update canonical pointer (old UUID becomes historical)
        self.canonical_partitions.insert(
            partition_ref.r#ref.clone(),
            new_uuid
        );

        // Store new partition
        self.partitions_by_uuid.insert(new_uuid, new_partition);

        // Old partition remains in partitions_by_uuid for historical queries

        new_uuid
    }

    /// Handle partition taint - creates new instance
    fn taint_partition(&mut self, partition_ref: &str) -> Uuid {
        // Mark current partition as Tainted
        if let Some(current_uuid) = self.canonical_partitions.get(partition_ref) {
            if let Some(partition) = self.partitions_by_uuid.get_mut(current_uuid) {
                // Transition to Tainted state (keep UUID)
                *partition = match partition {
                    Partition::Live(live) => {
                        Partition::Tainted(live.clone().mark_tainted())
                    }
                    _ => panic!("Can only taint Live partitions"),
                };
            }
        }

        // Create new partition instance for rebuilding
        self.create_new_partition_instance(&PartitionRef {
            r#ref: partition_ref.to_string()
        })
    }
}

GC Strategy:

Time-based retention (recommended):

  • Keep partition UUIDs for N days (default 30)
  • Enables historical queries within retention window
  • Predictable storage growth

impl BuildState {
    /// Remove partition UUIDs older than retention window
    fn gc_old_partitions(&mut self, retention_days: u64) {
        let cutoff = current_timestamp() - (retention_days * 86400 * 1_000_000_000);

        // Find UUIDs to remove (not canonical + older than cutoff)
        let canonical_uuids: HashSet<Uuid> = self.canonical_partitions
            .values()
            .copied()
            .collect();

        let to_remove: Vec<Uuid> = self.partitions_by_uuid
            .iter()
            .filter_map(|(uuid, partition)| {
                if !canonical_uuids.contains(uuid) && partition.created_at() < cutoff {
                    Some(*uuid)
                } else {
                    None
                }
            })
            .collect();

        for uuid in to_remove {
            self.partitions_by_uuid.remove(&uuid);
        }
    }
}

Phase 5: Migration and Cleanup

Goals:

  • Remove deprecated fields
  • Update API responses
  • Complete migration

Changes:

  1. Remove deprecated fields from protobuf

    • building_partitions from JobRunBufferEventV1
    • servicing_wants from JobRunBufferEventV1
    • WantAttributedPartitions message
  2. Remove backward compatibility code

    • partitions: BTreeMap<String, Partition> from BuildState
    • Dual writes/reads
  3. Update API responses to include UUIDs where relevant

    • JobRunDetail can include partition UUIDs built
    • PartitionDetail can include UUID for debugging
  4. Update tests to use UUID-based assertions

Design Decisions & Trade-offs

1. Wants Reference Refs, Not UUIDs

Decision: Wants always reference partition refs (e.g., "data/beta"), not UUIDs.

Rationale:

  • User requests "data/beta" - the current/canonical partition for that ref
  • Want state is based on canonical partition: "is the current partition for my ref being built?"
  • If partition gets tainted/rebuilt, wants see the new canonical partition automatically
  • Simpler mental model: want doesn't care about historical instances

How it works:

// Want creation: the want stores the ref, not a UUID
want.partitions = vec!["data/beta".to_string()];

// Want state determination
let canonical_uuid = canonical_partitions["data/beta"];
match &partitions_by_uuid[&canonical_uuid] {
    Partition::Building(_) => { /* want transitions to Building */ }
    Partition::Live(_)     => { /* want can complete */ }
    _                      => { /* ... */ }
}

2. Jobs Reference UUIDs, Not Refs

Decision: Jobs reference the specific partition UUIDs they built.

Rationale:

  • Jobs build specific partition instances
  • Historical record: "Job J built Partition(uuid-1)"
  • Even if partition is later tainted/rebuilt, job's record is immutable
  • Enables provenance: "Which job built this specific partition?"

How it works:

JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2]  // Specific instances
}
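
Answering the provenance question then becomes a single UUID lookup. A sketch, assuming a hypothetical produced_by_job() accessor (returning a cloned Option<String>) for the field shown in the lineage example:

fn job_that_built(state: &BuildState, uuid: &Uuid) -> Option<String> {
    state
        .get_partition_by_uuid(uuid)
        .and_then(|partition| partition.produced_by_job())
}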

3. UUID Generation: When?

Decision: Generate UUID during event processing (in handle_want_create, when partition created).

Rationale:

  • Events remain deterministic
  • UUID generation during replay works correctly
  • Maintains event sourcing principles

Not in the event itself: Would require client-side UUID generation, breaks deterministic replay.

4. Canonical Partition: One at a Time

Decision: Only one canonical partition per ref at a time.

Scenario handling:

  • Partition(uuid-1, "data/beta") is Building
  • User requests rebuild → new want sees uuid-1 is Building → want becomes Building
  • Want waits for uuid-1 to complete
  • If uuid-1 completes successfully → want completes
  • If uuid-1 fails or is tainted → new partition instance created (uuid-2), canonical updated

Alternative considered: Multiple concurrent builds with versioning

  • Significantly more complex
  • Defer to future work

5. Event Format: UUID as String

Decision: Store UUIDs as strings in protobuf events.

Rationale:

  • Human-readable in logs/debugging
  • Standard UUID string format (36 chars)
  • Protobuf has no native UUID type

Trade-off: Larger event size (36 bytes vs 16 bytes) - acceptable for debuggability.
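
The conversion at the event boundary is the standard uuid crate round trip the plan already relies on elsewhere:

fn roundtrip(uuid: Uuid) -> Result<Uuid, uuid::Error> {
    let as_string = uuid.to_string(); // 36-char hyphenated form stored in the event
    Uuid::parse_str(&as_string)       // back to the 16-byte in-memory value
}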

Testing Strategy

Unit Tests

  1. Partition UUID uniqueness

    • Creating partitions generates unique UUIDs
    • Same ref at different times gets different UUIDs
  2. Canonical partition tracking

    • canonical_partitions always points to current instance
    • Old instances remain in partitions_by_uuid
  3. Want state determination

    • Want checks canonical partition state
    • Multiple wants see same canonical partition
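
A unit-test sketch covering the first two items, assuming the BuildState helpers from Phases 1 and 4 and that BuildState implements Default (constructor details are illustrative):

#[test]
fn new_instance_gets_fresh_uuid_and_updates_canonical() {
    let mut state = BuildState::default();
    let pref = PartitionRef { r#ref: "data/beta".to_string() };

    let uuid1 = state.create_new_partition_instance(&pref);
    let uuid2 = state.create_new_partition_instance(&pref);

    // Same ref at different times gets different UUIDs
    assert_ne!(uuid1, uuid2);
    // Canonical pointer always tracks the newest instance
    assert_eq!(state.get_canonical_partition_uuid("data/beta"), Some(uuid2));
    // The older instance remains queryable by UUID
    assert!(state.get_partition_by_uuid(&uuid1).is_some());
}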

Integration Tests

  1. Multi-want scenario (reproduces original bug; sketched after this list)

    • Want 1 created → partition Missing → Idle
    • Job scheduled → partition Building (uuid-1)
    • Wants 2-4 created → see partition Building → directly to Building
    • All 4 wants reference same canonical partition uuid-1
    • Job dep miss → all transition to UpstreamBuilding correctly
  2. Rebuild scenario

    • Partition built → Live (uuid-1)
    • Partition tainted → new instance created (uuid-2), canonical updated
    • New wants reference uuid-2
    • Old partition uuid-1 still queryable for history
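
A rough sketch of the multi-want test from item 1, assuming the handlers from Phases 2 and 3; want_event, job_buffer_event, and want_is_building are hypothetical test helpers:

#[test]
fn wants_arriving_during_build_track_one_canonical_partition() {
    let mut state = BuildState::default();

    // Want 1 arrives while nothing is building → created Idle, partition Missing
    state.handle_want_create(&want_event("w1", "data/beta"));

    // Job is buffered against the canonical UUID → partition and Want 1 go Building
    let uuid = state.get_canonical_partition_uuid("data/beta").unwrap();
    state.handle_job_run_buffer(&job_buffer_event("job-1", &[uuid]));

    // Wants 2-4 arrive while the canonical partition is Building → created Building
    for id in ["w2", "w3", "w4"] {
        state.handle_want_create(&want_event(id, "data/beta"));
    }

    // All four wants now share the same relationship to the canonical instance
    assert!(["w1", "w2", "w3", "w4"].iter().all(|id| state.want_is_building(id)));
}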

End-to-End Tests

  1. Full lifecycle
    • Want created → canonical partition determined
    • Job runs → partition transitions through states
    • Want completes → partition remains in history
    • Partition expires → new UUID for rebuild, canonical updated

Future Work

1. Partition Lineage Graph

Build explicit lineage tracking:

Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2),
    derived_from: vec![uuid-4, uuid-5],  // Upstream dependencies
}

Enables:

  • "What was the full dependency graph when this partition was built?"
  • "How did data propagate through the system over time?"

2. Partition Provenance

Track complete build history:

Partition {
    uuid: uuid-1,
    provenance: Provenance {
        built_by_job: "job-123",
        source_code_version: "abc123",
        build_timestamp: 1234567890,
        input_partitions: vec![uuid-2, uuid-3],
    }
}

3. Multi-Generation Partitions

Support concurrent builds of different generations:

canonical_partitions: HashMap<String, Vec<(Generation, Uuid)>>
// "data/beta" → [(v1, uuid-1), (v2, uuid-2)]

Users can request specific generations or "latest."

Summary

Adding partition UUIDs solves fundamental architectural problems:

  • Temporal identity: Distinguish partition instances over time
  • Stable job references: Jobs reference immutable partition UUIDs they built
  • Wants reference refs: Want state based on canonical partition for their ref
  • Discoverable relationships: Remove redundant snapshot data (WantAttributedPartitions)
  • Proper semantics: Want state reflects actual canonical partition state
  • Historical queries: Can query past partition states via UUID

Key principle: Wants care about "what's the current state of data/beta?" (refs), while jobs and historical queries care about "what happened to this specific partition instance?" (UUIDs).

This refactor enables cleaner code, better observability, and proper event sourcing semantics throughout the system.