Partition Identity Refactor: Adding UUIDs for Temporal Consistency

Problem Statement

Current Architecture

Partitions are currently keyed only by their reference string (e.g., "data/beta"):

partitions: HashMap<String, Partition>  // ref → partition

When a partition transitions through states (Missing → Building → Live → Tainted), it's the same object mutating. This creates several architectural problems:
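
A minimal illustration of the problem shape, using a deliberately simplified stand-in for the real typestate Partition enum:

use std::collections::HashMap;

enum PartitionState { Missing, Building, Live, Tainted }

fn main() {
    let mut partitions: HashMap<String, PartitionState> = HashMap::new();

    partitions.insert("data/beta".into(), PartitionState::Missing);
    // A job starts: the same slot is overwritten in place...
    partitions.insert("data/beta".into(), PartitionState::Building);
    // ...and overwritten again when it completes. Nothing records which build
    // produced the Live data, so past instances are unrecoverable.
    partitions.insert("data/beta".into(), PartitionState::Live);
}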

Core Issue: Lack of Temporal Identity

The fundamental problem: We cannot distinguish between "the partition being built now" and "the partition built yesterday" or "the partition that will be built tomorrow."

This manifests in several ways:

  1. Ambiguous Job-Partition Relationships

    • When job J completes, which partition instance did it build?
    • If partition is rebuilt, we lose information about previous builds
    • Can't answer: "What was the state of data/beta when job Y ran?"
  2. State Mutation Loss

    • Once a partition transitions Live → Tainted → Missing, the Live state information is lost
    • Can't track "Partition P was built successfully by job J at time T"
    • Lineage and provenance information disappears on each rebuild
  3. Redundant Data Structures (Symptoms)

    • WantAttributedPartitions in JobRunDetail exists to snapshot want-partition relationships
    • Partitions carry want_ids: Vec<String> that get cleared/modified as partitions transition
    • Jobs need to capture relationships at creation time because they can't be reliably reconstructed later

Concrete Bug Example

The bug that led to this design discussion illustrates the problem:

1. Want 1 created for "data/beta" → partition becomes Building
2. Want 2 created for "data/beta" → but partition is ALREADY Building
3. Job has dep miss → creates derivative want
4. System expects all wants to be Building/UpstreamBuilding, but Want 2 is Idle → panic

Root cause: All wants reference the same mutable partition object. We can't distinguish:

  • "The partition instance Want 1 triggered"
  • "The partition instance Want 2 is waiting for"
  • They're the same object, but semantically they represent different temporal relationships

Proposed Solution: Partition UUIDs

Architecture Changes

Two-level indexing:

// All partition instances, keyed by UUID
partitions_by_uuid: HashMap<Uuid, Partition>

// Current/canonical partition for each ref
canonical_partitions: HashMap<String, Uuid>

Key Properties

  1. Immutable Identity: Each partition build gets a unique UUID

    • Partition(uuid-1, ref="data/beta", state=Building) is a distinct entity
    • When rebuilt, create Partition(uuid-2, ref="data/beta", state=Missing)
    • Both can coexist; uuid-1 represents historical fact, uuid-2 is current state
  2. Stable Job References: Jobs reference the specific partition UUIDs they built

    JobRunBufferEventV1 {
        building_partition_uuids: [uuid-1, uuid-2]  // Specific instances being built
    }
    
  3. Wants Reference Refs: Wants continue to reference partition refs, not UUIDs

    WantCreateEventV1 {
        partitions: ["data/beta"]  // User-facing reference
    }
    // Want's state determined by canonical partition for "data/beta"
    
  4. Temporal Queries: Can reconstruct state at any point

    • "What was partition uuid-1's state when job J ran?" → Look up uuid-1, it's immutable
    • "Which wants were waiting for data/beta at time T?" → Check canonical partition at T
    • "What's the current state of data/beta?" → canonical_partitions["data/beta"] → uuid-2

Benefits

1. Removes WantAttributedPartitions Redundancy

Before:

JobRunBufferEventV1 {
    building_partitions: [PartitionRef("data/beta")],
    // Redundant: snapshot want-partition relationship
    servicing_wants: [WantAttributedPartitions {
        want_id: "w1",
        partitions: ["data/beta"]
    }]
}

After:

JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2]
}

// To find serviced wants:
for uuid in job.building_partition_uuids {
    let partition = partitions_by_uuid[uuid];
    for want_id in partition.want_ids {
        // transition want
    }
}

The relationship is discoverable via stable partition UUID, not baked-in at event creation.

2. Proper State Semantics for Wants

Current (problematic):

Want 1 → triggers build → Building (owns the job somehow?)
Want 2 → sees partition Building → stays Idle (different from Want 1?)
Want 3 → same partition → also Idle

With UUIDs:

Partition(uuid-1, "data/beta") created as Missing
Want 1 arrives → checks canonical["data/beta"] = uuid-1 (Missing) → Idle → schedules job
Job starts → uuid-1 becomes Building, canonical still points to uuid-1
Want 2 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 3 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building
Want 4 arrives → checks canonical["data/beta"] = uuid-1 (Building) → directly to Building

Once the job starts, all four wants have an identical relationship to the canonical partition. Each want's state reflects reality: "is the canonical partition for my ref currently being built?"

Key insight: Wants don't bind to UUIDs. They look up the canonical partition for their ref and base their state on that.

3. Historical Lineage

// Track partition lineage over time
Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2),  // Link to previous instance
    created_at: 1234567890,
    state: Live,
    produced_by_job: Some("job-xyz"),
}

Can answer:

  • "What partitions existed for this ref over time?"
  • "Which job produced this specific partition instance?"
  • "What was the dependency chain when this partition was built?"

Implementation Plan

Phase 1: Add UUID Infrastructure (Non-Breaking)

Goals:

  • Add UUID field to Partition
  • Create dual indexing (by UUID and by ref)
  • Maintain backward compatibility

Changes:

  1. Update Partition struct (databuild/partition_state.rs)

    pub struct PartitionWithState<S> {
        pub uuid: Uuid,  // NEW
        pub partition_ref: PartitionRef,
        pub want_ids: Vec<String>,
        pub state: S,
    }
    
  2. Add dual indexing (databuild/build_state.rs)

    pub struct BuildState {
        partitions_by_uuid: BTreeMap<Uuid, Partition>,      // NEW
        canonical_partitions: BTreeMap<String, Uuid>,        // NEW
        partitions: BTreeMap<String, Partition>,             // DEPRECATED, keep for now
        // ...
    }
    
  3. Update partition creation (a minimal creation sketch follows this list)

    • When creating partition (Missing state), generate UUID
    • Store in both maps: partitions_by_uuid[uuid] and canonical_partitions[ref] = uuid
    • Keep partitions[ref] updated for backward compatibility
  4. Add helper methods

    impl BuildState {
        fn get_canonical_partition(&self, partition_ref: &str) -> Option<&Partition> {
            self.canonical_partitions
                .get(partition_ref)
                .and_then(|uuid| self.partitions_by_uuid.get(uuid))
        }
    
        fn get_canonical_partition_uuid(&self, partition_ref: &str) -> Option<Uuid> {
            self.canonical_partitions.get(partition_ref).copied()
        }
    
        fn get_partition_by_uuid(&self, uuid: &Uuid) -> Option<&Partition> {
            self.partitions_by_uuid.get(uuid)
        }
    }
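
The creation step referenced in item 3 above, as a minimal sketch (assumes the new_missing_with_uuid constructor from Phase 4 and that Partition derives Clone; the method name is illustrative):

impl BuildState {
    fn create_missing_partition(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let uuid = Uuid::new_v4();
        let partition = Partition::new_missing_with_uuid(uuid, partition_ref.clone());

        // New indexes
        self.partitions_by_uuid.insert(uuid, partition.clone());
        self.canonical_partitions.insert(partition_ref.r#ref.clone(), uuid);
        // Backward compatibility: keep the deprecated by-ref map in sync until Phase 5
        self.partitions.insert(partition_ref.r#ref.clone(), partition);

        uuid
    }
}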
    

Phase 2: Update Want State Logic

Goals:

  • Wants determine state based on canonical partition
  • Remove schedulability check for building partitions (no longer needed)

Changes:

  1. Update handle_want_create() (databuild/build_state.rs)

    fn handle_want_create(&mut self, event: &WantCreateEventV1) -> Vec<Event> {
        // Create want in Idle state initially
        let want_idle: WantWithState<IdleState> = event.clone().into();
    
        // Check canonical partition states to determine want's actual initial state
        let has_building_partitions = event.partitions.iter().any(|pref| {
            matches!(
                self.get_canonical_partition(&pref.r#ref),
                Some(Partition::Building(_))
            )
        });
    
        let want = if has_building_partitions {
            // Canonical partition is Building → Want starts in Building
            tracing::info!(
                want_id = %event.want_id,
                "Want created in Building state (canonical partition is building)"
            );
            Want::Building(want_idle.start_building(current_timestamp()))
        } else {
            // Canonical partition not Building → Want starts in Idle
            tracing::info!(
                want_id = %event.want_id,
                "Want created in Idle state"
            );
            Want::Idle(want_idle)
        };
    
        self.wants.insert(event.want_id.clone(), want);
    
        // Register want with partitions
        for pref in &event.partitions {
            self.add_want_to_partition(pref, &event.want_id);
        }
    
        // Handle derivative wants if applicable
        if let Some(source) = &event.source {
            if let Some(EventSourceVariant::JobTriggered(job_triggered)) = &source.source {
                self.handle_derivative_want_creation(
                    &event.want_id,
                    &event.partitions,
                    &job_triggered.job_run_id,
                );
            }
        }
    
        vec![]
    }
    
  2. Simplify WantSchedulability (databuild/build_state.rs)

    // Remove `building` field from WantUpstreamStatus
    pub struct WantUpstreamStatus {
        pub live: Vec<LivePartitionRef>,
        pub tainted: Vec<TaintedPartitionRef>,
        pub missing: Vec<MissingPartitionRef>,
        // REMOVED: pub building: Vec<BuildingPartitionRef>,
    }
    
    impl WantSchedulability {
        pub fn is_schedulable(&self) -> bool {
            // Simplified: only check upstreams
            // Building partitions now handled at want creation
            self.status.missing.is_empty() && self.status.tainted.is_empty()
        }
    }
    
  3. Update derivative want handling (databuild/build_state.rs)

    fn handle_derivative_want_creation(...) {
        // ...existing logic...
    
        for want_id in impacted_want_ids {
            let want = self.wants.remove(&want_id).expect(...);
            let transitioned = match want {
                // Idle wants can exist if they arrived after job started but before dep miss
                Want::Idle(idle) => {
                    tracing::info!(
                        want_id = %want_id,
                        derivative_want_id = %derivative_want_id,
                        "Want: Idle → UpstreamBuilding (partition dep miss detected)"
                    );
                    Want::UpstreamBuilding(
                        idle.detect_missing_deps(vec![derivative_want_id.to_string()])
                    )
                }
                Want::Building(building) => {
                    // Building → UpstreamBuilding
                    // ... existing logic ...
                }
                Want::UpstreamBuilding(upstream) => {
                    // UpstreamBuilding → UpstreamBuilding (add another upstream)
                    // ... existing logic ...
                }
                _ => {
                    panic!(
                        "BUG: Want {} in invalid state {:?}. Should be Idle, Building, or UpstreamBuilding.",
                        want_id, want
                    );
                }
            };
            self.wants.insert(want_id, transitioned);
        }
    }
    
  4. Add Idle → UpstreamBuilding transition (databuild/want_state.rs)

    impl WantWithState<IdleState> {
        // ... existing methods ...
    
        /// Transition from Idle to UpstreamBuilding when dependencies are missing
        /// This can happen if want arrives while partition is building, then job has dep miss
        pub fn detect_missing_deps(
            self,
            upstream_want_ids: Vec<String>,
        ) -> WantWithState<UpstreamBuildingState> {
            WantWithState {
                want: self.want.updated_timestamp(),
                state: UpstreamBuildingState { upstream_want_ids },
            }
        }
    }
    

Phase 3: Update Job Events

Goals:

  • Jobs reference partition UUIDs, not just refs
  • Remove WantAttributedPartitions redundancy

Changes:

  1. Update JobRunBufferEventV1 (databuild/databuild.proto)

    message JobRunBufferEventV1 {
        string job_run_id = 1;
        string job_label = 2;
        repeated string building_partition_uuids = 3;  // NEW: UUIDs instead of refs
        repeated PartitionRef building_partitions = 4; // DEPRECATED: keep for migration
        repeated WantAttributedPartitions servicing_wants = 5; // DEPRECATED: remove later
    }
    
  2. Update handle_job_run_buffer() (databuild/build_state.rs)

    fn handle_job_run_buffer(&mut self, event: &JobRunBufferEventV1) -> Vec<Event> {
        // Parse UUIDs from event
        let building_uuids: Vec<Uuid> = event.building_partition_uuids
            .iter()
            .map(|s| Uuid::parse_str(s).expect("Valid UUID"))
            .collect();
    
        // Find all wants for these partition UUIDs
        let mut impacted_want_ids: HashSet<String> = HashSet::new();
        for uuid in &building_uuids {
            if let Some(partition) = self.partitions_by_uuid.get(uuid) {
                for want_id in partition.want_ids() {
                    impacted_want_ids.insert(want_id.clone());
                }
            }
        }
    
        // Transition wants to Building
        for want_id in impacted_want_ids {
            let want = self.wants.remove(&want_id).expect("Want must exist");
            let transitioned = match want {
                Want::Idle(idle) => Want::Building(idle.start_building(current_timestamp())),
                Want::Building(building) => Want::Building(building), // Already building
                _ => panic!("Invalid state for job buffer: {:?}", want),
            };
            self.wants.insert(want_id, transitioned);
        }
    
        // Transition partitions to Building by UUID
        for uuid in building_uuids {
            if let Some(partition) = self.partitions_by_uuid.remove(&uuid) {
                let building = match partition {
                    Partition::Missing(missing) => {
                        Partition::Building(missing.start_building(event.job_run_id.clone()))
                    }
                    _ => panic!("Partition {:?} not in Missing state", uuid),
                };
                self.partitions_by_uuid.insert(uuid, building);
            }
        }
    
        // Create job run
        let queued: JobRunWithState<JobQueuedState> = event.clone().into();
        self.job_runs.insert(event.job_run_id.clone(), JobRun::Queued(queued));
    
        vec![]
    }
    
  3. Update Orchestrator (databuild/orchestrator.rs)

    fn queue_job(&mut self, wg: WantGroup) -> Result<(), DatabuildError> {
        // Get partition refs from wants
        let wanted_refs: Vec<PartitionRef> = wg.wants
            .iter()
            .flat_map(|want| want.partitions.clone())
            .collect();
    
        // Resolve refs to canonical UUIDs
        let building_partition_uuids: Vec<String> = wanted_refs
            .iter()
            .filter_map(|pref| {
                self.bel.state.get_canonical_partition_uuid(&pref.r#ref)
                    .map(|uuid| uuid.to_string())
            })
            .collect();
    
        let job_buffer_event = Event::JobRunBufferV1(JobRunBufferEventV1 {
            job_run_id: job_run_id.to_string(),
            job_label: wg.job.label,
            building_partition_uuids,  // Use canonical UUIDs
            building_partitions: vec![], // Deprecated
            servicing_wants: vec![],     // Deprecated
        });
    
        self.append_and_broadcast(&job_buffer_event)?;
        self.job_runs.push(job_run);
        Ok(())
    }
    

Phase 4: Partition Lifecycle Management

Goals:

  • Define when new partition UUIDs are created
  • Handle canonical partition transitions
  • Implement cleanup/GC

Canonical Partition Transitions:

New partition UUID created when:

  1. First build: Partition doesn't exist → create Partition(uuid, Missing)
  2. Taint: Partition tainted → create new Partition(uuid-new, Missing), update canonical
  3. Expiration: TTL exceeded → create new Partition(uuid-new, Missing), update canonical
  4. Manual rebuild: Explicit rebuild request → create new Partition(uuid-new, Missing), update canonical

Implementation:

impl BuildState {
    /// Create a new partition instance for a ref, updating canonical pointer
    fn create_new_partition_instance(&mut self, partition_ref: &PartitionRef) -> Uuid {
        let new_uuid = Uuid::new_v4();
        let new_partition = Partition::new_missing_with_uuid(
            new_uuid,
            partition_ref.clone()
        );

        // Update canonical pointer (old UUID becomes historical)
        self.canonical_partitions.insert(
            partition_ref.r#ref.clone(),
            new_uuid
        );

        // Store new partition
        self.partitions_by_uuid.insert(new_uuid, new_partition);

        // Old partition remains in partitions_by_uuid for historical queries

        new_uuid
    }

    /// Handle partition taint - creates new instance
    fn taint_partition(&mut self, partition_ref: &str) -> Uuid {
        // Mark current partition as Tainted
        if let Some(current_uuid) = self.canonical_partitions.get(partition_ref) {
            if let Some(partition) = self.partitions_by_uuid.get_mut(current_uuid) {
                // Transition to Tainted state (keep UUID)
                *partition = match partition {
                    Partition::Live(live) => {
                        Partition::Tainted(live.clone().mark_tainted())
                    }
                    _ => panic!("Can only taint Live partitions"),
                };
            }
        }

        // Create new partition instance for rebuilding
        self.create_new_partition_instance(&PartitionRef {
            r#ref: partition_ref.to_string()
        })
    }
}

GC Strategy:

Time-based retention (recommended):

  • Keep partition UUIDs for N days (default 30)
  • Enables historical queries within retention window
  • Predictable storage growth

impl BuildState {
    /// Remove partition UUIDs older than retention window
    fn gc_old_partitions(&mut self, retention_days: u64) {
        let cutoff = current_timestamp() - (retention_days * 86400 * 1_000_000_000);

        // Find UUIDs to remove (not canonical + older than cutoff)
        let canonical_uuids: HashSet<Uuid> = self.canonical_partitions
            .values()
            .copied()
            .collect();

        let to_remove: Vec<Uuid> = self.partitions_by_uuid
            .iter()
            .filter_map(|(uuid, partition)| {
                if !canonical_uuids.contains(uuid) && partition.created_at() < cutoff {
                    Some(*uuid)
                } else {
                    None
                }
            })
            .collect();

        for uuid in to_remove {
            self.partitions_by_uuid.remove(&uuid);
        }
    }
}

Phase 5: Migration and Cleanup

Goals:

  • Remove deprecated fields
  • Update API responses
  • Complete migration

Changes:

  1. Remove deprecated fields from protobuf

    • building_partitions from JobRunBufferEventV1
    • servicing_wants from JobRunBufferEventV1
    • WantAttributedPartitions message
  2. Remove backward compatibility code

    • partitions: BTreeMap<String, Partition> from BuildState
    • Dual writes/reads
  3. Update API responses to include UUIDs where relevant

    • JobRunDetail can include partition UUIDs built
    • PartitionDetail can include UUID for debugging
  4. Update tests to use UUID-based assertions

Design Decisions & Trade-offs

1. Wants Reference Refs, Not UUIDs

Decision: Wants always reference partition refs (e.g., "data/beta"), not UUIDs.

Rationale:

  • User requests "data/beta" - the current/canonical partition for that ref
  • Want state is based on canonical partition: "is the current partition for my ref being built?"
  • If partition gets tainted/rebuilt, wants see the new canonical partition automatically
  • Simpler mental model: want doesn't care about historical instances

How it works:

// Want creation: the want stores the ref, not a UUID
want.partitions = vec!["data/beta".to_string()];

// Want state determination
let canonical_uuid = canonical_partitions["data/beta"];
match &partitions_by_uuid[&canonical_uuid] {
    Partition::Building(_) => { /* want transitions to Building */ }
    Partition::Live(_)     => { /* want can complete */ }
    _                      => { /* ... */ }
}

2. Jobs Reference UUIDs, Not Refs

Decision: Jobs reference the specific partition UUIDs they built.

Rationale:

  • Jobs build specific partition instances
  • Historical record: "Job J built Partition(uuid-1)"
  • Even if partition is later tainted/rebuilt, job's record is immutable
  • Enables provenance: "Which job built this specific partition?"

How it works:

JobRunBufferEventV1 {
    building_partition_uuids: [uuid-1, uuid-2]  // Specific instances
}
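
Answering the provenance question then becomes a single UUID lookup. A sketch, assuming a hypothetical produced_by_job() accessor (returning a cloned Option<String>) for the field shown in the lineage example:

fn job_that_built(state: &BuildState, uuid: &Uuid) -> Option<String> {
    state
        .get_partition_by_uuid(uuid)
        .and_then(|partition| partition.produced_by_job())
}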

3. UUID Generation: When?

Decision: Generate UUID during event processing (in handle_want_create, when partition created).

Rationale:

  • Events remain deterministic
  • UUID generation during replay works correctly
  • Maintains event sourcing principles

Not in the event itself: Would require client-side UUID generation, breaks deterministic replay.

4. Canonical Partition: One at a Time

Decision: Only one canonical partition per ref at a time.

Scenario handling:

  • Partition(uuid-1, "data/beta") is Building
  • User requests rebuild → new want sees uuid-1 is Building → want becomes Building
  • Want waits for uuid-1 to complete
  • If uuid-1 completes successfully → want completes
  • If uuid-1 fails or is tainted → new partition instance created (uuid-2), canonical updated

Alternative considered: Multiple concurrent builds with versioning

  • Significantly more complex
  • Defer to future work

5. Event Format: UUID as String

Decision: Store UUIDs as strings in protobuf events.

Rationale:

  • Human-readable in logs/debugging
  • Standard UUID string format (36 chars)
  • Protobuf has no native UUID type

Trade-off: Larger event size (36 bytes vs 16 bytes) - acceptable for debuggability.
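
The conversion at the event boundary is the standard uuid crate round trip the plan already relies on elsewhere:

fn roundtrip(uuid: Uuid) -> Result<Uuid, uuid::Error> {
    let as_string = uuid.to_string(); // 36-char hyphenated form stored in the event
    Uuid::parse_str(&as_string)       // back to the 16-byte in-memory value
}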

Testing Strategy

Unit Tests

  1. Partition UUID uniqueness

    • Creating partitions generates unique UUIDs
    • Same ref at different times gets different UUIDs
  2. Canonical partition tracking

    • canonical_partitions always points to current instance
    • Old instances remain in partitions_by_uuid
  3. Want state determination

    • Want checks canonical partition state
    • Multiple wants see same canonical partition
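
A unit-test sketch covering the first two items, assuming the BuildState helpers from Phases 1 and 4 and that BuildState implements Default (constructor details are illustrative):

#[test]
fn new_instance_gets_fresh_uuid_and_updates_canonical() {
    let mut state = BuildState::default();
    let pref = PartitionRef { r#ref: "data/beta".to_string() };

    let uuid1 = state.create_new_partition_instance(&pref);
    let uuid2 = state.create_new_partition_instance(&pref);

    // Same ref at different times gets different UUIDs
    assert_ne!(uuid1, uuid2);
    // Canonical pointer always tracks the newest instance
    assert_eq!(state.get_canonical_partition_uuid("data/beta"), Some(uuid2));
    // The older instance remains queryable by UUID
    assert!(state.get_partition_by_uuid(&uuid1).is_some());
}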

Integration Tests

  1. Multi-want scenario (reproduces original bug; sketched after this list)

    • Want 1 created → partition Missing → Idle
    • Job scheduled → partition Building (uuid-1)
    • Wants 2-4 created → see partition Building → directly to Building
    • All 4 wants reference same canonical partition uuid-1
    • Job dep miss → all transition to UpstreamBuilding correctly
  2. Rebuild scenario

    • Partition built → Live (uuid-1)
    • Partition tainted → new instance created (uuid-2), canonical updated
    • New wants reference uuid-2
    • Old partition uuid-1 still queryable for history
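
A rough sketch of the multi-want test from item 1, assuming the handlers from Phases 2 and 3; want_event, job_buffer_event, and want_is_building are hypothetical test helpers:

#[test]
fn wants_arriving_during_build_track_one_canonical_partition() {
    let mut state = BuildState::default();

    // Want 1 arrives while nothing is building → created Idle, partition Missing
    state.handle_want_create(&want_event("w1", "data/beta"));

    // Job is buffered against the canonical UUID → partition and Want 1 go Building
    let uuid = state.get_canonical_partition_uuid("data/beta").unwrap();
    state.handle_job_run_buffer(&job_buffer_event("job-1", &[uuid]));

    // Wants 2-4 arrive while the canonical partition is Building → created Building
    for id in ["w2", "w3", "w4"] {
        state.handle_want_create(&want_event(id, "data/beta"));
    }

    // All four wants now share the same relationship to the canonical instance
    assert!(["w1", "w2", "w3", "w4"].iter().all(|id| state.want_is_building(id)));
}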

End-to-End Tests

  1. Full lifecycle
    • Want created → canonical partition determined
    • Job runs → partition transitions through states
    • Want completes → partition remains in history
    • Partition expires → new UUID for rebuild, canonical updated

Future Work

1. Partition Lineage Graph

Build explicit lineage tracking:

Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2),
    derived_from: vec![uuid-4, uuid-5],  // Upstream dependencies
}

Enables:

  • "What was the full dependency graph when this partition was built?"
  • "How did data propagate through the system over time?"

2. Partition Provenance

Track complete build history:

Partition {
    uuid: uuid-1,
    provenance: Provenance {
        built_by_job: "job-123",
        source_code_version: "abc123",
        build_timestamp: 1234567890,
        input_partitions: vec![uuid-2, uuid-3],
    }
}

3. Multi-Generation Partitions

Support concurrent builds of different generations:

canonical_partitions: HashMap<String, Vec<(Generation, Uuid)>>
// "data/beta" → [(v1, uuid-1), (v2, uuid-2)]

Users can request specific generations or "latest."

Summary

Adding partition UUIDs solves fundamental architectural problems:

  • Temporal identity: Distinguish partition instances over time
  • Stable job references: Jobs reference immutable partition UUIDs they built
  • Wants reference refs: Want state based on canonical partition for their ref
  • Discoverable relationships: Remove redundant snapshot data (WantAttributedPartitions)
  • Proper semantics: Want state reflects actual canonical partition state
  • Historical queries: Can query past partition states via UUID

Key principle: Wants care about "what's the current state of data/beta?" (refs), while jobs and historical queries care about "what happened to this specific partition instance?" (UUIDs).

This refactor enables cleaner code, better observability, and proper event sourcing semantics throughout the system.