Partition Identity Refactor: Adding UUIDs for Temporal Consistency
Problem Statement
Current Architecture
Partitions are currently keyed only by their reference string (e.g., "data/beta"):
partitions: HashMap<String, Partition> // ref → partition
When a partition transitions through states (Missing → Building → Live → Tainted), it's the same object mutating. This creates several architectural problems:
Core Issue: Lack of Temporal Identity
The fundamental problem: We cannot distinguish between "the partition being built now" and "the partition built yesterday" or "the partition that will be built tomorrow."
This manifests in several ways:
Ambiguous Job-Partition Relationships
- When job J completes, which partition instance did it build?
- If partition is rebuilt, we lose information about previous builds
- Can't answer: "What was the state of data/beta when job Y ran?"
State Mutation Loss
- Once a partition transitions Live → Tainted → Missing, the Live state information is lost
- Can't track "Partition P was built successfully by job J at time T"
- Lineage and provenance information disappears on each rebuild
Redundant Data Structures (Symptoms)
- WantAttributedPartitions in JobRunDetail exists to snapshot want-partition relationships
- Partitions carry want_ids: Vec<String> that get cleared/modified as partitions transition
- Jobs need to capture relationships at creation time because they can't be reliably reconstructed later
Concrete Bug Example
The bug that led to this design discussion illustrates the problem:
1. Want 1 created for "data/beta" → partition becomes Building
2. Want 2 created for "data/beta" → but partition is ALREADY Building
3. Job has dep miss → creates derivative want
4. System expects all wants to be Building/UpstreamBuilding, but Want 2 is Idle → panic
Root cause: All wants reference the same mutable partition object. We can't distinguish:
- "The partition instance Want 1 triggered"
- "The partition instance Want 2 is waiting for"
- They're the same object, but semantically they represent different temporal relationships
Proposed Solution: Partition UUIDs
Architecture Changes
Two-level indexing:
// All partition instances, keyed by UUID
partitions_by_uuid: HashMap<Uuid, Partition>
// Current/canonical partition for each ref
canonical_partitions: HashMap<String, Uuid>
Key Properties
Immutable Identity: Each partition build gets a unique UUID
- Partition(uuid-1, ref="data/beta", state=Building) is a distinct entity
- When rebuilt, create Partition(uuid-2, ref="data/beta", state=Missing)
- Both can coexist; uuid-1 represents historical fact, uuid-2 is current state
Stable Job References: Jobs reference the specific partition UUIDs they built
JobRunBufferEventV1 {
  building_partition_uuids: [uuid-1, uuid-2] // Specific instances being built
}
Wants Reference Refs: Wants continue to reference partition refs, not UUIDs
WantCreateEventV1 {
  partitions: ["data/beta"] // User-facing reference
}
// Want's state determined by canonical partition for "data/beta"
Temporal Queries: Can reconstruct state at any point
- "What was partition uuid-1's state when job J ran?" → Look up uuid-1, it's immutable
- "Which wants were waiting for data/beta at time T?" → Check canonical partition at T
- "What's the current state of data/beta?" → canonical_partitions["data/beta"] → uuid-2
Benefits
1. Removes WantAttributedPartitions Redundancy
Before:
JobRunBufferEventV1 {
  building_partitions: [PartitionRef("data/beta")],
  // Redundant: snapshot want-partition relationship
  servicing_wants: [WantAttributedPartitions {
    want_id: "w1",
    partitions: ["data/beta"]
  }]
}
After:
JobRunBufferEventV1 {
building_partition_uuids: [uuid-1, uuid-2]
}
// To find serviced wants - use inverted index in BuildState
for uuid in job.building_partition_uuids {
    let partition = partitions_by_uuid[uuid];
    let partition_ref = &partition.partition_ref.r#ref;
    // Look up wants via inverted index (not stored on partition)
    if let Some(want_ids) = wants_for_partition.get(partition_ref) {
        for want_id in want_ids {
            // transition want
        }
    }
}
The relationship is discoverable via inverted index, not baked-in at event creation or stored on partitions.
Key improvement: Partitions don't store want_ids. This is cleaner separation of concerns:
- Want → Partition: Inherent (want defines partitions it wants)
- Partition → Want: Derived (maintained as inverted index in BuildState)
Note on want state vs schedulability:
- Want state (Building) reflects current reality: "my partitions are being built"
- Schedulability prevents duplicate jobs: "don't schedule another job if partitions already building"
- Both mechanisms needed: state for correctness, schedulability for efficiency
2. Proper State Semantics for Wants
Current (problematic):
Want 1 → triggers build → Building (owns the job somehow?)
Want 2 → sees partition Building → stays Idle (different from Want 1?)
Want 3 → same partition → also Idle
With UUIDs and New state:
Want 1 arrives → New → no canonical partition exists → Idle → schedulable
Orchestrator queues job → generates uuid-1 for "data/beta"
Job buffer event → creates Partition(uuid-1, "data/beta", Building)
→ updates canonical["data/beta"] = uuid-1
→ transitions Want 1: Idle → Building
Want 2 arrives → New → canonical["data/beta"] = uuid-1 (Building) → Building
Want 3 arrives → New → canonical["data/beta"] = uuid-1 (Building) → Building
Want 4 arrives → New → canonical["data/beta"] = uuid-1 (Building) → Building
All 4 wants have identical relationship to the canonical partition. The state reflects reality: "is the canonical partition for my ref being built?"
Key insights:
- Wants don't bind to UUIDs. They look up the canonical partition for their ref and base their state on that.
- New state makes state determination explicit: want creation → observe world → transition to appropriate state
3. Historical Lineage
// Track partition lineage over time
Partition {
    uuid: uuid-3,
    partition_ref: "data/beta",
    previous_uuid: Some(uuid-2), // Link to previous instance
    created_at: 1234567890,
    state: Live,
    produced_by_job: Some("job-xyz"),
}
Can answer:
- "What partitions existed for this ref over time?"
- "Which job produced this specific partition instance?"
- "What was the dependency chain when this partition was built?"
Implementation Plan
Phase 1: Add UUID Infrastructure (Non-Breaking)
Goals:
- Add UUID field to Partition
- Create dual indexing (by UUID and by ref)
- Maintain backward compatibility
Changes:
Update Partition struct (databuild/partition_state.rs)
Add UUID field to partition:
- uuid: Uuid - Unique identifier for this partition instance
- Remove want_ids field (now maintained as inverted index in BuildState)
Update partition state machine:
States:
- Building: Job actively building this partition
- UpstreamBuilding: Job had dep miss, partition waiting for upstream dependencies (stores MissingDeps)
- UpForRetry: Upstream dependencies satisfied, partition ready to retry building
- Live: Successfully built
- Failed: Hard failure (shouldn't retry)
- UpstreamFailed: Partition failed because upstream dependencies failed (terminal state)
- Tainted: Marked invalid by taint event
Removed: Missing state - partitions only exist when jobs start building them or are completed.
Key transitions:
- Building → UpstreamBuilding (job reports dep miss)
- UpstreamBuilding → UpForRetry (all upstream deps satisfied)
- UpstreamBuilding → UpstreamFailed (upstream dependency hard failure)
- Building → Live (job succeeds)
- Building → Failed (job hard failure)
- UpForRetry → Building (new job queued for retry, creates fresh UUID)
- Live → Tainted (partition tainted)
Add dual indexing and inverted indexes (databuild/build_state.rs)
pub struct BuildState {
    partitions_by_uuid: BTreeMap<Uuid, Partition>,      // NEW
    canonical_partitions: BTreeMap<String, Uuid>,       // NEW
    wants_for_partition: BTreeMap<String, Vec<String>>, // NEW: partition ref → want IDs
    downstream_waiting: BTreeMap<String, Vec<Uuid>>,    // NEW: partition ref → UUIDs waiting for it
    partitions: BTreeMap<String, Partition>,            // DEPRECATED, keep for now
    // ...
}
Rationale for inverted indexes:
wants_for_partition:
- Partitions shouldn't know about wants (layering violation)
- Want → Partition is inherent (want defines what it wants)
- Partition → Want is derived (computed from wants, maintained as index)
- BuildState owns this inverted relationship
downstream_waiting:
- Enables efficient dep miss resolution: when a partition becomes Live, directly find which partitions are waiting for it
- Maps upstream partition ref → list of downstream partition UUIDs that have this ref in their MissingDeps
- Avoids scanning all UpstreamBuilding partitions when upstreams complete
- O(1) lookup to find affected partitions
Partition creation happens at job buffer time
Partitions are only created when a job starts building them:
- Orchestrator generates fresh UUIDs when queuing the job
- handle_job_run_buffer() creates partitions directly in Building state with those UUIDs
- Store in both maps: partitions_by_uuid[uuid] and canonical_partitions[ref] = uuid
- Keep partitions[ref] updated for backward compatibility during migration
No partitions created during want creation - wants just register in inverted index.
Add helper methods for accessing partitions by UUID and ref (sketched below)
- get_canonical_partition(ref) - look up the canonical partition for a ref
- get_canonical_partition_uuid(ref) - get the UUID of the canonical partition
- get_partition_by_uuid(uuid) - direct UUID lookup
- get_wants_for_partition(ref) - query the inverted index
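A sketch of these helpers, assuming the BuildState fields introduced above; bodies are illustrative, not final signatures:
impl BuildState {
    fn get_canonical_partition_uuid(&self, partition_ref: &str) -> Option<Uuid> {
        self.canonical_partitions.get(partition_ref).copied()
    }

    fn get_canonical_partition(&self, partition_ref: &str) -> Option<&Partition> {
        self.get_canonical_partition_uuid(partition_ref)
            .and_then(|uuid| self.partitions_by_uuid.get(&uuid))
    }

    fn get_partition_by_uuid(&self, uuid: &Uuid) -> Option<&Partition> {
        self.partitions_by_uuid.get(uuid)
    }

    fn get_wants_for_partition(&self, partition_ref: &str) -> &[String] {
        self.wants_for_partition
            .get(partition_ref)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}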
Update inverted index maintenance
When wants are created, the wants_for_partition index must be updated (see the sketch below):
- Want creation: Add want_id to the index for each partition ref in the want
- Want completion/cancellation: For now, do NOT remove from index. Cleanup can be added later if needed.
No partition creation needed - just update the index. Partitions are created later when jobs are queued.
Rationale for not cleaning up:
- Index size should be manageable for now
- Cleanup logic is straightforward to add later when needed
- Avoids complexity around replay (removal operations not in event log)
Key consideration: The index maps partition refs (not UUIDs) to want IDs, since wants reference refs. When a partition is rebuilt with a new UUID, the same ref continues to map to the same wants until those wants complete.
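A sketch of that registration step, following the no-cleanup policy above (the call site in handle_want_create() and the argument shapes are assumptions):
impl BuildState {
    // Register a newly created want under every partition ref it asks for.
    fn register_want(&mut self, want_id: &str, partition_refs: &[String]) {
        for partition_ref in partition_refs {
            let entry = self
                .wants_for_partition
                .entry(partition_ref.clone())
                .or_default();
            // Keep registration idempotent so event replay doesn't duplicate entries.
            if !entry.iter().any(|id| id.as_str() == want_id) {
                entry.push(want_id.to_string());
            }
        }
        // No removal on want completion/cancellation for now (see rationale above).
    }
}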
Phase 2: Add New State and Want State Sensing
Goals:
- Add explicit "New" state to Want state machine
- Wants sense canonical partition state and transition appropriately
- Clarify distinction between want state and schedulability
Changes:
Add New state to want_state.rs
Add a new state that represents a want that has just been created but hasn't yet observed the world:
- NewState - Want has been created from event, state not yet determined
Transitions from New:
- New → Failed (any partition failed)
- New → Successful (all partitions live)
- New → Building (any partition building)
- New → Idle (partitions don't exist or other states)
This makes state determination explicit and observable in the event log.
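A minimal sketch of the want state machine with the explicit New variant; the other variants are simplified stand-ins for what lives in want_state.rs:
// Simplified want states, with New as the explicit entry point.
enum WantState {
    New,              // created from WantCreateEventV1, world not yet observed
    Idle,             // schedulable: partitions missing or ready to (re)build
    Building,         // canonical partition(s) currently being built
    UpstreamBuilding, // waiting on upstream dependencies after a dep miss
    Successful,       // all canonical partitions are Live
    Failed,           // a canonical partition hard-failed
    UpstreamFailed,   // upstream dependencies failed
}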
Update handle_want_create() to sense and transition
During want creation event processing (see the sketch below):
- Create want in New state from WantCreateEventV1
- Register want in inverted index (wants_for_partition)
- Check canonical partition states for all partition refs
- Transition based on observation (in priority order):
- If ANY canonical partition is Failed → New → Failed (job can't be safely retried)
- If ANY canonical partition is UpstreamFailed → New → UpstreamFailed (upstream deps failed)
- If ALL canonical partitions exist AND are Live → New → Successful (already built!)
- If ANY canonical partition is Building → New → Building (being built now)
- If ANY canonical partition is UpstreamBuilding → New → UpstreamBuilding (waiting for deps)
- If ANY canonical partition is UpForRetry → New → Idle (deps satisfied, ready to schedule)
- Otherwise (partitions don't exist or other states) → New → Idle (need to schedule)
- For derivative wants, additional logic may transition to UpstreamBuilding
Key insight: Most wants will go New → Idle because partitions won't exist yet (only created when jobs start). Subsequent wants for already-building partitions go New → Building. Wants arriving during dep miss go New → UpstreamBuilding. Wants for partitions ready to retry go New → Idle. Wants for already-Live partitions go New → Successful. Wants for Failed or UpstreamFailed partitions go New → Failed/UpstreamFailed.
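A sketch of that sensing step, assuming the get_canonical_partition helper sketched earlier, a Copy PartitionState enum with the Phase 1 variants, a WantState enum like the one above, and ignoring the derivative-want special case:
impl BuildState {
    // Decide which state a want in New should transition to, in priority order.
    fn sense_want_state(&self, partition_refs: &[String]) -> WantState {
        let states: Vec<Option<PartitionState>> = partition_refs
            .iter()
            .map(|r| self.get_canonical_partition(r).map(|p| p.state))
            .collect();

        if states.iter().any(|s| matches!(s, Some(PartitionState::Failed))) {
            WantState::Failed
        } else if states.iter().any(|s| matches!(s, Some(PartitionState::UpstreamFailed))) {
            WantState::UpstreamFailed
        } else if states.iter().all(|s| matches!(s, Some(PartitionState::Live))) {
            WantState::Successful
        } else if states.iter().any(|s| matches!(s, Some(PartitionState::Building))) {
            WantState::Building
        } else if states.iter().any(|s| matches!(s, Some(PartitionState::UpstreamBuilding))) {
            WantState::UpstreamBuilding
        } else {
            // Partitions don't exist, or are UpForRetry/Tainted → schedulable.
            WantState::Idle
        }
    }
}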
Keep WantSchedulability building check
Important distinction: Want state vs. schedulability are different concerns:
- Want state (New → Building): "Are my partitions currently being built?" - Reflects reality
- Schedulability: "Should the orchestrator start a NEW job for this want?" - Prevents duplicate jobs
Example scenario:
Want 1: Idle → schedules job → partition becomes Building → want becomes Building
Want 2 arrives → sees partition Building → New → Building
Orchestrator polls: both wants are Building, but should NOT schedule another job
The building field in WantUpstreamStatus remains necessary to prevent duplicate job scheduling. A want can be in Building state but not schedulable if its partitions are already being built by another job. Keep the existing schedulability logic that checks building.is_empty().
Update derivative want handling
Modify handle_derivative_want_creation() to handle wants in their appropriate states:
- Building → UpstreamBuilding: Want is Building when dep miss occurs (normal case)
- UpstreamBuilding → UpstreamBuilding: Want already waiting on upstreams, add another (additional dep miss)
Note: Idle wants should NOT be present during derivative want creation. If partitions are building (which they must be for a job to report dep miss), wants would have been created in Building state via New → Building transition.
Add required state transitions in want_state.rs
New transitions needed:
- New → Failed: Any partition failed
- New → UpstreamFailed: Any partition upstream failed
- New → Successful: All partitions live
- New → Idle: Normal case, partitions don't exist
- New → Building: Partitions already building when want created
- Building → UpstreamBuilding: Job reports dep miss (first time)
- UpstreamBuilding → UpstreamBuilding: Additional upstreams added
Note: In the common flow New → UpstreamBuilding is not exercised - wants go New → Building first, then Building → UpstreamBuilding when the dep miss occurs. A want created while the canonical partition is already UpstreamBuilding does transition New → UpstreamBuilding directly, per the sensing priority order above.
Phase 3: Update Job Events
Goals:
- Jobs reference partition UUIDs, not just refs
- Remove WantAttributedPartitions redundancy
Changes:
Update JobRunBufferEventV1 in databuild.proto
Add new message and field:
message PartitionInstanceRef {
  PartitionRef partition_ref = 1;
  string uuid = 2; // UUID as string
}

message JobRunBufferEventV1 {
  // ... existing fields ...
  repeated PartitionInstanceRef building_partitions_v2 = 6;  // NEW
  repeated PartitionRef building_partitions = 4;             // DEPRECATED
  repeated WantAttributedPartitions servicing_wants = 5;     // DEPRECATED
}
This pairs each partition ref with its UUID, solving the mapping problem.
Update handle_job_run_buffer() in build_state.rs
Change partition and want lookup logic (see the sketch after this item):
- Parse UUIDs from event (need partition refs too - consider adding to event or deriving from wants)
- Create partitions directly in Building state with these UUIDs (no Missing state)
- Update canonical_partitions to point refs to these new UUIDs
- Use inverted index (wants_for_partition) to find wants for each partition ref
- Transition those wants: Idle → Building (or stay Building if already there)
- Create job run in Queued state
Key changes:
- Partitions created here, not during want creation
- No Missing → Building transition, created directly as Building
- Use inverted index for want discovery (not stored on partition or in event)
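A sketch of that handler, assuming the event has already been decoded into (ref, UUID) pairs; transition_want_to_building and the Partition fields are hypothetical names for the pieces described above:
impl BuildState {
    fn handle_job_run_buffer(&mut self, job_run_id: &str, building: &[(String, Uuid)]) {
        for (partition_ref, uuid) in building {
            // Created directly in Building state - there is no Missing state.
            self.partitions_by_uuid.insert(*uuid, Partition {
                uuid: *uuid,
                partition_ref: partition_ref.clone(),
                state: PartitionState::Building,
                produced_by_job: Some(job_run_id.to_string()),
            });
            // The new instance becomes canonical; any previous instance stays in history.
            self.canonical_partitions.insert(partition_ref.clone(), *uuid);

            // Discover serviced wants via the inverted index, not via event snapshots.
            let want_ids: Vec<String> = self
                .wants_for_partition
                .get(partition_ref)
                .cloned()
                .unwrap_or_default();
            for want_id in &want_ids {
                self.transition_want_to_building(want_id); // Idle → Building (no-op if already Building)
            }
        }
        // ... create the job run in Queued state ...
    }
}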
Update Orchestrator's queue_job() in orchestrator.rs
When creating JobRunBufferEventV1:
- Get partition refs from wants (existing logic)
- Generate fresh UUIDs for each unique partition ref (one UUID per ref)
- Include UUID list in event along with refs (may need to update event schema)
- Orchestrator no longer needs to track or snapshot want-partition relationships
Key change: Orchestrator generates UUIDs at job queue time, not looking up canonical partitions. Each job attempt gets fresh UUIDs. The event handler will create partitions in Building state with these UUIDs and update canonical pointers.
This eliminates WantAttributedPartitions entirely - relationships are discoverable via inverted index.
Phase 4: Partition Lifecycle Management
Goals:
- Define when new partition UUIDs are created
- Handle canonical partition transitions
Canonical Partition Transitions:
New partition UUID created when:
- First build: Orchestrator queues job → generates UUID → partition created directly as Building
- Taint: Partition tainted → transition current to Tainted state (keeps UUID, stays canonical so readers can see it's tainted)
- Rebuild after taint: Existing want (still within TTL) sees tainted partition → triggers new job → orchestrator generates fresh UUID → new partition replaces tainted one in canonical_partitions
Note on TTL/SLA: These are want properties, not partition properties. TTL defines how long after want creation the orchestrator should keep attempting to build partitions. When a partition is tainted, wants within TTL will keep retrying. SLA is an alarm threshold. Partitions don't expire - they stay Live until explicitly tainted or replaced by a new build.
Key principles:
- Building state as lease: The Building state serves as a lease mechanism. While a partition is in Building state, the orchestrator will not attempt to schedule additional jobs to build that partition. This prevents concurrent/duplicate builds. The lease is released when the partition transitions to Live, Failed, or when a new partition instance with a fresh UUID is created and becomes canonical (e.g., after the building job reports dep miss and a new job is queued).
- When the canonical pointer is updated (e.g., a new build replaces a tainted partition), the old partition UUID remains in partitions_by_uuid for historical queries
- Canonical pointer always points to the current/active partition instance (Building, Live, Failed, or Tainted)
- Tainted partitions stay canonical until replaced - readers need to see they're tainted
- Old instances become immutable historical records
- No Missing state - partitions only exist when jobs are actively building them or completed
Partition Creation:
Partitions created during handle_job_run_buffer():
- UUIDs come from the event (generated by orchestrator)
- Create partition directly in Building state with job_run_id
- Update canonical_partitions map to point ref → UUID
- Store in partitions_by_uuid
- If replacing a tainted/failed partition, the old one remains in partitions_by_uuid under its UUID
Dep Miss Handling:
Complete flow when a job has dependency miss:
Job reports dep miss:
- Job building partition uuid-1 encounters missing upstream deps
- JobRunDepMissEventV1 emitted with MissingDeps (partition refs needed)
- Derivative wants created for missing upstream partitions
Partition transitions to UpstreamBuilding:
- Partition uuid-1: Building → UpstreamBuilding
- Store MissingDeps in partition state (which upstream refs it's waiting for)
- Update inverted index: For each missing dep ref, add uuid-1 to downstream_waiting[missing_dep_ref]
- Partition remains canonical (holds lease - prevents concurrent retry attempts)
- Job run transitions to DepMissed state
Want transitions:
- Wants for partition: Building → UpstreamBuilding
- Wants track the derivative want IDs in their UpstreamBuildingState
Upstream builds complete or fail:
Success case: Derivative wants build upstream partitions → upstream partition becomes Live
- Lookup downstream_waiting: Get downstream_waiting[upstream_partition_ref] → list of UUIDs waiting for this upstream
- For each waiting partition UUID:
  - Get partition from partitions_by_uuid[uuid]
  - Check if ALL its MissingDeps are now satisfied (canonical partitions for all refs are Live)
  - If satisfied: transition partition UpstreamBuilding → UpForRetry
  - Remove uuid from downstream_waiting entries (cleanup)
Failure case: Upstream partition transitions to Failed (hard failure)
- Lookup downstream_waiting: Get downstream_waiting[failed_partition_ref] → list of UUIDs waiting for this upstream
- For each waiting partition UUID in UpstreamBuilding state:
  - Transition partition: UpstreamBuilding → UpstreamFailed
  - Transition associated wants: UpstreamBuilding → UpstreamFailed
  - Remove uuid from downstream_waiting entries (cleanup)
- This propagates failure information down the dependency chain
Want becomes schedulable:
- When partition transitions to UpForRetry, wants transition: UpstreamBuilding → Idle
- Orchestrator sees Idle wants with UpForRetry canonical partitions → schedulable
- New job queued → fresh UUID (uuid-2) generated
- Partition uuid-2 created as Building, replaces uuid-1 in canonical_partitions
- Partition uuid-1 (UpForRetry) remains in partitions_by_uuid as historical record
New wants during dep miss:
- Want arrives while partition is UpstreamBuilding → New → UpstreamBuilding (correctly waits)
- Want arrives while partition is UpForRetry → New → Idle (correctly schedulable)
Key properties:
- Building state acts as lease (no concurrent builds)
- UpstreamBuilding also acts as lease (upstreams not ready, can't retry yet)
- UpForRetry releases lease (upstreams ready, safe to schedule)
- Failed releases lease but blocks new wants (hard failure, shouldn't retry)
- UpstreamFailed releases lease and blocks new wants (upstream deps failed, can't succeed)
- downstream_waiting index enables O(1) lookup of affected partitions when upstreams complete or fail (see the sketch below)
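A sketch of the success-case resolution driven by downstream_waiting; the missing_deps field on the partition, the Copy PartitionState, and the transition helper are assumed names for the mechanisms described above:
impl BuildState {
    // Called when the canonical partition for `upstream_ref` transitions to Live.
    fn on_upstream_live(&mut self, upstream_ref: &str) {
        // Take (and thereby clean up) the waiters registered for this upstream ref.
        let waiting = self.downstream_waiting.remove(upstream_ref).unwrap_or_default();

        for uuid in waiting {
            let all_satisfied = {
                let partition = self
                    .partitions_by_uuid
                    .get(&uuid)
                    .expect("downstream_waiting referenced unknown partition UUID");
                // Satisfied when the canonical partition for every missing dep is Live.
                partition.missing_deps.iter().all(|dep_ref| {
                    matches!(
                        self.get_canonical_partition(dep_ref).map(|p| p.state),
                        Some(PartitionState::Live)
                    )
                })
            };

            if all_satisfied {
                // UpstreamBuilding → UpForRetry; associated wants go UpstreamBuilding → Idle.
                self.transition_partition_to_up_for_retry(uuid);
            }
        }
    }
}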
Taint Handling:
When partition is tainted (via TaintCreateEvent):
- Find current canonical UUID for the ref
- Transition that partition instance to Tainted state (preserves history)
- Keep in canonical_partitions - readers need to see it's tainted
- Wants within TTL will see the partition is tainted (not Live)
- Orchestrator will schedule new jobs for those wants
- New partition created with fresh UUID when next job starts
- New partition replaces the tainted one in canonical_partitions
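A sketch of that flow, assuming the panic-on-inconsistency policy from the FAQs below and a mutable state field on the partition:
impl BuildState {
    fn handle_taint(&mut self, partition_ref: &str) {
        // Mark the currently-canonical instance Tainted in place. It keeps its UUID and
        // stays canonical so readers can see that it is tainted.
        let uuid = *self
            .canonical_partitions
            .get(partition_ref)
            .expect("taint for ref with no canonical partition");
        let partition = self
            .partitions_by_uuid
            .get_mut(&uuid)
            .expect("canonical pointer to unknown partition UUID");
        partition.state = PartitionState::Tainted;
        // A fresh UUID replaces it in canonical_partitions only when the next job is
        // buffered via handle_job_run_buffer(), not here.
    }
}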
Phase 5: Migration and Cleanup
Goals:
- Remove deprecated fields
- Update API responses
- Complete migration
Changes:
Remove deprecated fields from protobuf
- building_partitions from JobRunBufferEventV1
- servicing_wants from JobRunBufferEventV1
- WantAttributedPartitions message
Remove backward compatibility code
- partitions: BTreeMap<String, Partition> from BuildState
- Dual writes/reads
Update API responses to include UUIDs where relevant
- JobRunDetail can include partition UUIDs built
- PartitionDetail can include UUID for debugging
Update tests to use UUID-based assertions
Design Decisions & Trade-offs
1. Wants Reference Refs, Not UUIDs
Decision: Wants always reference partition refs (e.g., "data/beta"), not UUIDs.
Rationale:
- User requests "data/beta" - the current/canonical partition for that ref
- Want state is based on canonical partition: "is the current partition for my ref being built?"
- If partition gets tainted/rebuilt, wants see the new canonical partition automatically
- Simpler mental model: want doesn't care about historical instances
How it works:
// Want creation
want.partitions = ["data/beta"]; // ref, not UUID

// Want state determination
if let Some(canonical_uuid) = canonical_partitions.get("data/beta") {
    let partition = &partitions_by_uuid[canonical_uuid];
    match partition.state {
        Building => want.state = Building,
        Live => { /* want can complete */ }
        // ...
    }
} else {
    // No canonical partition exists yet → Idle
}
2. Jobs Reference UUIDs, Not Refs
Decision: Jobs reference the specific partition UUIDs they built.
Rationale:
- Jobs build specific partition instances
- Historical record: "Job J built Partition(uuid-1)"
- Even if partition is later tainted/rebuilt, job's record is immutable
- Enables provenance: "Which job built this specific partition?"
How it works:
JobRunBufferEventV1 {
building_partition_uuids: [uuid-1, uuid-2] // Specific instances
}
3. UUID Generation: When?
Decision: Orchestrator generates UUIDs when queuing jobs, includes them in JobRunBufferEventV1.
Rationale:
- UUIDs represent specific build attempts, not partition refs
- Orchestrator is source of truth for "start building these partitions"
- Event contains UUIDs, making replay deterministic (same UUIDs in event)
- No UUID generation during event processing - UUIDs are in the event itself
Key insight: The orchestrator generates UUIDs (not BuildState during event handling). This makes UUIDs part of the immutable event log.
4. Canonical Partition: One at a Time
Decision: Only one canonical partition per ref at a time.
Scenario handling:
- Partition(uuid-1, "data/beta") is Building
- User requests rebuild → new want sees uuid-1 is Building → want becomes Building
- Want waits for uuid-1 to complete
- If uuid-1 completes successfully → want completes
- If uuid-1 fails or is tainted → new partition instance created (uuid-2), canonical updated
Alternative considered: Multiple concurrent builds with versioning
- Significantly more complex
- No existing need for this
5. Event Format: UUID as String
Decision: Store UUIDs as strings in protobuf events.
Rationale:
- Human-readable in logs/debugging
- Standard UUID string format (36 chars)
- Protobuf has no native UUID type
Trade-off: Larger event size (36 bytes vs 16 bytes) - acceptable for debuggability.
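A small sketch of the string round-trip this implies, using the uuid crate (new_v4 here is only for illustration; see the UUID-derivation FAQ below for how UUIDs are actually produced):
use uuid::Uuid;

fn main() {
    let uuid = Uuid::new_v4();
    // Written into the protobuf event as a 36-character, human-readable string...
    let as_string = uuid.to_string();
    // ...and parsed back when the event log is replayed.
    let parsed = Uuid::parse_str(&as_string).expect("valid UUID string in event");
    assert_eq!(uuid, parsed);
}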
Testing Strategy
Unit Tests
Partition UUID uniqueness
- Creating partitions generates unique UUIDs
- Same ref at different times gets different UUIDs
Canonical partition tracking
- canonical_partitions always points to current instance
- Old instances remain in partitions_by_uuid
Want state determination
- Want checks canonical partition state
- Multiple wants see same canonical partition
Integration Tests
Multi-want scenario (reproduces original bug)
- Want 1 created → New → no partition exists → Idle
- Job scheduled → orchestrator generates uuid-1 → partition created Building
- Want 1 transitions Idle → Building (via job buffer event)
- Wants 2-4 created → New → partition Building (uuid-1) → Building
- All 4 wants reference same canonical partition uuid-1
- Job dep miss → all transition to UpstreamBuilding correctly
- Verifies New state transitions and state sensing work correctly
Rebuild scenario
- Partition built → Live (uuid-1)
- Partition tainted → new instance created (uuid-2), canonical updated
- New wants reference uuid-2
- Old partition uuid-1 still queryable for history
End-to-End Tests
- Full lifecycle
- Want created → canonical partition determined
- Job runs → partition transitions through states
- Want completes → partition remains in history
- Partition tainted → new UUID for rebuild, canonical updated
Implementation FAQs
Q: Do we need to maintain backwards compatibility with existing events?
A: No. We can assume no need to maintain backwards compatibility or retain data produced before this change. This simplifies the implementation significantly - no need to handle old event formats or generate UUIDs for replayed pre-UUID events.
Q: How should we handle reference errors and index inconsistencies?
A: Panic on any reference issues with contextual information. This includes:
- Missing partition UUIDs in partitions_by_uuid
- Missing canonical pointers in canonical_partitions
- Inverted index inconsistencies (wants_for_partition, downstream_waiting)
- Invalid state transitions
Add assertions and validation throughout to catch these issues immediately rather than failing silently.
Q: What about cleanup of the wants_for_partition inverted index?
A: Don't remove wants from the index when they complete. This is acceptable for the initial implementation. Building of years of partitions for a mature data platform would still represent less than a million entries, which is manageable. We can add cleanup later if needed.
Q: What happens when an upstream partition is Tainted instead of becoming Live?
A: Tainting of an upstream means it is no longer live, and the downstream job should dep miss. The system will operate correctly:
- Downstream job discovers upstream is Tainted (not Live) → dep miss
- Derivative want created for tainted upstream
- Tainted upstream triggers rebuild (new UUID, replaces canonical)
- Derivative want succeeds → downstream can resume
Q: How should UUIDs be generated? Should the Orchestrator calculate them?
A: Use deterministic derivation instead of orchestrator generation:
use sha2::{Digest, Sha256};
use uuid::Uuid;

fn derive_partition_uuid(job_run_id: &str, partition_ref: &str) -> Uuid {
    // Hash job_run_id + partition_ref bytes
    let mut hasher = Sha256::new();
    hasher.update(job_run_id.as_bytes());
    hasher.update(partition_ref.as_bytes());
    let hash = hasher.finalize();
    // Convert the first 16 bytes of the digest to a UUID
    Uuid::from_slice(&hash[0..16]).unwrap()
}
Benefits:
- No orchestrator UUID state/generation needed
- Deterministic replay (same job + ref = same UUID)
- Event schema stays simple (job_run_id + partition refs)
- Build state derives UUIDs in handle_job_run_buffer()
- No need for a PartitionInstanceRef message in protobuf
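A usage sketch of the determinism this buys (the job run ids are hypothetical):
fn main() {
    let first = derive_partition_uuid("job-run-42", "data/beta");
    let replayed = derive_partition_uuid("job-run-42", "data/beta");
    assert_eq!(first, replayed); // replaying the same event derives the same UUID

    let retry = derive_partition_uuid("job-run-43", "data/beta");
    assert_ne!(first, retry); // a new job attempt yields a fresh partition instance
}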
Q: How do we enforce safe canonical partition access?
A: Add and use helper methods in BuildState to enforce correct access patterns:
- get_canonical_partition(ref) - look up the canonical partition for a ref
- get_canonical_partition_uuid(ref) - get the UUID of the canonical partition
- get_partition_by_uuid(uuid) - direct UUID lookup
- get_wants_for_partition(ref) - query the inverted index
Existing get_partition() function should be updated to use canonical lookup. Code should always access "current state" via canonical_partitions, not by ref lookup in the deprecated partitions map.
Q: What is the want schedulability check logic?
A: A want is schedulable if:
- The canonical partition doesn't exist for any of its partition refs, OR
- The canonical partition exists and is in Tainted or UpForRetry state
In other words: !exists || Tainted || UpForRetry
Building and UpstreamBuilding partitions act as leases (not schedulable).
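A sketch of that check, treating the per-ref condition as required for every ref in the want and reusing the get_canonical_partition helper and Copy PartitionState sketched earlier:
impl BuildState {
    // Schedulable iff every ref is either unbuilt or explicitly ready for a (re)build:
    // !exists || Tainted || UpForRetry.
    fn want_is_schedulable(&self, partition_refs: &[String]) -> bool {
        partition_refs.iter().all(|r| {
            match self.get_canonical_partition(r).map(|p| p.state) {
                None => true,                             // no canonical partition yet
                Some(PartitionState::Tainted) => true,    // needs rebuild
                Some(PartitionState::UpForRetry) => true, // upstreams satisfied, retry
                _ => false,                               // Building/UpstreamBuilding lease, Live, Failed, ...
            }
        })
    }
}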
Q: Should we implement phases strictly sequentially?
A: No. Proceed in the most efficient and productive manner possible. Phases can be combined or reordered as makes sense. For example, Phase 1 + Phase 2 can be done together since want state sensing depends on the new partition states.
Q: Should we write tests incrementally or implement everything first?
A: Implement tests as we go. Write unit tests for each component as it's implemented, then integration tests for full scenarios.
Q: Should wants reference partition UUIDs or partition refs?
A: Wants should NEVER reference partition instances (via UUID). Wants should ONLY reference canonical partitions via partition ref strings. This is already the case - wants include partition refs, which allows the orchestrator to resolve partition info for want state updates. The separation is:
- Wants → Partition Refs (canonical, user-facing)
- Jobs → Partition UUIDs (specific instances, historical)
Q: Should we add UpstreamFailed state for partitions?
A: Yes. This provides symmetry with want semantics and clear terminal state propagation:
Scenario:
- Partition A: Building → Failed (hard failure)
- Partition B needs A, dep misses → UpstreamBuilding
- Derivative want created for A, immediately fails (A is Failed)
- Partition B: UpstreamBuilding → UpstreamFailed
Benefits:
- Clear signal that partition can never succeed (upstreams failed)
- Mirrors Want UpstreamFailed semantics (consistency)
- Useful for UIs and debugging
- Prevents indefinite waiting in UpstreamBuilding state
Transition logic:
- When a partition transitions to Failed, look up downstream_waiting[failed_partition_ref]
- This propagates failure information down the dependency chain
Add to Phase 1 partition states:
- UpstreamFailed: Partition failed because upstream dependencies failed (terminal state)
Add transition:
- UpstreamBuilding → UpstreamFailed (upstream dependency hard failure)
Q: Can a job build the same partition ref multiple times?
A: No, this is invalid. A job run cannot build the same partition multiple times. Each partition ref should appear at most once in a job's building_partitions list.
Summary
Adding partition UUIDs solves fundamental architectural problems:
- Temporal identity: Distinguish partition instances over time
- Stable job references: Jobs reference immutable partition UUIDs they built
- Wants reference refs: Want state based on canonical partition for their ref
- Discoverable relationships: Remove redundant snapshot data (WantAttributedPartitions)
- Proper semantics: Want state reflects actual canonical partition state
Key principle: Wants care about "what's the current state of data/beta?" (refs), while jobs and historical queries care about "what happened to this specific partition instance?" (UUIDs).
This refactor enables cleaner code, better observability, and proper event sourcing semantics throughout the system.