update partitions refactor plan

Stuart Axelbrooke 2025-11-25 10:28:29 +08:00
parent dfc1d19237
commit 7ccec59364


@@ -208,6 +208,7 @@ Can answer:
- **UpForRetry**: Upstream dependencies satisfied, partition ready to retry building
- **Live**: Successfully built
- **Failed**: Hard failure (shouldn't retry)
- **UpstreamFailed**: Partition failed because upstream dependencies failed (terminal state)
- **Tainted**: Marked invalid by taint event
**Removed:** Missing state - partitions only exist once a job starts building them or they are completed.
@@ -215,6 +216,7 @@ Can answer:
Key transitions:
- Building → UpstreamBuilding (job reports dep miss)
- UpstreamBuilding → UpForRetry (all upstream deps satisfied)
- UpstreamBuilding → UpstreamFailed (upstream dependency hard failure)
- Building → Live (job succeeds)
- Building → Failed (job hard failure)
- UpForRetry → Building (new job queued for retry, creates fresh UUID)
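The partition lifecycle above can be captured compactly. A minimal Rust sketch, assuming illustrative type and method names (not taken from the existing codebase):

```rust
/// Partition states from the list above (names are illustrative assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartitionState {
    Building,
    UpstreamBuilding,
    UpForRetry,
    Live,
    Failed,
    UpstreamFailed,
    Tainted,
}

impl PartitionState {
    /// Whether `self` → `next` is one of the key transitions listed above.
    fn can_transition_to(self, next: PartitionState) -> bool {
        use PartitionState::*;
        matches!(
            (self, next),
            (Building, UpstreamBuilding)             // job reports dep miss
                | (UpstreamBuilding, UpForRetry)     // all upstream deps satisfied
                | (UpstreamBuilding, UpstreamFailed) // upstream dependency hard failure
                | (Building, Live)                   // job succeeds
                | (Building, Failed)                 // job hard failure
                | (UpForRetry, Building)             // retry job queued (fresh UUID)
        )
    }
}
```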
@@ -309,6 +311,7 @@ Can answer:
- Check canonical partition states for all partition refs
- Transition based on observation (in priority order):
- If ANY canonical partition is Failed → New → Failed (job can't be safely retried)
- If ANY canonical partition is UpstreamFailed → New → UpstreamFailed (upstream deps failed)
- If ALL canonical partitions exist AND are Live → New → Successful (already built!)
- If ANY canonical partition is Building → New → Building (being built now)
- If ANY canonical partition is UpstreamBuilding → New → UpstreamBuilding (waiting for deps)
@@ -316,7 +319,7 @@ Can answer:
- Otherwise (partitions don't exist or other states) → New → Idle (need to schedule)
- For derivative wants, additional logic may transition to UpstreamBuilding
Key insight: Most wants will go New → Idle because partitions won't exist yet (only created when jobs start). Subsequent wants for already-building partitions go New → Building. Wants arriving during dep miss go New → UpstreamBuilding. Wants for partitions ready to retry go New → Idle. Wants for already-Live partitions go New → Successful. Wants for Failed or UpstreamFailed partitions go New → Failed/UpstreamFailed.
3. **Keep WantSchedulability building check**
@@ -350,6 +353,7 @@ Can answer:
New transitions needed:
- **New → Failed:** Any partition failed
- **New → UpstreamFailed:** Any partition upstream failed
- **New → Successful:** All partitions live
- **New → Idle:** Normal case, partitions don't exist
- **New → Building:** Partitions already building when want created
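A sketch of this New-want sensing logic, applying the same priority order over the canonical partition states resolved for a want's refs (`None` means the partition doesn't exist yet). `WantState` and the function name are assumptions, reusing the `PartitionState` sketch from earlier:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WantState { Idle, Building, UpstreamBuilding, Successful, Failed, UpstreamFailed }

fn initial_want_state(canonical_states: &[Option<PartitionState>]) -> WantState {
    let any = |s: PartitionState| canonical_states.iter().any(|p| *p == Some(s));
    if any(PartitionState::Failed) {
        WantState::Failed // job can't be safely retried
    } else if any(PartitionState::UpstreamFailed) {
        WantState::UpstreamFailed // upstream deps failed
    } else if !canonical_states.is_empty()
        && canonical_states.iter().all(|p| *p == Some(PartitionState::Live))
    {
        WantState::Successful // already built
    } else if any(PartitionState::Building) {
        WantState::Building // being built right now
    } else if any(PartitionState::UpstreamBuilding) {
        WantState::UpstreamBuilding // waiting on upstream deps
    } else {
        WantState::Idle // partitions don't exist (or UpForRetry/Tainted): needs scheduling
    }
}
```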
@@ -468,14 +472,22 @@ Complete flow when a job has dependency miss:
- Wants for partition: Building → UpstreamBuilding
- Wants track the derivative want IDs in their UpstreamBuildingState
4. **Upstream builds complete or fail:**
- **Success case:** Derivative wants build upstream partitions → upstream partition becomes Live
- **Lookup downstream_waiting:** Get `downstream_waiting[upstream_partition_ref]` → list of UUIDs waiting for this upstream
- For each waiting partition UUID:
- Get partition from `partitions_by_uuid[uuid]`
- Check if ALL its MissingDeps are now satisfied (canonical partitions for all refs are Live)
- If satisfied: transition partition UpstreamBuilding → UpForRetry
- Remove uuid from `downstream_waiting` entries (cleanup)
- **Failure case:** Upstream partition transitions to Failed (hard failure)
- **Lookup downstream_waiting:** Get `downstream_waiting[failed_partition_ref]` → list of UUIDs waiting for this upstream
- For each waiting partition UUID in UpstreamBuilding state:
- Transition partition: UpstreamBuilding → UpstreamFailed
- Transition associated wants: UpstreamBuilding → UpstreamFailed
- Remove uuid from `downstream_waiting` entries (cleanup)
- This propagates failure information down the dependency chain
5. **Want becomes schedulable:**
- When partition transitions to UpForRetry, wants transition: UpstreamBuilding → Idle
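A sketch of steps 4-5, showing how the `downstream_waiting` index drives both the success and failure propagation. The struct shapes and function name are illustrative assumptions (reusing the `PartitionState` sketch from earlier), not the real `BuildState`:

```rust
use std::collections::{HashMap, HashSet};
use uuid::Uuid;

struct Partition {
    state: PartitionState,
    missing_deps: HashSet<String>, // partition refs this instance dep-missed on
}

struct BuildState {
    partitions_by_uuid: HashMap<Uuid, Partition>,
    canonical_partitions: HashMap<String, Uuid>,
    downstream_waiting: HashMap<String, Vec<Uuid>>,
}

/// An upstream partition ref has resolved: Live (`upstream_live = true`) or hard-failed.
fn on_upstream_resolved(bs: &mut BuildState, upstream_ref: &str, upstream_live: bool) {
    // O(1) lookup of downstream partitions blocked on this upstream;
    // removing the entry doubles as the index cleanup step.
    let waiting = bs.downstream_waiting.remove(upstream_ref).unwrap_or_default();

    for uuid in waiting {
        // Immutable pass first: are ALL of this partition's missing deps now Live?
        let all_deps_live = bs.partitions_by_uuid.get(&uuid).map_or(false, |p| {
            p.missing_deps.iter().all(|dep| {
                bs.canonical_partitions
                    .get(dep)
                    .and_then(|u| bs.partitions_by_uuid.get(u))
                    .map_or(false, |d| d.state == PartitionState::Live)
            })
        });

        let p = bs
            .partitions_by_uuid
            .get_mut(&uuid)
            .unwrap_or_else(|| panic!("downstream_waiting references unknown partition {uuid}"));
        if p.state != PartitionState::UpstreamBuilding {
            continue; // only partitions still waiting on upstreams react here
        }

        if !upstream_live {
            // Failure case: propagate the hard failure down the dependency chain.
            // Associated wants would transition UpstreamBuilding → UpstreamFailed as well.
            p.state = PartitionState::UpstreamFailed;
        } else if all_deps_live {
            // Success case: every missing dep is satisfied, safe to retry.
            // Associated wants would transition UpstreamBuilding → Idle (schedulable).
            p.state = PartitionState::UpForRetry;
        }
    }
}
```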
@@ -493,7 +505,8 @@ Complete flow when a job has dependency miss:
- UpstreamBuilding also acts as lease (upstreams not ready, can't retry yet)
- UpForRetry releases lease (upstreams ready, safe to schedule)
- Failed releases lease but blocks new wants (hard failure, shouldn't retry)
- UpstreamFailed releases lease and blocks new wants (upstream deps failed, can't succeed)
- `downstream_waiting` index enables O(1) lookup of affected partitions when upstreams complete or fail
**Taint Handling:**
@@ -657,6 +670,122 @@ JobRunBufferEventV1 {
- Want completes → partition remains in history
- Partition expires → new UUID for rebuild, canonical updated
## Implementation FAQs
### Q: Do we need to maintain backwards compatibility with existing events?
**A:** No. We can assume no need to maintain backwards compatibility or retain data produced before this change. This simplifies the implementation significantly - no need to handle old event formats or generate UUIDs for replayed pre-UUID events.
### Q: How should we handle reference errors and index inconsistencies?
**A:** Panic on any reference issues with contextual information. This includes:
- Missing partition UUIDs in `partitions_by_uuid`
- Missing canonical pointers in `canonical_partitions`
- Inverted index inconsistencies (wants_for_partition, downstream_waiting)
- Invalid state transitions
Add assertions and validation throughout to catch these issues immediately rather than failing silently.
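A minimal sketch of this panic-with-context style, assuming the `BuildState` shape sketched earlier; the method name is hypothetical:

```rust
impl BuildState {
    /// Panic with context if the canonical pointer or UUID map are inconsistent.
    fn assert_canonical_consistent(&self, partition_ref: &str) {
        let uuid = self
            .canonical_partitions
            .get(partition_ref)
            .unwrap_or_else(|| panic!("no canonical partition for ref {partition_ref:?}"));
        assert!(
            self.partitions_by_uuid.contains_key(uuid),
            "canonical_partitions points at missing UUID {uuid} for ref {partition_ref:?}"
        );
    }
}
```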
### Q: What about cleanup of the `wants_for_partition` inverted index?
**A:** Don't remove wants from the index when they complete. This is acceptable for the initial implementation: even years of partition builds on a mature data platform would still amount to fewer than a million entries, which is manageable. We can add cleanup later if needed.
### Q: What happens when an upstream partition is Tainted instead of becoming Live?
**A:** Tainting of an upstream means it is no longer live, and the downstream job should dep miss. The system will operate correctly:
1. Downstream job discovers upstream is Tainted (not Live) → dep miss
2. Derivative want created for tainted upstream
3. Tainted upstream triggers rebuild (new UUID, replaces canonical)
4. Derivative want succeeds → downstream can resume
### Q: How should UUIDs be generated? Should the Orchestrator calculate them?
**A:** Use deterministic derivation instead of orchestrator generation:
```rust
use sha2::{Digest, Sha256};
use uuid::Uuid;

fn derive_partition_uuid(job_run_id: &str, partition_ref: &str) -> Uuid {
    // Hash job_run_id + partition_ref bytes.
    let mut hasher = Sha256::new();
    hasher.update(job_run_id.as_bytes());
    hasher.update(partition_ref.as_bytes());
    let hash = hasher.finalize();

    // Take the first 16 bytes as the UUID. The version/variant bits come from the
    // hash, so this is a deterministic 128-bit ID rather than a spec-compliant v4 UUID.
    Uuid::from_slice(&hash[0..16]).expect("SHA-256 output is at least 16 bytes")
}
```
**Benefits:**
- No orchestrator UUID state/generation needed
- Deterministic replay (same job + ref = same UUID)
- Event schema stays simple (job_run_id + partition refs)
- Build state derives UUIDs in `handle_job_run_buffer()`
- No need for `PartitionInstanceRef` message in protobuf
### Q: How do we enforce safe canonical partition access?
**A:** Add and use helper methods in BuildState to enforce correct access patterns:
- `get_canonical_partition(ref)` - lookup canonical partition for a ref
- `get_canonical_partition_uuid(ref)` - get UUID of canonical partition
- `get_partition_by_uuid(uuid)` - direct UUID lookup
- `get_wants_for_partition(ref)` - query inverted index
Existing `get_partition()` function should be updated to use canonical lookup. Code should always access "current state" via canonical_partitions, not by ref lookup in the deprecated partitions map.
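A sketch of these helpers, assuming the `BuildState` fields named in this plan plus a hypothetical `wants_for_partition: HashMap<String, Vec<String>>` inverted index (want IDs as strings); signatures are assumptions, not the existing API:

```rust
impl BuildState {
    fn get_canonical_partition_uuid(&self, partition_ref: &str) -> Option<Uuid> {
        self.canonical_partitions.get(partition_ref).copied()
    }

    fn get_partition_by_uuid(&self, uuid: &Uuid) -> Option<&Partition> {
        self.partitions_by_uuid.get(uuid)
    }

    /// "Current state" of a ref always goes through canonical_partitions,
    /// never a by-ref lookup in the deprecated partitions map.
    fn get_canonical_partition(&self, partition_ref: &str) -> Option<&Partition> {
        self.get_canonical_partition_uuid(partition_ref)
            .and_then(|uuid| self.partitions_by_uuid.get(&uuid))
    }

    fn get_wants_for_partition(&self, partition_ref: &str) -> &[String] {
        self.wants_for_partition
            .get(partition_ref)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```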
### Q: What is the want schedulability check logic?
**A:** A want is schedulable if:
- The canonical partition doesn't exist for any of its partition refs, OR
- The canonical partition exists and is in Tainted or UpForRetry state
In other words: `!exists || Tainted || UpForRetry`
Building and UpstreamBuilding partitions act as leases (not schedulable).
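In code the check is small; a sketch reusing the `Partition`/`PartitionState` shapes assumed earlier:

```rust
/// Schedulability rule above: `!exists || Tainted || UpForRetry`.
fn partition_ref_is_schedulable(canonical: Option<&Partition>) -> bool {
    match canonical {
        // Canonical partition doesn't exist yet: schedulable.
        None => true,
        // Tainted or ready-to-retry partitions are schedulable.
        // Building/UpstreamBuilding act as leases; Live, Failed, and UpstreamFailed
        // also report not schedulable here.
        Some(p) => matches!(p.state, PartitionState::Tainted | PartitionState::UpForRetry),
    }
}
```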
### Q: Should we implement phases strictly sequentially?
**A:** No. Proceed in the most efficient and productive manner possible. Phases can be combined or reordered as makes sense. For example, Phase 1 + Phase 2 can be done together since want state sensing depends on the new partition states.
### Q: Should we write tests incrementally or implement everything first?
**A:** Implement tests as we go. Write unit tests for each component as it's implemented, then integration tests for full scenarios.
### Q: Should wants reference partition UUIDs or partition refs?
**A:** Wants should NEVER reference partition instances (via UUID). Wants should ONLY reference canonical partitions via partition ref strings. This is already the case - wants include partition refs, which allows the orchestrator to resolve partition info for want state updates. The separation is:
- Wants → Partition Refs (canonical, user-facing)
- Jobs → Partition UUIDs (specific instances, historical)
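A tiny sketch of that separation; the type and field names here are illustrative assumptions, not the actual message definitions:

```rust
use uuid::Uuid;

struct Want {
    want_id: String,
    partition_refs: Vec<String>, // canonical, user-facing refs only
}

struct JobRun {
    job_run_id: String,
    building_partition_uuids: Vec<Uuid>, // specific historical instances
}
```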
### Q: Should we add UpstreamFailed state for partitions?
**A:** Yes. This provides symmetry with want semantics and clear terminal state propagation:
**Scenario:**
1. Partition A: Building → Failed (hard failure)
2. Partition B needs A, dep misses → UpstreamBuilding
3. Derivative want created for A, immediately fails (A is Failed)
4. Partition B: UpstreamBuilding → UpstreamFailed
**Benefits:**
- Clear signal that partition can never succeed (upstreams failed)
- Mirrors Want UpstreamFailed semantics (consistency)
- Useful for UIs and debugging
- Prevents indefinite waiting in UpstreamBuilding state
**Transition logic:**
- When a partition transitions to Failed, look up `downstream_waiting[failed_partition_ref]`
- For each downstream partition UUID in UpstreamBuilding state, transition to UpstreamFailed
- This propagates failure information down the dependency chain
**Add to Phase 1 partition states:**
- **UpstreamFailed**: Partition failed because upstream dependencies failed (terminal state)
**Add transition:**
- UpstreamBuilding → UpstreamFailed (upstream dependency hard failure)
### Q: Can a job build the same partition ref multiple times?
**A:** No, this is invalid. A job run cannot build the same partition multiple times. Each partition ref should appear at most once in a job's building_partitions list.
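A small sketch of enforcing that invariant, using the panic-with-context policy from earlier in this FAQ; the function name is hypothetical:

```rust
use std::collections::HashSet;

fn assert_unique_building_partitions(job_run_id: &str, building_partitions: &[String]) {
    let mut seen = HashSet::new();
    for partition_ref in building_partitions {
        assert!(
            seen.insert(partition_ref.as_str()),
            "job run {job_run_id} lists partition ref {partition_ref:?} more than once"
        );
    }
}
```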
## Summary
Adding partition UUIDs solves fundamental architectural problems: