Detail & Lineage Views
Vision
Provide rich, navigable views into databuild's execution history that answer operational questions:
- "What work was done to fulfill this want?" - The full DAG of partitions built and jobs run
- "Where did this data come from?" - Trace a partition's lineage back through its inputs
- "What downstream data uses this?" - Understand impact before tainting or debugging staleness
Three Distinct Views
1. Want Fulfillment View
Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.
W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
└── derivative: W-002 "data/beta" [Successful]
    │   └── triggered by: JR-456 dep-miss
    │
    └── data/beta [Live, uuid:def]
        └── JR-456 [DepMiss → retry → Succeeded]
            └── read: data/alpha [Live, uuid:ghi]
Key insight: This shows specific partition instances (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.
2. Partition Lineage View
The data flow graph: partition ↔ job_run alternating. Navigable upstream (inputs) and downstream (consumers).
                UPSTREAM
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   [data/a]     [data/b]     [data/c]
       │            │            │
       └────────────┼────────────┘
                    ▼
           JR-xyz [Succeeded]
                    │
                    ▼
           ══════════════════
           ║   data/beta    ║  ← FOCUS
           ║     [Live]     ║
           ══════════════════
                    │
                    ▼
            JR-abc [Running]
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   [data/x]     [data/y]     [data/z]
                    │
               DOWNSTREAM
This view answers: "What data flows into/out of this partition?" Click to navigate.
3. JobRun Detail View
Not a graph - just the immediate context of a single job execution:
- Scheduled for: Which want(s) triggered this job
- Read: Input partitions (with UUIDs - the specific versions read)
- Wrote: Output partitions (with UUIDs)
- Status history: Queued → Running → Succeeded/Failed/DepMiss
- If DepMiss: Which derivative wants were spawned
Data Requirements
Track read_deps on success
read_deps were originally captured only on dep-miss. Capturing them on success as well requires extending JobRunSuccessEventV1:
message JobRunSuccessEventV1 {
string job_run_id = 1;
repeated ReadDeps read_deps = 2; // NEW
}
Inverted consumer index
Answering "what reads this partition?" requires an inverted index:
partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>> // input_uuid → list of (output_uuid, job_run_id)
Indexed by UUID (not ref) because partition refs get reused across rebuilds, but UUIDs are immutable per instance. This preserves historical lineage correctly.
Built from read_deps when processing JobRunSuccessEventV1.
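A minimal sketch of how such an index could be populated, assuming the job run's read and written partitions have already been resolved to UUIDs. The `ConsumerIndex` wrapper and its `record_success` and `consumers_of` methods are hypothetical names, `Uuid` is a `u128` stand-in for the real UUID type, and the sketch links every input to every output rather than honoring the per-output relationships that ReadDeps carries.

```rust
use std::collections::BTreeMap;

// Stand-in for the real UUID type so the sketch has no external dependencies.
type Uuid = u128;

// Hypothetical wrapper; in the design the map lives on BuildState.
#[derive(Default)]
struct ConsumerIndex {
    // input_uuid → list of (output_uuid, job_run_id)
    partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>,
}

impl ConsumerIndex {
    /// Record that `job_run_id` read each partition in `read_uuids`
    /// while producing each partition in `wrote_uuids`.
    /// Simplification: every input is linked to every output.
    fn record_success(&mut self, job_run_id: &str, read_uuids: &[Uuid], wrote_uuids: &[Uuid]) {
        for &input in read_uuids {
            let consumers = self.partition_consumers.entry(input).or_default();
            for &output in wrote_uuids {
                consumers.push((output, job_run_id.to_string()));
            }
        }
    }

    /// Direct consumers of one partition instance; empty if nothing read it.
    fn consumers_of(&self, input: Uuid) -> &[(Uuid, String)] {
        self.partition_consumers
            .get(&input)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Because the key is the partition instance's UUID, a rebuild of the same ref gets a fresh entry and the consumers of the older instance remain attached to it.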
Design Decisions
- Retries: List all job runs triggered by a want, collapsing retries in the UI (expandable)
- Lineage UUIDs: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
- High fan-out: Truncate to N items with "+X more" expansion
- Consumer index by UUID: Index consumers by partition UUID (not ref) since refs get reused across rebuilds but UUIDs are immutable per instance
- Job run as lineage source of truth: Partition details don't duplicate upstream info - they reference their builder job run, which holds the read_deps
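The high fan-out rule could be sketched as below. The helper name `truncate_for_display` and the idea of returning the label as a separate value are assumptions for illustration, not part of the existing view code.

```rust
/// Split `items` into the visible prefix and an optional "+X more" label.
/// Returns all items and no label when the list fits within `limit`.
fn truncate_for_display<T>(items: &[T], limit: usize) -> (&[T], Option<String>) {
    if items.len() <= limit {
        return (items, None);
    }
    let hidden = items.len() - limit;
    (&items[..limit], Some(format!("+{} more", hidden)))
}
```

Returning a borrowed slice avoids copying the entries; a template would render the slice and, when the label is present, an expansion control next to it.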
API Response Pattern
Detail and list endpoints return a wrapper with the primary data plus a shared index of related entities:
message GetJobRunResponse {
JobRunDetail data = 1;
RelatedEntities index = 2;
}
message ListJobRunsResponse {
repeated JobRunDetail data = 1;
RelatedEntities index = 2; // shared across all items
}
message RelatedEntities {
map<string, PartitionDetail> partitions = 1;
map<string, JobRunDetail> job_runs = 2;
map<string, WantDetail> wants = 3;
}
Why this pattern:
- No recursion - Detail types stay flat, don't embed each other
- Deduplication - Each entity appears once in the index, even if referenced by multiple items in data
- O(1) lookup - Templates access index.partitions["data/beta"] directly
- Composable - Same pattern works for single-item and list endpoints
Implementation Plan
✅ Phase 1: Data Model (Complete)
1.1 Extend JobRunSuccessEventV1
message JobRunSuccessEventV1 {
string job_run_id = 1;
repeated ReadDeps read_deps = 2; // preserves impacted→read relationships
}
1.2 Extend SucceededState to store resolved UUIDs
pub struct SucceededState {
pub succeeded_at: u64,
pub read_deps: Vec<ReadDeps>, // from event
pub read_partition_uuids: BTreeMap<String, Uuid>, // ref → UUID at read time
pub wrote_partition_uuids: BTreeMap<String, Uuid>, // ref → UUID (from building_partitions)
}
UUIDs are resolved by looking up the canonical partitions when the success event is processed.
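As a rough illustration of that resolution step, the sketch below snapshots ref → UUID at processing time. The function name `resolve_partition_uuids`, the shape of the canonical map, and the choice to skip refs with no canonical partition are all assumptions; `Uuid` is again a `u128` stand-in.

```rust
use std::collections::BTreeMap;

// Stand-in for the real UUID type so the sketch has no external dependencies.
type Uuid = u128;

/// Snapshot the canonical UUID of each ref at the moment the success event
/// is processed. Refs without a canonical partition are skipped here.
fn resolve_partition_uuids(
    refs: &[String],
    canonical: &BTreeMap<String, Uuid>,
) -> BTreeMap<String, Uuid> {
    refs.iter()
        .filter_map(|r| canonical.get(r).map(|uuid| (r.clone(), *uuid)))
        .collect()
}
```

The returned map is an owned copy, so a later rebuild that points the same ref at a new UUID does not change what the job run recorded.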
1.3 Add consumer index to BuildState
// input_uuid → list of (output_uuid, job_run_id)
partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>
Populated from read_deps when processing JobRunSuccessEventV1. Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds.
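One use of this index is estimating impact before tainting a partition. The sketch below walks the index breadth-first to collect every transitively downstream partition instance; the function name `downstream_closure` is hypothetical, and `Uuid` is a `u128` stand-in for the real type.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Stand-in for the real UUID type so the sketch has no external dependencies.
type Uuid = u128;

/// Collect every partition instance reachable downstream of `start`
/// by following input_uuid → (output_uuid, job_run_id) edges.
fn downstream_closure(
    partition_consumers: &BTreeMap<Uuid, Vec<(Uuid, String)>>,
    start: Uuid,
) -> BTreeSet<Uuid> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(current) = queue.pop_front() {
        // A partition with no consumers has no entry, hence the Option.
        for (output, _job_run_id) in partition_consumers.get(&current).into_iter().flatten() {
            // `insert` returns false for already-visited nodes, so shared
            // descendants are enqueued only once.
            if seen.insert(*output) {
                queue.push_back(*output);
            }
        }
    }
    seen
}
```

The lineage view itself only needs the direct consumers of the focused partition; the transitive walk is shown for the impact-analysis question.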
✅ Phase 2: API Response Pattern (Complete)
2.1 RelatedEntities wrapper
Added RelatedEntities message and index field to all Get*/List* responses.
2.2 HasRelatedIds trait
Implemented trait for Want, JobRun, Partition that returns the IDs of related entities. Query layer uses this to build the index.
2.3 Query methods
Added *_with_index() methods that collect related IDs via the trait and resolve them to full entity details.
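A sketch of what the trait and the ID-collection step might look like. Only the trait name HasRelatedIds comes from this document; the `RelatedIds` struct, the `JobRun` field names, and `collect_related_ids` are hypothetical and simplified.

```rust
use std::collections::BTreeSet;

// Hypothetical container for the IDs an entity refers to.
// Sets deduplicate IDs referenced by several items.
#[derive(Default, Debug, PartialEq)]
struct RelatedIds {
    partition_refs: BTreeSet<String>,
    job_run_ids: BTreeSet<String>,
    want_ids: BTreeSet<String>,
}

impl RelatedIds {
    fn merge(&mut self, other: RelatedIds) {
        self.partition_refs.extend(other.partition_refs);
        self.job_run_ids.extend(other.job_run_ids);
        self.want_ids.extend(other.want_ids);
    }
}

trait HasRelatedIds {
    fn related_ids(&self) -> RelatedIds;
}

// Simplified job run; field names are assumptions for this sketch.
struct JobRun {
    want_ids: Vec<String>,
    read_refs: Vec<String>,
    wrote_refs: Vec<String>,
}

impl HasRelatedIds for JobRun {
    fn related_ids(&self) -> RelatedIds {
        RelatedIds {
            partition_refs: self.read_refs.iter().chain(&self.wrote_refs).cloned().collect(),
            job_run_ids: BTreeSet::new(),
            want_ids: self.want_ids.iter().cloned().collect(),
        }
    }
}

/// Union of the related IDs of every item in a list response.
fn collect_related_ids<T: HasRelatedIds>(items: &[T]) -> RelatedIds {
    let mut all = RelatedIds::default();
    for item in items {
        all.merge(item.related_ids());
    }
    all
}
```

A query method would then resolve each collected ID to its detail record once and place it in the shared index, which is what keeps the detail types flat.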
✅ Phase 3: Job Integration (Complete)
Jobs already emit DATABUILD_DEP_READ_JSON and the full pipeline is wired up:
- Job execution (job_run.rs): SubProcessBackend::poll parses DATABUILD_DEP_READ_JSON lines from stdout and stores in SubProcessCompleted.read_deps
- Event creation (job_run.rs): to_event() creates JobRunSuccessEventV1 with read_deps
- Event handling (event_handlers.rs): handle_job_run_success() resolves read_partition_uuids and wrote_partition_uuids, populates partition_consumers index
- API serialization (job_run_state.rs): to_detail() includes read_deps, read_partition_uuids, wrote_partition_uuids in JobRunDetail
✅ Phase 4: Frontend (Complete)
4.1 JobRun detail page
Added to job_runs/detail.html:
- "Read Partitions" section showing partition refs with UUIDs (linked to partition detail)
- "Wrote Partitions" section showing partition refs with UUIDs (linked to partition detail)
- "Derivative Wants" section showing wants spawned by dep-miss (linked to want detail)
Extended JobRunDetailView with:
- read_deps: Vec<ReadDepsView> - impacted→read dependency relationships
- read_partitions: Vec<PartitionRefWithUuidView> - input partitions with UUIDs
- wrote_partitions: Vec<PartitionRefWithUuidView> - output partitions with UUIDs
- derivative_want_ids: Vec<String> - derivative wants from dep-miss
4.2 Partition detail page
Added to partitions/detail.html:
- "Lineage - Built By" section showing the builder job run (linked to job run detail for upstream lineage)
- "Lineage - Downstream Consumers" section showing UUIDs of downstream partitions
Extended PartitionDetailView with:
- built_by_job_run_id: Option<String> - job run that built this partition
- downstream_partition_uuids: Vec<String> - downstream consumers from index
4.3 Want detail page
Added to wants/detail.html:
- "Fulfillment - Job Runs" section listing all job runs that serviced this want
- "Fulfillment - Derivative Wants" section listing derivative wants spawned by dep-misses
Extended WantDetailView with:
- job_run_ids: Vec<String> - all job runs that serviced this want
- derivative_want_ids: Vec<String> - derivative wants spawned