Detail & Lineage Views
Vision
Provide rich, navigable views into databuild's execution history that answer operational questions:
- "What work was done to fulfill this want?" - The full DAG of partitions built and jobs run
- "Where did this data come from?" - Trace a partition's lineage back through its inputs
- "What downstream data uses this?" - Understand impact before tainting or debugging staleness
Three Distinct Views
1. Want Fulfillment View
Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.
W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
└── derivative: W-002 "data/beta" [Successful]
    │   └── triggered by: JR-456 dep-miss
    │
    └── data/beta [Live, uuid:def]
        └── JR-456 [DepMiss → retry → Succeeded]
            └── read: data/alpha [Live, uuid:ghi]
Key insight: This shows specific partition instances (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.
2. Partition Lineage View
The data flow graph, alternating between partitions and job runs. Navigable upstream (inputs) and downstream (consumers).
                UPSTREAM
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   [data/a]     [data/b]     [data/c]
       │            │            │
       └────────────┼────────────┘
                    ▼
           JR-xyz [Succeeded]
                    │
                    ▼
           ══════════════════
           ║   data/beta    ║  ← FOCUS
           ║     [Live]     ║
           ══════════════════
                    │
                    ▼
            JR-abc [Running]
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   [data/x]     [data/y]     [data/z]
                    │
               DOWNSTREAM
This view answers: "What data flows into/out of this partition?" Click to navigate.
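The one-hop neighborhood behind this view could be assembled roughly as follows. This is a minimal sketch, not the actual databuild code: the `Builder` and `Neighborhood` types and the `neighborhood` function are hypothetical names, and the `built_by` map is an assumed lookup from a partition ref to the job run that built it. The downstream tuples follow the (output_partition_ref, job_run_id) shape of the consumer index described under Phase 1.

```rust
use std::collections::BTreeMap;

/// Hypothetical record of the job run that built a partition and what it read.
pub struct Builder {
    pub job_run_id: String,
    pub inputs: Vec<String>,
}

/// Hypothetical one-hop lineage around a focus partition.
pub struct Neighborhood {
    /// Job run that built the focus partition, if known.
    pub built_by: Option<String>,
    /// Input partition refs read by the builder.
    pub upstream: Vec<String>,
    /// (output_partition_ref, job_run_id) pairs that consumed the focus.
    pub downstream: Vec<(String, String)>,
}

/// Look up the immediate upstream and downstream of `focus`.
/// Unknown partitions yield an empty neighborhood rather than an error.
pub fn neighborhood(
    built_by: &BTreeMap<String, Builder>,
    consumers: &BTreeMap<String, Vec<(String, String)>>,
    focus: &str,
) -> Neighborhood {
    let builder = built_by.get(focus);
    Neighborhood {
        built_by: builder.map(|b| b.job_run_id.clone()),
        upstream: builder.map(|b| b.inputs.clone()).unwrap_or_default(),
        downstream: consumers.get(focus).cloned().unwrap_or_default(),
    }
}
```

Navigating by click then amounts to calling the same lookup again with the clicked partition as the new focus.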
3. JobRun Detail View
Not a graph - just the immediate context of a single job execution:
- Scheduled for: Which want(s) triggered this job
- Read: Input partitions (with UUIDs - the specific versions read)
- Wrote: Output partitions (with UUIDs)
- Status history: Queued → Running → Succeeded/Failed/DepMiss
- If DepMiss: Which derivative wants were spawned
Data Requirements
Track read_deps on success
read_deps are currently captured only on dep-miss. To record them for successful runs as well, extend JobRunSuccessEventV1:
message JobRunSuccessEventV1 {
    string job_run_id = 1;
    repeated ReadDeps read_deps = 2;  // NEW
}
Inverted consumer index
To answer "what reads this partition?", we need an inverted index:
partition_consumers: BTreeMap<String, Vec<String>> // partition_ref → consumer partition_refs
Built from read_deps on job success.
Design Decisions
- Retries: List all job runs triggered by a want, collapsing retries in the UI (expandable)
- Lineage UUIDs: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
- High fan-out: Truncate to N items with "+X more" expansion
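The high fan-out rule can be illustrated with a small helper. This is a sketch under assumptions: the function names `truncate_fan_out` and `render_fan_out` are hypothetical, the limit N is left to the caller, and the comma-separated rendering is only one possible presentation.

```rust
/// Keep at most `limit` items and report how many were hidden.
pub fn truncate_fan_out<T: Clone>(items: &[T], limit: usize) -> (Vec<T>, usize) {
    let shown: Vec<T> = items.iter().take(limit).cloned().collect();
    let hidden = items.len() - shown.len();
    (shown, hidden)
}

/// Render the visible items, followed by a "+X more" marker when truncated.
pub fn render_fan_out(items: &[String], limit: usize) -> String {
    let (shown, hidden) = truncate_fan_out(items, limit);
    let mut out = shown.join(", ");
    if hidden > 0 {
        if !out.is_empty() {
            out.push_str(", ");
        }
        out.push_str(&format!("+{} more", hidden));
    }
    out
}
```

Returning the hidden count separately from the visible items lets the UI decide how to present the expansion control.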
Implementation Plan
Phase 1: Data Model
1.1 Extend JobRunSuccessEventV1
message JobRunSuccessEventV1 {
    string job_run_id = 1;
    repeated ReadDeps read_deps = 2;  // NEW: preserves impacted→read relationships
}
1.2 Extend SucceededState to store resolved UUIDs
pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec<ReadDeps>,                       // from event
    pub read_partition_uuids: BTreeMap<String, Uuid>,   // ref → UUID at read time
    pub wrote_partition_uuids: BTreeMap<String, Uuid>,  // ref → UUID (from building_partitions)
}
UUIDs are resolved by looking up the canonical partition for each ref when the success event is processed.
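A minimal sketch of that lookup, assuming the canonical partitions are available as a ref → UUID map. The `resolve_uuids` name is hypothetical, `Uuid` is aliased to `u128` here only to keep the example dependency-free, and reporting unresolved refs separately is an assumption about how missing partitions might be handled.

```rust
use std::collections::BTreeMap;

/// Stand-in for a real UUID type so the sketch has no external dependencies.
pub type Uuid = u128;

/// Resolve each partition ref to the UUID of its canonical instance.
/// Returns the resolved map plus any refs that had no canonical partition.
pub fn resolve_uuids(
    canonical: &BTreeMap<String, Uuid>,
    refs: &[String],
) -> (BTreeMap<String, Uuid>, Vec<String>) {
    let mut resolved = BTreeMap::new();
    let mut missing = Vec::new();
    for partition_ref in refs {
        match canonical.get(partition_ref) {
            Some(uuid) => {
                resolved.insert(partition_ref.clone(), *uuid);
            }
            None => missing.push(partition_ref.clone()),
        }
    }
    (resolved, missing)
}
```

The same function could fill both read_partition_uuids and wrote_partition_uuids, called once with the read refs and once with the written refs.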
1.3 Add consumer index to BuildState
// input_partition_ref → list of (output_partition_ref, job_run_id)
partition_consumers: BTreeMap<String, Vec<(String, String)>>
Populated from read_deps when processing JobRunSuccessEventV1.
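Populating the index might look like the sketch below. The field names on `ReadDeps` (`impacted` for outputs, `read` for inputs) are assumed from the "impacted→read" comment on the proto field and may not match the real message; `ConsumerIndex` and `index_consumers` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Assumed shape of one ReadDeps entry: the output partitions ("impacted")
/// and the input partitions they were derived from ("read").
#[derive(Debug, Clone)]
pub struct ReadDeps {
    pub impacted: Vec<String>,
    pub read: Vec<String>,
}

/// input_partition_ref -> list of (output_partition_ref, job_run_id)
pub type ConsumerIndex = BTreeMap<String, Vec<(String, String)>>;

/// Record every input -> output edge reported by a successful job run.
pub fn index_consumers(index: &mut ConsumerIndex, job_run_id: &str, read_deps: &[ReadDeps]) {
    for dep in read_deps {
        for input in &dep.read {
            let consumers = index.entry(input.clone()).or_default();
            for output in &dep.impacted {
                let edge = (output.clone(), job_run_id.to_string());
                // Skip duplicates so replaying the same event is idempotent.
                if !consumers.contains(&edge) {
                    consumers.push(edge);
                }
            }
        }
    }
}
```

The duplicate check is a design choice for event replay; if events are guaranteed to be applied exactly once, a plain push would do.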
Phase 2: Extend Existing API Endpoints
2.1 GET /api/wants/:id
Add to response:
- job_runs: All job runs servicing this want (with status, partitions built)
- derivative_wants: Wants spawned by dep-miss from this want's jobs
2.2 GET /api/partitions/:ref
Add to response:
- built_by: Job run that built this partition (with read_deps + resolved UUIDs)
- upstream: Input partitions (refs + UUIDs) from builder's read_deps
- downstream: Consumer partitions (refs + UUIDs) from consumer index
2.3 GET /api/job_runs/:id
Add to response:
- read_deps: With resolved UUIDs for each partition
- wrote_partitions: With UUIDs
- derivative_wants: If DepMiss, the wants that were spawned
Phase 3: Frontend
3.1 Want detail page
Add "Fulfillment" section:
- List of job runs (retries collapsed, expandable)
- Derivative wants as nested items
- Partition UUIDs linked to partition detail
3.2 Partition detail page
Add "Lineage" section:
- Upstream: builder job → input partitions (navigable)
- Downstream: consumer jobs → output partitions (truncated at N)
3.3 JobRun detail page
Add:
- "Read" section with partition refs + UUIDs
- "Wrote" section with partition refs + UUIDs
- "Derivative Wants" section (if DepMiss)
Phase 4: Job Integration
Extend DATABUILD_DEP_READ_JSON parsing to run on job success (not just dep-miss). Jobs already emit this; we just need to capture it.
Sequencing
- Proto + state changes
- Event handler updates
- API response extensions
- Frontend enhancements