# Detail & Lineage Views

## Vision

Provide rich, navigable views into databuild's execution history that answer operational questions:

- **"What work was done to fulfill this want?"** - The full DAG of partitions built and jobs run
- **"Where did this data come from?"** - Trace a partition's lineage back through its inputs
- **"What downstream data uses this?"** - Understand impact before tainting or debugging staleness

## Three Distinct Views

### 1. Want Fulfillment View

Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.

```
W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
├── derivative: W-002 "data/beta" [Successful]
│   └── triggered by: JR-456 dep-miss
│       └── data/beta [Live, uuid:def]
│
└── JR-456 [DepMiss → retry → Succeeded]
    └── read: data/alpha [Live, uuid:ghi]
```

Key insight: this shows **specific partition instances** (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.

### 2. Partition Lineage View

The data flow graph: partition ↔ job_run, alternating. Navigable upstream (inputs) and downstream (consumers).

```
             UPSTREAM
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/a]     [data/b]     [data/c]
    │            │            │
    └────────────┼────────────┘
                 ▼
        JR-xyz [Succeeded]
                 │
                 ▼
        ══════════════════
        ║   data/beta    ║  ← FOCUS
        ║     [Live]     ║
        ══════════════════
                 │
                 ▼
         JR-abc [Running]
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/x]     [data/y]     [data/z]
                 │
            DOWNSTREAM
```

This view answers: "What data flows into/out of this partition?" Click to navigate.

### 3. JobRun Detail View

Not a graph - just the immediate context of a single job execution:

- **Scheduled for**: Which want(s) triggered this job
- **Read**: Input partitions (with UUIDs - the specific versions read)
- **Wrote**: Output partitions (with UUIDs)
- **Status history**: Queued → Running → Succeeded/Failed/DepMiss
- **If DepMiss**: Which derivative wants were spawned

## Data Requirements

### Track read_deps on success

Read deps are currently captured only on dep-miss. We need to extend `JobRunSuccessEventV1`:

```protobuf
message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2; // NEW
}
```

### Inverted consumer index

To answer "what reads this partition?", we need:

```rust
// partition_ref → consumer partition_refs
partition_consumers: BTreeMap<String, Vec<String>>
```

Built from read_deps on job success.

## Design Decisions

1. **Retries**: List all job runs triggered by a want, collapsing retries in the UI (expandable)
2. **Lineage UUIDs**: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
3. **High fan-out**: Truncate to N items with a "+X more" expansion

## Implementation Plan

### Phase 1: Data Model

**1.1 Extend JobRunSuccessEventV1**

```protobuf
message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2; // NEW: preserves impacted→read relationships
}
```

**1.2 Extend SucceededState to store resolved UUIDs**

```rust
pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec<ReadDeps>,                        // from event
    pub read_partition_uuids: BTreeMap<String, String>,  // ref → UUID at read time
    pub wrote_partition_uuids: BTreeMap<String, String>, // ref → UUID (from building_partitions)
}
```

UUIDs are resolved by looking up the canonical partitions when processing the success event.
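A minimal sketch of that resolution step, under stated assumptions: `ReadDeps`, `CanonicalPartition`, and the `resolve_succeeded_state` helper are illustrative names, and refs/UUIDs are modeled as plain `String`s; the real databuild types and event-handler wiring will differ.

```rust
use std::collections::BTreeMap;

// Illustrative shapes only; real databuild types will differ.
pub struct ReadDeps {
    pub partition_ref: String, // ASSUMPTION: read deps carry the ref that was read
}

pub struct CanonicalPartition {
    pub uuid: String, // ASSUMPTION: the canonical partition exposes its instance UUID
}

pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec<ReadDeps>,
    pub read_partition_uuids: BTreeMap<String, String>,
    pub wrote_partition_uuids: BTreeMap<String, String>,
}

/// Resolve partition refs to canonical UUIDs while handling JobRunSuccessEventV1.
pub fn resolve_succeeded_state(
    succeeded_at: u64,
    read_deps: Vec<ReadDeps>,
    wrote_refs: &[String],                            // from building_partitions
    canonical: &BTreeMap<String, CanonicalPartition>, // ref → current live partition
) -> SucceededState {
    // Look up each read ref in the canonical map; skip refs with no live partition.
    let read_partition_uuids = read_deps
        .iter()
        .filter_map(|d| {
            canonical
                .get(&d.partition_ref)
                .map(|p| (d.partition_ref.clone(), p.uuid.clone()))
        })
        .collect();

    // Same resolution for the partitions this job wrote.
    let wrote_partition_uuids = wrote_refs
        .iter()
        .filter_map(|r| canonical.get(r).map(|p| (r.clone(), p.uuid.clone())))
        .collect();

    SucceededState {
        succeeded_at,
        read_deps,
        read_partition_uuids,
        wrote_partition_uuids,
    }
}
```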
**1.3 Add consumer index to BuildState**

```rust
// input_partition_ref → list of (output_partition_ref, job_run_id)
partition_consumers: BTreeMap<String, Vec<(String, String)>>
```

Populated from `read_deps` when processing JobRunSuccessEventV1 (see the sketch at the end of this plan).

### Phase 2: Extend Existing API Endpoints

**2.1 GET /api/wants/:id**

Add to response:

- `job_runs`: All job runs servicing this want (with status, partitions built)
- `derivative_wants`: Wants spawned by dep-miss from this want's jobs

**2.2 GET /api/partitions/:ref**

Add to response:

- `built_by`: Job run that built this partition (with read_deps + resolved UUIDs)
- `upstream`: Input partitions (refs + UUIDs) from the builder's read_deps
- `downstream`: Consumer partitions (refs + UUIDs) from the consumer index

**2.3 GET /api/job_runs/:id**

Add to response:

- `read_deps`: With resolved UUIDs for each partition
- `wrote_partitions`: With UUIDs
- `derivative_wants`: If DepMiss, the wants that were spawned

### Phase 3: Frontend

**3.1 Want detail page**

Add a "Fulfillment" section:

- List of job runs (retries collapsed, expandable)
- Derivative wants as nested items
- Partition UUIDs linked to partition detail

**3.2 Partition detail page**

Add a "Lineage" section:

- Upstream: builder job → input partitions (navigable)
- Downstream: consumer jobs → output partitions (truncated at N)

**3.3 JobRun detail page**

Add:

- "Read" section with partition refs + UUIDs
- "Wrote" section with partition refs + UUIDs
- "Derivative Wants" section (if DepMiss)

### Phase 4: Job Integration

Extend `DATABUILD_DEP_READ_JSON` parsing to run on job success (not just dep-miss). Jobs already emit this; we just need to capture it.

## Sequencing

1. Proto + state changes
2. Event handler updates
3. API response extensions
4. Frontend enhancements
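For concreteness, a minimal sketch of the consumer index from 1.3 and the `downstream` lookup it enables in 2.2, spanning sequencing steps 1-3. `BuildState` and `partition_consumers` come from the plan above; `JobRunSuccess`, `record_success`, `downstream`, and the `String`-typed refs are illustrative assumptions, not existing databuild APIs.

```rust
use std::collections::BTreeMap;

// Illustrative event shape; the real handler would read JobRunSuccessEventV1.
pub struct JobRunSuccess {
    pub job_run_id: String,
    pub read_partition_refs: Vec<String>,  // from read_deps
    pub wrote_partition_refs: Vec<String>, // from building_partitions
}

#[derive(Default)]
pub struct BuildState {
    // input_partition_ref → (output_partition_ref, job_run_id) consumers
    pub partition_consumers: BTreeMap<String, Vec<(String, String)>>,
}

impl BuildState {
    /// Step 2: update the inverted index when a job run succeeds.
    pub fn record_success(&mut self, ev: &JobRunSuccess) {
        for input in &ev.read_partition_refs {
            let consumers = self.partition_consumers.entry(input.clone()).or_default();
            for output in &ev.wrote_partition_refs {
                consumers.push((output.clone(), ev.job_run_id.clone()));
            }
        }
    }

    /// Step 3: backs the `downstream` field of GET /api/partitions/:ref.
    pub fn downstream(&self, partition_ref: &str) -> &[(String, String)] {
        self.partition_consumers
            .get(partition_ref)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```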