# Detail & Lineage Views

## Vision

Provide rich, navigable views into databuild's execution history that answer operational questions:

- **"What work was done to fulfill this want?"** - The full DAG of partitions built and jobs run
- **"Where did this data come from?"** - Trace a partition's lineage back through its inputs
- **"What downstream data uses this?"** - Understand impact before tainting or debugging staleness

## Three Distinct Views

### 1. Want Fulfillment View

Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.

```
W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
└── derivative: W-002 "data/beta" [Successful]
    │   └── triggered by: JR-456 dep-miss
    │
    └── data/beta [Live, uuid:def]
        └── JR-456 [DepMiss → retry → Succeeded]
            └── read: data/alpha [Live, uuid:ghi]
```

Key insight: This shows **specific partition instances** (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.

### 2. Partition Lineage View

The data flow graph: partition ↔ job_run alternating. Navigable upstream (inputs) and downstream (consumers).

```
               UPSTREAM
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
  [data/a]     [data/b]     [data/c]
      │            │            │
      └────────────┼────────────┘
                   ▼
          JR-xyz [Succeeded]
                   │
                   ▼
          ══════════════════
          ║   data/beta    ║  ← FOCUS
          ║     [Live]     ║
          ══════════════════
                   │
                   ▼
           JR-abc [Running]
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
  [data/x]     [data/y]     [data/z]
                   │
              DOWNSTREAM
```

This view answers: "What data flows into/out of this partition?" Click to navigate.

### 3. JobRun Detail View

Not a graph - just the immediate context of a single job execution:

- **Scheduled for**: Which want(s) triggered this job
- **Read**: Input partitions (with UUIDs - the specific versions read)
- **Wrote**: Output partitions (with UUIDs)
- **Status history**: Queued → Running → Succeeded/Failed/DepMiss
- **If DepMiss**: Which derivative wants were spawned

## Data Requirements

### Track read_deps on success

Currently only captured on dep-miss. Need to extend `JobRunSuccessEventV1`:

```protobuf
message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // NEW
}
```

### Inverted consumer index

To answer "what reads this partition", need:

```rust
partition_consumers: BTreeMap>  // input_uuid → (output_uuid, job_run_id)
```

Indexed by UUID (not ref) because partition refs get reused across rebuilds, but UUIDs are immutable per instance. This preserves historical lineage correctly.

Built from read_deps when processing JobRunSuccessEventV1.

## Design Decisions

1. **Retries**: List all job runs triggered by a want, collapsing retries in the UI (expandable)
2. **Lineage UUIDs**: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
3. **High fan-out**: Truncate to N items with "+X more" expansion
4. **Consumer index by UUID**: Index consumers by partition UUID (not ref) since refs get reused across rebuilds but UUIDs are immutable per instance
5. **Job run as lineage source of truth**: Partition details don't duplicate upstream info - they reference their builder job run, which holds the read_deps

## API Response Pattern

Detail and list endpoints return a wrapper with the primary data plus a shared index of related entities:

```protobuf
message GetJobRunResponse {
  JobRunDetail data = 1;
  RelatedEntities index = 2;
}

message ListJobRunsResponse {
  repeated JobRunDetail data = 1;
  RelatedEntities index = 2;  // shared across all items
}

message RelatedEntities {
  map partitions = 1;
  map job_runs = 2;
  map wants = 3;
}
```

**Why this pattern:**

- **No recursion** - Detail types stay flat, don't embed each other
- **Deduplication** - Each entity appears once in the index, even if referenced by multiple items in `data`
- **O(1) lookup** - Templates access `index.partitions["data/beta"]` directly
- **Composable** - Same pattern works for single-item and list endpoints

## Implementation Plan

### ✅ Phase 1: Data Model (Complete)

**1.1 Extend JobRunSuccessEventV1**

```protobuf
message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // preserves impacted→read relationships
}
```

**1.2 Extend SucceededState to store resolved UUIDs**

```rust
pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec,  // from event
    pub read_partition_uuids: BTreeMap,  // ref → UUID at read time
    pub wrote_partition_uuids: BTreeMap,  // ref → UUID (from building_partitions)
}
```

UUIDs resolved by looking up canonical partitions when processing success event.

**1.3 Add consumer index to BuildState**

```rust
// input_uuid → list of (output_uuid, job_run_id)
partition_consumers: BTreeMap>
```

Populated from `read_deps` when processing JobRunSuccessEventV1. Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds.

### ✅ Phase 2: API Response Pattern (Complete)

**2.1 RelatedEntities wrapper**

Added `RelatedEntities` message and `index` field to all Get*/List* responses.

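A minimal Rust sketch of the wrapper shape may help make the deduplication property concrete. The types below are simplified, hypothetical stand-ins (string keys, a single `partitions` map, and a `list_job_runs` helper that does not exist in the codebase as far as this document says); they only illustrate how flat detail items share one index entry per referenced partition.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for the real detail types.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionDetail {
    pub partition_ref: String,
    pub status: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct JobRunDetail {
    pub job_run_id: String,
    // Detail types stay flat: they carry refs, not embedded partitions.
    pub read_partition_refs: Vec<String>,
}

// Assumption: the index is keyed by partition ref, as in
// `index.partitions["data/beta"]`.
#[derive(Debug, Default)]
pub struct RelatedEntities {
    pub partitions: BTreeMap<String, PartitionDetail>,
}

#[derive(Debug)]
pub struct ListJobRunsResponse {
    pub data: Vec<JobRunDetail>,
    pub index: RelatedEntities, // shared across all items
}

/// Build a list response whose index holds each referenced partition once,
/// no matter how many job runs in `runs` refer to it.
pub fn list_job_runs(
    runs: Vec<JobRunDetail>,
    known_partitions: &BTreeMap<String, PartitionDetail>,
) -> ListJobRunsResponse {
    let mut index = RelatedEntities::default();
    for run in &runs {
        for partition_ref in &run.read_partition_refs {
            // Refs with no known partition are simply left out of the index.
            if let Some(detail) = known_partitions.get(partition_ref) {
                index
                    .partitions
                    .entry(partition_ref.clone())
                    .or_insert_with(|| detail.clone());
            }
        }
    }
    ListJobRunsResponse { data: runs, index }
}
```

With two job runs that both read `data/alpha`, the response carries two items in `data` but a single `data/alpha` entry in `index.partitions`, which a template can then look up by key.
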
**2.2 HasRelatedIds trait**

Implemented trait for Want, JobRun, Partition that returns the IDs of related entities. Query layer uses this to build the index.

**2.3 Query methods**

Added `*_with_index()` methods that collect related IDs via the trait and resolve them to full entity details.

### ✅ Phase 3: Job Integration (Complete)

Jobs already emit `DATABUILD_DEP_READ_JSON` and the full pipeline is wired up:

1. **Job execution** (`job_run.rs`): `SubProcessBackend::poll` parses `DATABUILD_DEP_READ_JSON` lines from stdout and stores in `SubProcessCompleted.read_deps`
2. **Event creation** (`job_run.rs`): `to_event()` creates `JobRunSuccessEventV1` with `read_deps`
3. **Event handling** (`event_handlers.rs`): `handle_job_run_success()` resolves `read_partition_uuids` and `wrote_partition_uuids`, populates `partition_consumers` index
4. **API serialization** (`job_run_state.rs`): `to_detail()` includes `read_deps`, `read_partition_uuids`, `wrote_partition_uuids` in `JobRunDetail`

### ✅ Phase 4: Frontend (Complete)

**4.1 JobRun detail page**

Added to `job_runs/detail.html`:

- "Read Partitions" section showing partition refs with UUIDs (linked to partition detail)
- "Wrote Partitions" section showing partition refs with UUIDs (linked to partition detail)
- "Derivative Wants" section showing wants spawned by dep-miss (linked to want detail)

Extended `JobRunDetailView` with:

- `read_deps: Vec` - impacted→read dependency relationships
- `read_partitions: Vec` - input partitions with UUIDs
- `wrote_partitions: Vec` - output partitions with UUIDs
- `derivative_want_ids: Vec` - derivative wants from dep-miss

**4.2 Partition detail page**

Added to `partitions/detail.html`:

- "Lineage - Built By" section showing the builder job run (linked to job run detail for upstream lineage)
- "Lineage - Downstream Consumers" section showing UUIDs of downstream partitions

Extended `PartitionDetailView` with:

- `built_by_job_run_id: Option` - job run that built this partition
- `downstream_partition_uuids: Vec` - downstream consumers from index

**4.3 Want detail page**

Added to `wants/detail.html`:

- "Fulfillment - Job Runs" section listing all job runs that serviced this want
- "Fulfillment - Derivative Wants" section listing derivative wants spawned by dep-misses

Extended `WantDetailView` with:

- `job_run_ids: Vec` - all job runs that serviced this want
- `derivative_want_ids: Vec` - derivative wants spawned
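
As a closing illustration of how the `partition_consumers` index can feed `downstream_partition_uuids`, here is a small Rust sketch. It rests on several assumptions that are not taken from the codebase: UUIDs and job run IDs are plain strings, `ConsumerIndex` and `record_success` are hypothetical names, and every partition a job read is treated as an input to every partition it wrote (the real `read_deps` carry finer-grained impacted→read relationships).

```rust
use std::collections::BTreeMap;

// Assumption: UUIDs and job run IDs are represented as plain strings here.
type PartitionUuid = String;
type JobRunId = String;

/// Hypothetical wrapper around the inverted consumer index.
#[derive(Debug, Default)]
pub struct ConsumerIndex {
    // input_uuid → list of (output_uuid, job_run_id)
    partition_consumers: BTreeMap<PartitionUuid, Vec<(PartitionUuid, JobRunId)>>,
}

impl ConsumerIndex {
    /// Record a successful job run: each partition instance it read gains
    /// one consumer entry per partition instance it wrote.
    pub fn record_success(&mut self, job_run_id: &str, read_uuids: &[&str], wrote_uuids: &[&str]) {
        for input in read_uuids {
            let consumers = self
                .partition_consumers
                .entry((*input).to_string())
                .or_default();
            for output in wrote_uuids {
                consumers.push(((*output).to_string(), job_run_id.to_string()));
            }
        }
    }

    /// UUIDs of the partition instances built from the given instance,
    /// in the order their job runs were recorded.
    pub fn downstream_partition_uuids(&self, input_uuid: &str) -> Vec<PartitionUuid> {
        self.partition_consumers
            .get(input_uuid)
            .map(|consumers| consumers.iter().map(|(out, _)| out.clone()).collect())
            .unwrap_or_default()
    }
}
```

Replaying the example from the Want Fulfillment View (JR-456 reads `ghi` and writes `def`; JR-789 reads `def` and `ghi` and writes `abc`), a lookup for `ghi` would return `def` and `abc`, while a lookup for `abc` returns nothing. Because keys are instance UUIDs rather than refs, a later rebuild of `data/alpha` under a new UUID would not inherit these consumers.
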