
Detail & Lineage Views

Vision

Provide rich, navigable views into databuild's execution history that answer operational questions:

  • "What work was done to fulfill this want?" - The full DAG of partitions built and jobs run
  • "Where did this data come from?" - Trace a partition's lineage back through its inputs
  • "What downstream data uses this?" - Understand impact before tainting or debugging staleness

Three Distinct Views

1. Want Fulfillment View

Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.

W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
└── derivative: W-002 "data/beta" [Successful]
    ├── triggered by: JR-456 dep-miss
    │
    └── data/beta [Live, uuid:def]
        └── JR-456 [DepMiss → retry → Succeeded]
            └── read: data/alpha [Live, uuid:ghi]

Key insight: This shows specific partition instances (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.

2. Partition Lineage View

The data-flow graph: alternating partition and job_run nodes, navigable upstream (inputs) and downstream (consumers).

              UPSTREAM
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/a]    [data/b]    [data/c]
    │            │            │
    └────────────┼────────────┘
                 ▼
            JR-xyz [Succeeded]
                 │
                 ▼
         ══════════════════
         ║  data/beta     ║  ← FOCUS
         ║  [Live]        ║
         ══════════════════
                 │
                 ▼
            JR-abc [Running]
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/x]    [data/y]    [data/z]
                 │
              DOWNSTREAM

This view answers: "What data flows into/out of this partition?" Click to navigate.

3. JobRun Detail View

Not a graph - just the immediate context of a single job execution:

  • Scheduled for: Which want(s) triggered this job
  • Read: Input partitions (with UUIDs - the specific versions read)
  • Wrote: Output partitions (with UUIDs)
  • Status history: Queued → Running → Succeeded/Failed/DepMiss
  • If DepMiss: Which derivative wants were spawned

Data Requirements

Track read_deps on success

Read deps are currently captured only on dep-miss. We need to extend JobRunSuccessEventV1 so they are also recorded on success:

message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // NEW
}

Inverted consumer index

To answer "what reads this partition?", we need:

partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>  // input_uuid → (output_uuid, job_run_id)

Indexed by UUID (not ref) because partition refs get reused across rebuilds, but UUIDs are immutable per instance. This preserves historical lineage correctly.

Built from read_deps when processing JobRunSuccessEventV1.
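
A minimal Rust sketch of the downstream query against this index, assuming it lives on BuildState as described in Phase 1.3; the helper method name is hypothetical.

use std::collections::BTreeMap;
use uuid::Uuid;

// Trimmed, illustrative slice of BuildState.
pub struct BuildState {
    // input_uuid → list of (output_uuid, job_run_id), built from read_deps.
    pub partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>,
}

impl BuildState {
    // "What reads this partition?": look up consumers of a specific partition
    // instance by its UUID, never by its (reusable) ref.
    pub fn downstream_consumers(&self, partition_uuid: &Uuid) -> &[(Uuid, String)] {
        self.partition_consumers
            .get(partition_uuid)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}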

Design Decisions

  1. Retries: List all job runs triggered by a want, collapsing retries in the UI (expandable)
  2. Lineage UUIDs: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
  3. High fan-out: Truncate to N items with "+X more" expansion
  4. Consumer index by UUID: Index consumers by partition UUID (not ref) since refs get reused across rebuilds but UUIDs are immutable per instance
  5. Job run as lineage source of truth: Partition details don't duplicate upstream info - they reference their builder job run, which holds the read_deps
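
For decision 5, a sketch of how upstream lineage could be derived without duplicating it on the partition: follow built_by_job_run_id to the builder job run and read its read_partition_uuids. Field names from this plan are reused; the function itself is illustrative.

use std::collections::BTreeMap;
use uuid::Uuid;

// Illustrative slices of the partition and job-run state described in this plan.
pub struct PartitionInstance {
    pub uuid: Uuid,
    pub built_by_job_run_id: Option<String>,
}

pub struct SucceededState {
    pub read_partition_uuids: BTreeMap<String, Uuid>, // ref → UUID at read time
}

// Upstream lineage comes from the builder job run's read set, so the
// partition itself never stores a duplicate copy of its inputs.
pub fn upstream_inputs<'a>(
    partition: &PartitionInstance,
    succeeded_job_runs: &'a BTreeMap<String, SucceededState>,
) -> Option<&'a BTreeMap<String, Uuid>> {
    let job_run_id = partition.built_by_job_run_id.as_ref()?;
    succeeded_job_runs
        .get(job_run_id)
        .map(|s| &s.read_partition_uuids)
}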

API Response Pattern

Detail and list endpoints return a wrapper with the primary data plus a shared index of related entities:

message GetJobRunResponse {
  JobRunDetail data = 1;
  RelatedEntities index = 2;
}

message ListJobRunsResponse {
  repeated JobRunDetail data = 1;
  RelatedEntities index = 2;  // shared across all items
}

message RelatedEntities {
  map<string, PartitionDetail> partitions = 1;
  map<string, JobRunDetail> job_runs = 2;
  map<string, WantDetail> wants = 3;
}

Why this pattern:

  • No recursion - Detail types stay flat, don't embed each other
  • Deduplication - Each entity appears once in the index, even if referenced by multiple items in data
  • O(1) lookup - Templates access index.partitions["data/beta"] directly
  • Composable - Same pattern works for single-item and list endpoints
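
A small Rust sketch of the consumer side, using hypothetical, trimmed versions of the generated response types (field names here are illustrative): the detail stays flat, and related entities are resolved through the shared index.

use std::collections::HashMap;

// Hypothetical, trimmed versions of the generated response types.
pub struct PartitionDetail { pub uuid: String }
pub struct JobRunDetail { pub read_partition_refs: Vec<String> }
pub struct RelatedEntities { pub partitions: HashMap<String, PartitionDetail> }
pub struct GetJobRunResponse { pub data: JobRunDetail, pub index: RelatedEntities }

// Resolve the job run's input refs through the shared index: no recursion in
// the detail type, each partition appears once, and every lookup is O(1).
pub fn read_partitions(resp: &GetJobRunResponse) -> Vec<&PartitionDetail> {
    resp.data
        .read_partition_refs
        .iter()
        .filter_map(|r| resp.index.partitions.get(r))
        .collect()
}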

Implementation Plan

Phase 1: Data Model (Complete)

1.1 Extend JobRunSuccessEventV1

message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // preserves impacted→read relationships
}

1.2 Extend SucceededState to store resolved UUIDs

pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec<ReadDeps>,                        // from event
    pub read_partition_uuids: BTreeMap<String, Uuid>,    // ref → UUID at read time
    pub wrote_partition_uuids: BTreeMap<String, Uuid>,   // ref → UUID (from building_partitions)
}

UUIDs resolved by looking up canonical partitions when processing success event.
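
A sketch of that resolution step, assuming a canonical-partition lookup keyed by ref; the struct and method names are illustrative.

use std::collections::BTreeMap;
use uuid::Uuid;

// Illustrative: the canonical (current) partition UUID for each ref.
pub struct CanonicalPartitions {
    by_ref: BTreeMap<String, Uuid>,
}

impl CanonicalPartitions {
    // Resolve a set of partition refs to the UUIDs that are canonical at the
    // time the success event is processed.
    pub fn resolve(&self, refs: &[String]) -> BTreeMap<String, Uuid> {
        refs.iter()
            .filter_map(|r| self.by_ref.get(r).map(|u| (r.clone(), *u)))
            .collect()
    }
}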

1.3 Add consumer index to BuildState

// input_uuid → list of (output_uuid, job_run_id)
partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>

Populated from read_deps when processing JobRunSuccessEventV1. Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds.
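
A sketch of that population step. The ReadDeps field names ("impacted", "read") are assumptions based on the impacted→read wording used elsewhere in this plan, not the actual generated type.

use std::collections::BTreeMap;
use uuid::Uuid;

// Assumed shape of ReadDeps (field names hypothetical).
pub struct ReadDeps {
    pub impacted: String,   // output partition ref this read set applies to
    pub read: Vec<String>,  // input partition refs actually read
}

// For each (input, output) pair in the success event, record the consumer
// edge keyed by the input partition's UUID.
pub fn index_consumers(
    partition_consumers: &mut BTreeMap<Uuid, Vec<(Uuid, String)>>,
    job_run_id: &str,
    read_deps: &[ReadDeps],
    read_partition_uuids: &BTreeMap<String, Uuid>,
    wrote_partition_uuids: &BTreeMap<String, Uuid>,
) {
    for dep in read_deps {
        let Some(output_uuid) = wrote_partition_uuids.get(&dep.impacted) else { continue };
        for input_ref in &dep.read {
            if let Some(input_uuid) = read_partition_uuids.get(input_ref) {
                partition_consumers
                    .entry(*input_uuid)
                    .or_default()
                    .push((*output_uuid, job_run_id.to_string()));
            }
        }
    }
}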

Phase 2: API Response Pattern (Complete)

2.1 RelatedEntities wrapper

Added RelatedEntities message and index field to all Get*/List* responses.

2.2 HasRelatedIds trait

Implemented the trait for Want, JobRun, and Partition; it returns the IDs of related entities, and the query layer uses it to build the index.

2.3 Query methods

Added *_with_index() methods that collect related IDs via the trait and resolve them to full entity details.
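
Sections 2.2 and 2.3 together can be sketched as a trait plus a generic index builder. All names and signatures below are illustrative rather than the actual databuild API; the resolver closures stand in for the query layer's entity lookups.

use std::collections::HashMap;

// Placeholder detail types standing in for the generated PartitionDetail etc.
pub struct PartitionDetail;
pub struct WantDetail;

#[derive(Default)]
pub struct RelatedEntities {
    pub partitions: HashMap<String, PartitionDetail>,
    pub wants: HashMap<String, WantDetail>,
}

// Hypothetical shape of the trait: each entity reports the IDs it references.
pub trait HasRelatedIds {
    fn related_partition_ids(&self) -> Vec<String>;
    fn related_want_ids(&self) -> Vec<String>;
}

// Illustrative *_with_index flow: collect related IDs from one or many items
// and resolve each ID exactly once (the map keys deduplicate).
pub fn build_index<T: HasRelatedIds>(
    items: &[T],
    resolve_partition: impl Fn(&str) -> Option<PartitionDetail>,
    resolve_want: impl Fn(&str) -> Option<WantDetail>,
) -> RelatedEntities {
    let mut index = RelatedEntities::default();
    for item in items {
        for id in item.related_partition_ids() {
            if !index.partitions.contains_key(&id) {
                if let Some(detail) = resolve_partition(&id) {
                    index.partitions.insert(id, detail);
                }
            }
        }
        for id in item.related_want_ids() {
            if !index.wants.contains_key(&id) {
                if let Some(detail) = resolve_want(&id) {
                    index.wants.insert(id, detail);
                }
            }
        }
    }
    index
}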

Phase 3: Job Integration (Complete)

Jobs already emit DATABUILD_DEP_READ_JSON and the full pipeline is wired up:

  1. Job execution (job_run.rs): SubProcessBackend::poll parses DATABUILD_DEP_READ_JSON lines from stdout and stores them in SubProcessCompleted.read_deps (see the sketch after this list)
  2. Event creation (job_run.rs): to_event() creates JobRunSuccessEventV1 with read_deps
  3. Event handling (event_handlers.rs): handle_job_run_success() resolves read_partition_uuids and wrote_partition_uuids, populates partition_consumers index
  4. API serialization (job_run_state.rs): to_detail() includes read_deps, read_partition_uuids, wrote_partition_uuids in JobRunDetail
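
A sketch of the parsing in step 1, assuming read-dep lines are emitted on stdout as a DATABUILD_DEP_READ_JSON prefix followed by a JSON payload; the exact line format and the payload field names are assumptions, not the actual job protocol.

use serde::Deserialize;

// Assumed payload shape (field names hypothetical, matching the
// impacted→read wording used in this plan).
#[derive(Deserialize)]
struct ReadDeps {
    impacted: String,
    read: Vec<String>,
}

// Hypothetical stdout format: "DATABUILD_DEP_READ_JSON <json>".
const PREFIX: &str = "DATABUILD_DEP_READ_JSON";

fn parse_read_deps(stdout: &str) -> Vec<ReadDeps> {
    stdout
        .lines()
        .filter_map(|line| line.trim().strip_prefix(PREFIX))
        .filter_map(|rest| {
            // Tolerate a separator such as space, '=', or ':' after the prefix.
            let json = rest.trim_start_matches(|c: char| c == ' ' || c == '=' || c == ':');
            serde_json::from_str(json).ok()
        })
        .collect()
}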

Phase 4: Frontend (Complete)

4.1 JobRun detail page

Added to job_runs/detail.html:

  • "Read Partitions" section showing partition refs with UUIDs (linked to partition detail)
  • "Wrote Partitions" section showing partition refs with UUIDs (linked to partition detail)
  • "Derivative Wants" section showing wants spawned by dep-miss (linked to want detail)

Extended JobRunDetailView with:

  • read_deps: Vec<ReadDepsView> - impacted→read dependency relationships
  • read_partitions: Vec<PartitionRefWithUuidView> - input partitions with UUIDs
  • wrote_partitions: Vec<PartitionRefWithUuidView> - output partitions with UUIDs
  • derivative_want_ids: Vec<String> - derivative wants from dep-miss
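
A sketch of the extended view struct with the field types listed above; the placeholder view types carry only what is needed here, and the real ones hold whatever the templates render.

// Placeholder view types (illustrative).
pub struct ReadDepsView;
pub struct PartitionRefWithUuidView { pub partition_ref: String, pub uuid: String }

// JobRunDetailView, extended with the lineage fields listed above.
pub struct JobRunDetailView {
    // ...existing fields...
    pub read_deps: Vec<ReadDepsView>,
    pub read_partitions: Vec<PartitionRefWithUuidView>,
    pub wrote_partitions: Vec<PartitionRefWithUuidView>,
    pub derivative_want_ids: Vec<String>,
}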

4.2 Partition detail page

Added to partitions/detail.html:

  • "Lineage - Built By" section showing the builder job run (linked to job run detail for upstream lineage)
  • "Lineage - Downstream Consumers" section showing UUIDs of downstream partitions

Extended PartitionDetailView with:

  • built_by_job_run_id: Option<String> - job run that built this partition
  • downstream_partition_uuids: Vec<String> - downstream consumers from index

4.3 Want detail page

Added to wants/detail.html:

  • "Fulfillment - Job Runs" section listing all job runs that serviced this want
  • "Fulfillment - Derivative Wants" section listing derivative wants spawned by dep-misses

Extended WantDetailView with:

  • job_run_ids: Vec<String> - all job runs that serviced this want
  • derivative_want_ids: Vec<String> - derivative wants spawned