
Detail & Lineage Views

Vision

Provide rich, navigable views into databuild's execution history that answer operational questions:

  • "What work was done to fulfill this want?" - The full DAG of partitions built and jobs run
  • "Where did this data come from?" - Trace a partition's lineage back through its inputs
  • "What downstream data uses this?" - Understand impact before tainting or debugging staleness

Three Distinct Views

1. Want Fulfillment View

Shows the work tree rooted at a want: all partitions built, jobs run, and derivative wants spawned to fulfill it.

W-001 "data/gamma" [Successful]
│
├── data/gamma [Live, uuid:abc]
│   └── JR-789 [Succeeded]
│       ├── read: data/beta [Live, uuid:def]
│       └── read: data/alpha [Live, uuid:ghi]
│
└── derivative: W-002 "data/beta" [Successful]
    ├── triggered by: JR-456 dep-miss
    │
    └── data/beta [Live, uuid:def]
        └── JR-456 [DepMiss → retry → Succeeded]
            └── read: data/alpha [Live, uuid:ghi]

Key insight: This shows specific partition instances (by UUID), not just refs. A want's fulfillment is a concrete snapshot of what was built.

2. Partition Lineage View

The data-flow graph: alternating partition and job_run nodes, navigable upstream (inputs) and downstream (consumers).

              UPSTREAM
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/a]    [data/b]    [data/c]
    │            │            │
    └────────────┼────────────┘
                 ▼
            JR-xyz [Succeeded]
                 │
                 ▼
         ══════════════════
         ║  data/beta     ║  ← FOCUS
         ║  [Live]        ║
         ══════════════════
                 │
                 ▼
            JR-abc [Running]
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
[data/x]    [data/y]    [data/z]
                 │
              DOWNSTREAM

This view answers: "What data flows into/out of this partition?" Click to navigate.

3. JobRun Detail View

Not a graph - just the immediate context of a single job execution:

  • Scheduled for: Which want(s) triggered this job
  • Read: Input partitions (with UUIDs - the specific versions read)
  • Wrote: Output partitions (with UUIDs)
  • Status history: Queued → Running → Succeeded/Failed/DepMiss
  • If DepMiss: Which derivative wants were spawned

Data Requirements

Track read_deps on success

Read deps are currently captured only on dep-miss. We need to extend JobRunSuccessEventV1 so they are also recorded on success:

message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // NEW
}

Inverted consumer index

To answer "what reads this partition?", we need:

partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>  // input_uuid → (output_uuid, job_run_id)

Indexed by UUID (not ref) because partition refs get reused across rebuilds, but UUIDs are immutable per instance. This preserves historical lineage correctly.

Built from read_deps when processing JobRunSuccessEventV1.
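
A minimal Rust sketch of the downstream query against this index, assuming it lives on BuildState as described in Phase 1.3; the helper method name is hypothetical.

use std::collections::BTreeMap;
use uuid::Uuid;

// Trimmed, illustrative slice of BuildState.
pub struct BuildState {
    // input_uuid → list of (output_uuid, job_run_id), built from read_deps.
    pub partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>,
}

impl BuildState {
    // "What reads this partition?": look up consumers of a specific partition
    // instance by its UUID, never by its (reusable) ref.
    pub fn downstream_consumers(&self, partition_uuid: &Uuid) -> &[(Uuid, String)] {
        self.partition_consumers
            .get(partition_uuid)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}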

Design Decisions

  1. Retries: List all job runs triggered by a want, collapsing retries in the UI (expandable)
  2. Lineage UUIDs: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
  3. High fan-out: Truncate to N items with "+X more" expansion
  4. Consumer index by UUID: Index consumers by partition UUID (not ref) since refs get reused across rebuilds but UUIDs are immutable per instance
  5. Job run as lineage source of truth: Partition details don't duplicate upstream info - they reference their builder job run, which holds the read_deps
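
For decision 5, a sketch of how upstream lineage could be derived without duplicating it on the partition: follow built_by_job_run_id to the builder job run and read its read_partition_uuids. Field names from this plan are reused; the function itself is illustrative.

use std::collections::BTreeMap;
use uuid::Uuid;

// Illustrative slices of the partition and job-run state described in this plan.
pub struct PartitionInstance {
    pub uuid: Uuid,
    pub built_by_job_run_id: Option<String>,
}

pub struct SucceededState {
    pub read_partition_uuids: BTreeMap<String, Uuid>, // ref → UUID at read time
}

// Upstream lineage comes from the builder job run's read set, so the
// partition itself never stores a duplicate copy of its inputs.
pub fn upstream_inputs<'a>(
    partition: &PartitionInstance,
    succeeded_job_runs: &'a BTreeMap<String, SucceededState>,
) -> Option<&'a BTreeMap<String, Uuid>> {
    let job_run_id = partition.built_by_job_run_id.as_ref()?;
    succeeded_job_runs
        .get(job_run_id)
        .map(|s| &s.read_partition_uuids)
}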

API Response Pattern

Detail and list endpoints return a wrapper with the primary data plus a shared index of related entities:

message GetJobRunResponse {
  JobRunDetail data = 1;
  RelatedEntities index = 2;
}

message ListJobRunsResponse {
  repeated JobRunDetail data = 1;
  RelatedEntities index = 2;  // shared across all items
}

message RelatedEntities {
  map<string, PartitionDetail> partitions = 1;
  map<string, JobRunDetail> job_runs = 2;
  map<string, WantDetail> wants = 3;
}

Why this pattern:

  • No recursion - Detail types stay flat, don't embed each other
  • Deduplication - Each entity appears once in the index, even if referenced by multiple items in data
  • O(1) lookup - Templates access index.partitions["data/beta"] directly
  • Composable - Same pattern works for single-item and list endpoints
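
A small Rust sketch of the consumer side, using hypothetical, trimmed versions of the generated response types (field names here are illustrative): the detail stays flat, and related entities are resolved through the shared index.

use std::collections::HashMap;

// Hypothetical, trimmed versions of the generated response types.
pub struct PartitionDetail { pub uuid: String }
pub struct JobRunDetail { pub read_partition_refs: Vec<String> }
pub struct RelatedEntities { pub partitions: HashMap<String, PartitionDetail> }
pub struct GetJobRunResponse { pub data: JobRunDetail, pub index: RelatedEntities }

// Resolve the job run's input refs through the shared index: no recursion in
// the detail type, each partition appears once, and every lookup is O(1).
pub fn read_partitions(resp: &GetJobRunResponse) -> Vec<&PartitionDetail> {
    resp.data
        .read_partition_refs
        .iter()
        .filter_map(|r| resp.index.partitions.get(r))
        .collect()
}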

Implementation Plan

Phase 1: Data Model (Complete)

1.1 Extend JobRunSuccessEventV1

message JobRunSuccessEventV1 {
  string job_run_id = 1;
  repeated ReadDeps read_deps = 2;  // preserves impacted→read relationships
}

1.2 Extend SucceededState to store resolved UUIDs

pub struct SucceededState {
    pub succeeded_at: u64,
    pub read_deps: Vec<ReadDeps>,                        // from event
    pub read_partition_uuids: BTreeMap<String, Uuid>,    // ref → UUID at read time
    pub wrote_partition_uuids: BTreeMap<String, Uuid>,   // ref → UUID (from building_partitions)
}

UUIDs resolved by looking up canonical partitions when processing success event.
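
A sketch of that resolution step, assuming a canonical-partition lookup keyed by ref; the struct and method names are illustrative.

use std::collections::BTreeMap;
use uuid::Uuid;

// Illustrative: the canonical (current) partition UUID for each ref.
pub struct CanonicalPartitions {
    by_ref: BTreeMap<String, Uuid>,
}

impl CanonicalPartitions {
    // Resolve a set of partition refs to the UUIDs that are canonical at the
    // time the success event is processed.
    pub fn resolve(&self, refs: &[String]) -> BTreeMap<String, Uuid> {
        refs.iter()
            .filter_map(|r| self.by_ref.get(r).map(|u| (r.clone(), *u)))
            .collect()
    }
}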

1.3 Add consumer index to BuildState

// input_uuid → list of (output_uuid, job_run_id)
partition_consumers: BTreeMap<Uuid, Vec<(Uuid, String)>>

Populated from read_deps when processing JobRunSuccessEventV1. Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds.
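
A sketch of that population step. The ReadDeps field names ("impacted", "read") are assumptions based on the impacted→read wording used elsewhere in this plan, not the actual generated type.

use std::collections::BTreeMap;
use uuid::Uuid;

// Assumed shape of ReadDeps (field names hypothetical).
pub struct ReadDeps {
    pub impacted: String,   // output partition ref this read set applies to
    pub read: Vec<String>,  // input partition refs actually read
}

// For each (input, output) pair in the success event, record the consumer
// edge keyed by the input partition's UUID.
pub fn index_consumers(
    partition_consumers: &mut BTreeMap<Uuid, Vec<(Uuid, String)>>,
    job_run_id: &str,
    read_deps: &[ReadDeps],
    read_partition_uuids: &BTreeMap<String, Uuid>,
    wrote_partition_uuids: &BTreeMap<String, Uuid>,
) {
    for dep in read_deps {
        let Some(output_uuid) = wrote_partition_uuids.get(&dep.impacted) else { continue };
        for input_ref in &dep.read {
            if let Some(input_uuid) = read_partition_uuids.get(input_ref) {
                partition_consumers
                    .entry(*input_uuid)
                    .or_default()
                    .push((*output_uuid, job_run_id.to_string()));
            }
        }
    }
}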

Phase 2: API Response Pattern (Complete)

2.1 RelatedEntities wrapper

Added RelatedEntities message and index field to all Get*/List* responses.

2.2 HasRelatedIds trait

Implemented the trait for Want, JobRun, and Partition; it returns the IDs of related entities, and the query layer uses it to build the index.

2.3 Query methods

Added *_with_index() methods that collect related IDs via the trait and resolve them to full entity details.
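
Sections 2.2 and 2.3 together can be sketched as a trait plus a generic index builder. All names and signatures below are illustrative rather than the actual databuild API; the resolver closures stand in for the query layer's entity lookups.

use std::collections::HashMap;

// Placeholder detail types standing in for the generated PartitionDetail etc.
pub struct PartitionDetail;
pub struct WantDetail;

#[derive(Default)]
pub struct RelatedEntities {
    pub partitions: HashMap<String, PartitionDetail>,
    pub wants: HashMap<String, WantDetail>,
}

// Hypothetical shape of the trait: each entity reports the IDs it references.
pub trait HasRelatedIds {
    fn related_partition_ids(&self) -> Vec<String>;
    fn related_want_ids(&self) -> Vec<String>;
}

// Illustrative *_with_index flow: collect related IDs from one or many items
// and resolve each ID exactly once (the map keys deduplicate).
pub fn build_index<T: HasRelatedIds>(
    items: &[T],
    resolve_partition: impl Fn(&str) -> Option<PartitionDetail>,
    resolve_want: impl Fn(&str) -> Option<WantDetail>,
) -> RelatedEntities {
    let mut index = RelatedEntities::default();
    for item in items {
        for id in item.related_partition_ids() {
            if !index.partitions.contains_key(&id) {
                if let Some(detail) = resolve_partition(&id) {
                    index.partitions.insert(id, detail);
                }
            }
        }
        for id in item.related_want_ids() {
            if !index.wants.contains_key(&id) {
                if let Some(detail) = resolve_want(&id) {
                    index.wants.insert(id, detail);
                }
            }
        }
    }
    index
}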

Phase 3: Job Integration (Complete)

Jobs already emit DATABUILD_DEP_READ_JSON and the full pipeline is wired up:

  1. Job execution (job_run.rs): SubProcessBackend::poll parses DATABUILD_DEP_READ_JSON lines from stdout and stores them in SubProcessCompleted.read_deps (see the sketch after this list)
  2. Event creation (job_run.rs): to_event() creates JobRunSuccessEventV1 with read_deps
  3. Event handling (event_handlers.rs): handle_job_run_success() resolves read_partition_uuids and wrote_partition_uuids, populates partition_consumers index
  4. API serialization (job_run_state.rs): to_detail() includes read_deps, read_partition_uuids, wrote_partition_uuids in JobRunDetail
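
A sketch of the parsing in step 1, assuming read-dep lines are emitted on stdout as a DATABUILD_DEP_READ_JSON prefix followed by a JSON payload; the exact line format and the payload field names are assumptions, not the actual job protocol.

use serde::Deserialize;

// Assumed payload shape (field names hypothetical, matching the
// impacted→read wording used in this plan).
#[derive(Deserialize)]
struct ReadDeps {
    impacted: String,
    read: Vec<String>,
}

// Hypothetical stdout format: "DATABUILD_DEP_READ_JSON <json>".
const PREFIX: &str = "DATABUILD_DEP_READ_JSON";

fn parse_read_deps(stdout: &str) -> Vec<ReadDeps> {
    stdout
        .lines()
        .filter_map(|line| line.trim().strip_prefix(PREFIX))
        .filter_map(|rest| {
            // Tolerate a separator such as space, '=', or ':' after the prefix.
            let json = rest.trim_start_matches(|c: char| c == ' ' || c == '=' || c == ':');
            serde_json::from_str(json).ok()
        })
        .collect()
}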

Phase 4: Frontend (Complete)

4.1 JobRun detail page

Added to job_runs/detail.html:

  • "Read Partitions" section showing partition refs with UUIDs (linked to partition detail)
  • "Wrote Partitions" section showing partition refs with UUIDs (linked to partition detail)
  • "Derivative Wants" section showing wants spawned by dep-miss (linked to want detail)

Extended JobRunDetailView with:

  • read_deps: Vec<ReadDepsView> - impacted→read dependency relationships
  • read_partitions: Vec<PartitionRefWithUuidView> - input partitions with UUIDs
  • wrote_partitions: Vec<PartitionRefWithUuidView> - output partitions with UUIDs
  • derivative_want_ids: Vec<String> - derivative wants from dep-miss
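
A sketch of the extended view struct with the field types listed above; the placeholder view types carry only what is needed here, and the real ones hold whatever the templates render.

// Placeholder view types (illustrative).
pub struct ReadDepsView;
pub struct PartitionRefWithUuidView { pub partition_ref: String, pub uuid: String }

// JobRunDetailView, extended with the lineage fields listed above.
pub struct JobRunDetailView {
    // ...existing fields...
    pub read_deps: Vec<ReadDepsView>,
    pub read_partitions: Vec<PartitionRefWithUuidView>,
    pub wrote_partitions: Vec<PartitionRefWithUuidView>,
    pub derivative_want_ids: Vec<String>,
}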

4.2 Partition detail page

Added to partitions/detail.html:

  • "Lineage - Built By" section showing the builder job run (linked to job run detail for upstream lineage)
  • "Lineage - Downstream Consumers" section showing UUIDs of downstream partitions

Extended PartitionDetailView with:

  • built_by_job_run_id: Option<String> - job run that built this partition
  • downstream_partition_uuids: Vec<String> - downstream consumers from index

4.3 Want detail page

Added to wants/detail.html:

  • "Fulfillment - Job Runs" section listing all job runs that serviced this want
  • "Fulfillment - Derivative Wants" section listing derivative wants spawned by dep-misses

Extended WantDetailView with:

  • job_run_ids: Vec<String> - all job runs that serviced this want
  • derivative_want_ids: Vec<String> - derivative wants spawned