From f353660f971eea1885b365eafba5e22a191f8846 Mon Sep 17 00:00:00 2001
From: Stuart Axelbrooke
Date: Fri, 28 Nov 2025 12:41:11 +0800
Subject: [PATCH] update docs

---
 docs/narrative/what-llms-dont-do.md |   7 ++
 docs/plans/detail-lineage.md        | 123 ++++++++++++++++++----------
 2 files changed, 86 insertions(+), 44 deletions(-)
 create mode 100644 docs/narrative/what-llms-dont-do.md

diff --git a/docs/narrative/what-llms-dont-do.md b/docs/narrative/what-llms-dont-do.md
new file mode 100644
index 0000000..9912f34
--- /dev/null
+++ b/docs/narrative/what-llms-dont-do.md
@@ -0,0 +1,7 @@
+
+# What LLMs Don't Do
+
+- Create and cultivate technical strategy
+  - Don't have a specific vision of the organizing formalization of the problem + the technical solution
+- Adhere to technical strategy
+  - Please please please just read the relevant docs!
diff --git a/docs/plans/detail-lineage.md b/docs/plans/detail-lineage.md
index 9dc0b7c..aada9be 100644
--- a/docs/plans/detail-lineage.md
+++ b/docs/plans/detail-lineage.md
@@ -93,27 +93,59 @@ message JobRunSuccessEventV1 {
 To answer "what reads this partition", need:

 ```rust
-partition_consumers: BTreeMap> // partition_ref → consumer partition_refs
+partition_consumers: BTreeMap> // input_uuid → (output_uuid, job_run_id)
 ```

-Built from read_deps on job success.
+Indexed by UUID (not ref) because partition refs get reused across rebuilds, but UUIDs are immutable per instance. This preserves historical lineage correctly.
+
+Built from read_deps when processing JobRunSuccessEventV1.

 ## Design Decisions

 1. **Retries**: List all job runs triggered by a want, collapsing retries in the UI (expandable)
 2. **Lineage UUIDs**: Resolve partition refs to canonical UUIDs at job success time (jobs don't need to know about UUIDs)
 3. **High fan-out**: Truncate to N items with "+X more" expansion
+4. **Consumer index by UUID**: Index consumers by partition UUID (not ref) since refs get reused across rebuilds but UUIDs are immutable per instance
+5. **Job run as lineage source of truth**: Partition details don't duplicate upstream info - they reference their builder job run, which holds the read_deps
+
+## API Response Pattern
+
+Detail and list endpoints return a wrapper with the primary data plus a shared index of related entities:
+
+```protobuf
+message GetJobRunResponse {
+  JobRunDetail data = 1;
+  RelatedEntities index = 2;
+}
+
+message ListJobRunsResponse {
+  repeated JobRunDetail data = 1;
+  RelatedEntities index = 2; // shared across all items
+}
+
+message RelatedEntities {
+  map partitions = 1;
+  map job_runs = 2;
+  map wants = 3;
+}
+```
+
+**Why this pattern:**
+- **No recursion** - Detail types stay flat, don't embed each other
+- **Deduplication** - Each entity appears once in the index, even if referenced by multiple items in `data`
+- **O(1) lookup** - Templates access `index.partitions["data/beta"]` directly
+- **Composable** - Same pattern works for single-item and list endpoints

 ## Implementation Plan

-### Phase 1: Data Model
+### ✅ Phase 1: Data Model (Complete)

 **1.1 Extend JobRunSuccessEventV1**

 ```protobuf
 message JobRunSuccessEventV1 {
   string job_run_id = 1;
-  repeated ReadDeps read_deps = 2; // NEW: preserves impacted→read relationships
+  repeated ReadDeps read_deps = 2; // preserves impacted→read relationships
 }
 ```

@@ -133,63 +165,66 @@ UUIDs resolved by looking up canonical partitions when processing success event.
 **1.3 Add consumer index to BuildState**

 ```rust
-// input_partition_ref → list of (output_partition_ref, job_run_id)
-partition_consumers: BTreeMap>
+// input_uuid → list of (output_uuid, job_run_id)
+partition_consumers: BTreeMap>
 ```

-Populated from `read_deps` when processing JobRunSuccessEventV1.
+Populated from `read_deps` when processing JobRunSuccessEventV1. Uses UUIDs (not refs) to preserve historical lineage across partition rebuilds.

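The UUID-keyed consumer index from section 1.3 can be sketched as below. This is a minimal illustration, not the project's code: the generic parameters of `partition_consumers` are not visible in the patch, so the `String` aliases, the `ResolvedReadDep` struct, and the `ConsumerIndex` wrapper with its two methods are all hypothetical names chosen for the example.

```rust
use std::collections::BTreeMap;

// Assumption: UUIDs and job run IDs are modeled as plain strings here.
type PartitionUuid = String;
type JobRunId = String;

/// Hypothetical resolved form of one read dependency: the output partition
/// instances ("impacted") that were built from the input instances ("read").
pub struct ResolvedReadDep {
    pub impacted: Vec<PartitionUuid>,
    pub read: Vec<PartitionUuid>,
}

/// input_uuid -> list of (output_uuid, job_run_id)
#[derive(Default)]
pub struct ConsumerIndex {
    partition_consumers: BTreeMap<PartitionUuid, Vec<(PartitionUuid, JobRunId)>>,
}

impl ConsumerIndex {
    /// Record the consumer edges implied by one successful job run.
    pub fn record_success(&mut self, job_run_id: &str, deps: &[ResolvedReadDep]) {
        for dep in deps {
            for input in &dep.read {
                // Create the entry for this input instance on first use.
                let consumers = self.partition_consumers.entry(input.clone()).or_default();
                for output in &dep.impacted {
                    consumers.push((output.clone(), job_run_id.to_string()));
                }
            }
        }
    }

    /// Answer "what reads this partition instance?"; empty if nothing has.
    pub fn consumers_of(&self, input_uuid: &str) -> &[(PartitionUuid, JobRunId)] {
        self.partition_consumers
            .get(input_uuid)
            .map(|consumers| consumers.as_slice())
            .unwrap_or(&[])
    }
}
```

Because the key is the instance UUID, a rebuild that reuses the same partition ref under a new UUID gets its own entry, and the consumers recorded against the earlier instance stay untouched.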
-### Phase 2: Extend Existing API Endpoints
+### ✅ Phase 2: API Response Pattern (Complete)

-**2.1 GET /api/wants/:id**
+**2.1 RelatedEntities wrapper**

-Add to response:
-- `job_runs`: All job runs servicing this want (with status, partitions built)
-- `derivative_wants`: Wants spawned by dep-miss from this want's jobs
+Added `RelatedEntities` message and `index` field to all Get*/List* responses.

-**2.2 GET /api/partitions/:ref**
+**2.2 HasRelatedIds trait**

-Add to response:
-- `built_by`: Job run that built this partition (with read_deps + resolved UUIDs)
-- `upstream`: Input partitions (refs + UUIDs) from builder's read_deps
-- `downstream`: Consumer partitions (refs + UUIDs) from consumer index
+Implemented trait for Want, JobRun, Partition that returns the IDs of related entities. Query layer uses this to build the index.

-**2.3 GET /api/job_runs/:id**
+**2.3 Query methods**

-Add to response:
-- `read_deps`: With resolved UUIDs for each partition
-- `wrote_partitions`: With UUIDs
-- `derivative_wants`: If DepMiss, the wants that were spawned
+Added `*_with_index()` methods that collect related IDs via the trait and resolve them to full entity details.

-### Phase 3: Frontend
+### ✅ Phase 3: Job Integration (Complete)

-**3.1 Want detail page**
+Jobs already emit `DATABUILD_DEP_READ_JSON` and the full pipeline is wired up:

-Add "Fulfillment" section:
-- List of job runs (retries collapsed, expandable)
-- Derivative wants as nested items
-- Partition UUIDs linked to partition detail
+1. **Job execution** (`job_run.rs`): `SubProcessBackend::poll` parses `DATABUILD_DEP_READ_JSON` lines from stdout and stores in `SubProcessCompleted.read_deps`
+2. **Event creation** (`job_run.rs`): `to_event()` creates `JobRunSuccessEventV1` with `read_deps`
+3. **Event handling** (`event_handlers.rs`): `handle_job_run_success()` resolves `read_partition_uuids` and `wrote_partition_uuids`, populates `partition_consumers` index
+4. **API serialization** (`job_run_state.rs`): `to_detail()` includes `read_deps`, `read_partition_uuids`, `wrote_partition_uuids` in `JobRunDetail`

-**3.2 Partition detail page**
+### ✅ Phase 4: Frontend (Complete)

-Add "Lineage" section:
-- Upstream: builder job → input partitions (navigable)
-- Downstream: consumer jobs → output partitions (truncated at N)
+**4.1 JobRun detail page**

-**3.3 JobRun detail page**
+Added to `job_runs/detail.html`:
+- "Read Partitions" section showing partition refs with UUIDs (linked to partition detail)
+- "Wrote Partitions" section showing partition refs with UUIDs (linked to partition detail)
+- "Derivative Wants" section showing wants spawned by dep-miss (linked to want detail)

-Add:
-- "Read" section with partition refs + UUIDs
-- "Wrote" section with partition refs + UUIDs
-- "Derivative Wants" section (if DepMiss)
+Extended `JobRunDetailView` with:
+- `read_deps: Vec` - impacted→read dependency relationships
+- `read_partitions: Vec` - input partitions with UUIDs
+- `wrote_partitions: Vec` - output partitions with UUIDs
+- `derivative_want_ids: Vec` - derivative wants from dep-miss

-### Phase 4: Job Integration
+**4.2 Partition detail page**

-Extend `DATABUILD_DEP_READ_JSON` parsing to run on job success (not just dep-miss). Jobs already emit this; we just need to capture it.
+Added to `partitions/detail.html`:
+- "Lineage - Built By" section showing the builder job run (linked to job run detail for upstream lineage)
+- "Lineage - Downstream Consumers" section showing UUIDs of downstream partitions

-## Sequencing
+Extended `PartitionDetailView` with:
+- `built_by_job_run_id: Option` - job run that built this partition
+- `downstream_partition_uuids: Vec` - downstream consumers from index

-1. Proto + state changes
-2. Event handler updates
-3. API response extensions
-4. Frontend enhancements
+**4.3 Want detail page**
+
+Added to `wants/detail.html`:
+- "Fulfillment - Job Runs" section listing all job runs that serviced this want
+- "Fulfillment - Derivative Wants" section listing derivative wants spawned by dep-misses
+
+Extended `WantDetailView` with:
+- `job_run_ids: Vec` - all job runs that serviced this want
+- `derivative_want_ids: Vec` - derivative wants spawned
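The Phase 2 flow (a trait reports related IDs, the query layer resolves them into one shared index) could look roughly like the sketch below. Only the names `HasRelatedIds`, `RelatedEntities`, and `JobRunDetail` come from the plan; the trait's method signature, the `RelatedIds` struct, the pared-down field sets, and `build_index` are assumptions made for illustration, and only partitions are indexed to keep it short.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical carrier for related IDs; job run IDs and want IDs
/// would be added the same way.
#[derive(Default)]
pub struct RelatedIds {
    pub partition_refs: BTreeSet<String>,
}

/// Assumed shape of the trait named in section 2.2.
pub trait HasRelatedIds {
    fn related_ids(&self) -> RelatedIds;
}

/// Pared-down stand-in for the real job run detail type.
pub struct JobRunDetail {
    pub id: String,
    pub read_partition_refs: Vec<String>,
}

impl HasRelatedIds for JobRunDetail {
    fn related_ids(&self) -> RelatedIds {
        RelatedIds {
            partition_refs: self.read_partition_refs.iter().cloned().collect(),
        }
    }
}

/// Pared-down stand-in for the real partition detail type.
#[derive(Clone)]
pub struct PartitionDetail {
    pub partition_ref: String,
    pub uuid: String,
}

/// Shared index returned next to `data`, keyed by partition ref.
#[derive(Default)]
pub struct RelatedEntities {
    pub partitions: BTreeMap<String, PartitionDetail>,
}

/// Collect related IDs from every item and resolve each one once.
/// References to unknown partitions are skipped in this sketch.
pub fn build_index<T: HasRelatedIds>(
    items: &[T],
    known_partitions: &BTreeMap<String, PartitionDetail>,
) -> RelatedEntities {
    let mut index = RelatedEntities::default();
    for item in items {
        for partition_ref in item.related_ids().partition_refs {
            if let Some(detail) = known_partitions.get(&partition_ref) {
                // Inserting by key deduplicates across items.
                index.partitions.insert(partition_ref, detail.clone());
            }
        }
    }
    index
}
```

The same `build_index` call serves a single-item response (a one-element slice) and a list response, which is the composability point made in the plan.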