databuild/docs/design/core-build.md
Stuart Axelbrooke ea83610d35
2025-09-27 15:29:22 -07:00

Core Build

Purpose: Enable continuous reconciliation of partition wants through distributed job execution.

Architecture

DataBuild uses a want-driven reconciliation model inspired by Kubernetes. Users declare wants (desired partitions), and the system continuously attempts to satisfy them through job execution.

Key Components

  • Wants: Declarations of desired partitions with TTLs and SLAs
  • Jobs: Stateless executables that transform input partitions to outputs
  • Graph: Reconciliation runtime that monitors wants and dispatches jobs
  • Build Event Log (BEL): Event-sourced ledger of all system activity
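
As a rough illustration of the first component, a want could be modeled as a small immutable record. This is a minimal sketch, not DataBuild's actual schema (which lives in the protobuf definitions); every field name here except parent_want_id is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Want:
    """Hypothetical shape of a want; field names are illustrative only."""
    want_id: str
    partition_ref: str                     # the desired partition
    created_at: float                      # seconds since some epoch
    ttl_seconds: float                     # the want expires after this long
    sla_seconds: float                     # target time until the partition is available
    parent_want_id: Optional[str] = None   # set on wants created for missing deps

    def is_expired(self, now: float) -> bool:
        # An expired want is no longer reconciled.
        return now >= self.created_at + self.ttl_seconds

    def sla_breached(self, now: float) -> bool:
        # The want is still unmet after its SLA window has passed.
        return now > self.created_at + self.sla_seconds
```

Keeping the TTL and SLA on the want itself lets the reconciliation runtime decide what to drop and what to flag without consulting the job.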

Reconciliation Loop

The graph continuously:

  1. Scans active wants from the BEL
  2. Groups wants by responsible job (via graph lookup)
  3. Dispatches jobs to build wanted partitions
  4. Handles job results:
       • Success: Marks the built partitions as available
       • Missing dependencies: Creates child wants for the missing dependencies, each carrying a traceable ID
       • Failure: Retries if the job's retry strategy allows it
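
The loop above can be sketched in a few lines. This is a simplified, in-memory model under stated assumptions: wants and partitions are plain strings, the BEL is replaced by two sets, and job_for, run_job, and the status strings are hypothetical names chosen for the example.

```python
from collections import defaultdict


def reconcile_once(wants, available, job_for, run_job):
    """Run one pass of the loop and return the (possibly expanded) want set."""
    # Steps 1-2: collect unmet wants and group them by responsible job.
    by_job = defaultdict(list)
    for ref in sorted(wants - available):
        by_job[job_for(ref)].append(ref)

    discovered = set()
    for job, refs in by_job.items():
        # Step 3: dispatch the job for its wanted partitions.
        status, missing = run_job(job, refs)
        # Step 4: handle the result.
        if status == "ok":
            available.update(refs)
        elif status == "missing_deps":
            discovered.update(missing)
        # Any other status leaves the wants unmet, so a later pass retries them.
    return wants | discovered


def reconcile(wants, available, job_for, run_job, max_passes=10):
    """Repeat passes until every want is satisfied or the pass budget runs out."""
    for _ in range(max_passes):
        if wants <= available:
            break
        wants = reconcile_once(wants, available, job_for, run_job)
    return wants


# Toy dependency table: a report partition needs the matching events partition.
DEPS = {"reports/2025-01-01": ["events/2025-01-01"], "events/2025-01-01": []}
available = set()


def run_job(job, refs):
    missing = sorted({dep for ref in refs for dep in DEPS[ref]} - available)
    return ("missing_deps", missing) if missing else ("ok", [])


final_wants = reconcile(
    {"reports/2025-01-01"}, available, lambda ref: ref.split("/")[0], run_job
)
```

In this toy run the first pass discovers the missing events partition and the second pass builds both partitions, without any upfront plan.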

Jobs

Jobs are stateless executables with a single exec entrypoint. When invoked with the requested partitions as args, they do one of the following:

  • Successfully produce the partitions
  • Fail with a missing-dependency error that lists the required upstream partitions
  • Fail with another error, which may be retried

Jobs declare execution preferences (batching, concurrency) as metadata, but contain no orchestration logic.
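
To make the three outcomes concrete, here is a minimal sketch of a job entrypoint. The exit codes, the JSON payload shape, and the upstream_for helper are assumptions made for illustration; the real contract is whatever databuild/databuild.proto defines.

```python
import json

# Hypothetical exit codes; the actual values are not specified in this document.
EXIT_OK = 0
EXIT_FAILED = 1
EXIT_MISSING_DEPS = 3


def upstream_for(ref):
    """Illustrative dependency rule: reports/<date> needs events/<date>."""
    dataset, date = ref.split("/", 1)
    return [f"events/{date}"] if dataset == "reports" else []


def exec_job(partitions, exists, build):
    """Try to build the requested partitions and report one of three outcomes.

    exists(ref) -> bool and build(ref) -> None are supplied by the caller,
    so the job itself holds no state and no orchestration logic.
    """
    missing = sorted(
        {dep for ref in partitions for dep in upstream_for(ref) if not exists(dep)}
    )
    if missing:
        # Outcome 2: list the upstream partitions that must exist first.
        return EXIT_MISSING_DEPS, json.dumps({"missing_deps": missing})
    try:
        for ref in partitions:
            build(ref)
    except Exception as err:
        # Outcome 3: any other error, left to the caller's retry strategy.
        return EXIT_FAILED, json.dumps({"error": str(err)})
    # Outcome 1: all requested partitions were produced.
    return EXIT_OK, json.dumps({"built": list(partitions)})
```

The job only reports what it needs; deciding whether and when to build those upstream partitions is left to the graph.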

Want Propagation

When jobs report missing dependencies, the graph:

  1. Parses the error for partition refs
  2. Creates child wants (linked via parent_want_id)
  3. Continues reconciliation with expanded want set

This creates want chains that naturally traverse the dependency graph without upfront planning.
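
The propagation steps could look roughly like the following. It assumes the missing-dependency error arrives as JSON with a missing_deps list, which is a made-up format for this sketch; parent_want_id comes from the text above, while root_want_id and the helper names are hypothetical.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Want:
    want_id: str
    partition_ref: str
    parent_want_id: Optional[str] = None
    root_want_id: Optional[str] = None


def child_wants(parent, error_payload):
    """Create one child want per partition ref named in the error."""
    refs = json.loads(error_payload)["missing_deps"]
    # A top-level want is its own root; children inherit the root unchanged.
    root = parent.root_want_id or parent.want_id
    return [
        Want(
            want_id=str(uuid.uuid4()),
            partition_ref=ref,
            parent_want_id=parent.want_id,
            root_want_id=root,
        )
        for ref in refs
    ]


def lineage(want, by_id):
    """Walk parent links back to the root, returning want IDs child-first."""
    chain = [want.want_id]
    while want.parent_want_id is not None:
        want = by_id[want.parent_want_id]
        chain.append(want.want_id)
    return chain
```

Because each child records its parent, any want deep in the chain can be traced back to the user-declared want that caused it.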

Correctness Strategy

  • Idempotency: Jobs must produce identical outputs given the same inputs
  • Atomicity: Partitions are either complete or absent
  • Want chains: Full traceability via parent/root want IDs
  • Event sourcing: All state changes recorded in BEL
  • Protobuf interface: All build actions conform to the structs and interfaces defined in databuild/databuild.proto

The system achieves correctness through convergence rather than planning: it reconciles continuously until wants are satisfied or expire.
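
One way to picture the event-sourcing point is that current state is never stored directly but derived by replaying the log. The sketch below assumes two hypothetical event kinds, want_created and partition_built, with dictionary payloads; the real BEL event types are defined elsewhere.

```python
def replay(events, now):
    """Fold the event log into (active wants, available partitions).

    A want is active if its partition has not been built and its TTL
    has not elapsed at time `now`.
    """
    wants = {}
    available = set()
    for event in events:
        if event["kind"] == "want_created":
            wants[event["want_id"]] = event
        elif event["kind"] == "partition_built":
            available.add(event["partition_ref"])

    active = {
        want_id: want
        for want_id, want in wants.items()
        if want["partition_ref"] not in available
        and now < want["created_at"] + want["ttl_seconds"]
    }
    return active, available


log = [
    {"kind": "want_created", "want_id": "w1", "partition_ref": "reports/2025-01-01",
     "created_at": 0, "ttl_seconds": 100},
    {"kind": "want_created", "want_id": "w2", "partition_ref": "events/2025-01-01",
     "created_at": 0, "ttl_seconds": 100},
    {"kind": "partition_built", "partition_ref": "events/2025-01-01"},
    {"kind": "want_created", "want_id": "w3", "partition_ref": "old/2024-12-31",
     "created_at": 0, "ttl_seconds": 10},
]
active, available = replay(log, now=50)
```

Here w2 drops out because its partition was built and w3 drops out because it expired, leaving only w1 for the next reconciliation pass.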