Core Build

Purpose: Centralize the build logic and semantics in a performant, correct core.

Architecture

  • Jobs depend on input partitions and produce output partitions.
  • Graphs compose jobs to fully plan and execute builds of requested partitions.
  • Both jobs and graphs emit events via the build event log to update build state (a sketch of a possible event shape follows this list).
  • A common interface executes job and graph build actions, which different clients (e.g. CLI, service) rely on.
  • Jobs and graphs use wrappers to implement configuration and observability.
  • Graph-based composition is the basis for databuild application deployment.
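
For intuition, here is a minimal sketch of what build event log entries might look like as Rust types. The variant names and fields are illustrative assumptions; the actual event schema is defined in databuild.proto.

enum BuildEvent {
  // Hypothetical variants mirroring the state diagrams below.
  JobConfigSuccess { job_id: String },
  JobConfigFail { job_id: String, error: String },
  TaskLaunchSuccess { job_id: String },
  TaskFailed { job_id: String, exit_code: i32 },
  Heartbeat { job_id: String, at_epoch_ms: u64 },
  PartitionManifestEmitted { job_id: String, manifest_uri: String },
}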

Jobs

Jobs are the atomic unit of work in databuild, executed via a Rust-based wrapper that provides:

  • Structured logging and telemetry collection
  • Platform-agnostic execution across local, container, and cloud environments
  • Zero-network-dependency operation via log-based communication
  • Standardized error handling and exit code categorization
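
As a rough sketch of the wrapper's core loop (helper names are illustrative, not the real wrapper API), assuming the task runs as a child process:

use std::process::{Command, Stdio};

// Launch the task, turn its output into log events, and categorize the
// exit code. No network calls are required; all communication happens
// through the build event log.
fn run_wrapped(cmd: &str, args: &[&str]) -> Result<(), String> {
  let output = Command::new(cmd)
    .args(args)
    .stdout(Stdio::piped())
    .stderr(Stdio::piped())
    .output()
    .map_err(|e| format!("task launch failed: {e}"))?;
  for line in String::from_utf8_lossy(&output.stdout).lines() {
    emit_log_entry("stdout", line);
  }
  for line in String::from_utf8_lossy(&output.stderr).lines() {
    emit_log_entry("stderr", line);
  }
  match output.status.code() {
    Some(0) => Ok(()),
    code => Err(format!("task exited with {code:?}")),
  }
}

fn emit_log_entry(stream: &str, line: &str) {
  // Placeholder for appending a structured entry to the build event log.
  println!("{{\"stream\":\"{stream}\",\"line\":{line:?}}}");
}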

job.config

Purpose: Enable planning of the execution graph. Executed in-process when possible for speed. For interface details, see PartitionRef and JobConfig in databuild.proto.

trait DataBuildJob {
  fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}
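
A hypothetical implementation for a daily-aggregation job might look like this. The PartitionRef and JobConfig field names are assumptions; the real shapes come from databuild.proto.

struct DailyAggJob;

impl DataBuildJob for DailyAggJob {
  // Map each requested output partition to the input partitions needed
  // to build it, so the graph can plan upstream work before executing.
  fn config(outputs: Vec<PartitionRef>) -> JobConfig {
    let inputs = outputs
      .iter()
      .map(|out| PartitionRef { table: "raw_events".into(), key: out.key.clone() })
      .collect();
    JobConfig { inputs, outputs }
  }
}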

job.config State Diagram

flowchart TD
    begin((begin)) --> validate_args
    emit_job_config_fail --> fail((fail))
    validate_args -- fail --> emit_arg_validate_fail --> emit_job_config_fail
    validate_args -- success --> emit_arg_validate_success --> run_config
    run_config -- fail --> emit_config_fail --> emit_job_config_fail
    run_config -- success --> emit_config_success ---> success((success))

job.exec

Purpose: Execute job in exec wrapper.

trait DataBuildJob {
  fn exec(config: JobConfig) -> PartitionManifest;
}
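
Continuing the hypothetical DailyAggJob from job.config above (the PartitionManifest shape is likewise an assumption):

impl DataBuildJob for DailyAggJob {
  // config() is shown under job.config above.
  fn exec(config: JobConfig) -> PartitionManifest {
    let mut written = Vec::new();
    for out in &config.outputs {
      // ... read config.inputs, compute, and write this partition ...
      written.push(out.clone());
    }
    // Report exactly what was materialized so the graph can track state.
    PartitionManifest { partitions: written }
  }
}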

job.exec State Diagram

flowchart TD
  begin((begin)) --> wrapper_validate_config
  emit_job_exec_fail --> fail((fail))
  wrapper_validate_config -- fail --> emit_config_validate_fail --> emit_job_exec_fail
  wrapper_validate_config -- success --> emit_config_validate_success --> wrapper_launch_task
  wrapper_launch_task -- fail --> emit_task_launch_fail --> emit_job_exec_fail
  wrapper_launch_task -- success --> emit_task_launch_success --> wrapper_monitor_task
  wrapper_monitor_task -- heartbeat timer --> emit_heartbeat --> wrapper_monitor_task
  wrapper_monitor_task -- job stderr --> emit_log_entry --> wrapper_monitor_task
  wrapper_monitor_task -- job stdout --> emit_log_entry --> wrapper_monitor_task
  wrapper_monitor_task -- non-zero exit --> emit_task_failed --> emit_job_exec_fail
  wrapper_monitor_task -- zero exit --> emit_task_success --> emit_partition_manifest
  emit_partition_manifest --> success((success))

Graphs

Graphs are the unit of composition. To analyze (plan) a task graph (see JobGraph), a graph iteratively walks back from the requested output partitions, invoking job.config until no unresolved partitions remain (a sketch of this walk follows). To build partitions, the graph runs analyze and then iteratively executes the resulting task graph.
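
A minimal sketch of that backward walk, assuming hypothetical helpers for producer lookup (config_for_producer) and partition existence checks (partition_exists):

fn analyze(outputs: Vec<PartitionRef>) -> Result<JobGraph, AnalyzeError> {
  let mut graph = JobGraph::default();
  let mut missing: Vec<PartitionRef> = outputs;
  while let Some(part) = missing.pop() {
    // Hypothetical helper: find the unique job producing `part` and run
    // its config() to learn which input partitions it needs.
    let task = config_for_producer(&part)?;
    for input in &task.inputs {
      // Walk back through inputs that are neither planned nor materialized.
      if !graph.plans(input) && !partition_exists(input) {
        missing.push(input.clone());
      }
    }
    graph.add_task(task)?; // raises on cycles and duplicate producers
  }
  Ok(graph) // no unresolved partitions remain
}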

graph.analyze

Purpose: Produce a complete task graph to materialize a requested set of partitions.

trait DataBuildGraph {
  fn analyze(outputs: Vec<PartitionRef>) -> JobGraph;
}

graph.analyze State Diagram

flowchart TD
    begin((begin)) --> initialize_missing_partitions --> dispatch_missing_partitions
    emit_graph_analyze_fail --> fail((fail))
    dispatch_missing_partitions -- fail --> emit_partition_dispatch_fail --> emit_graph_analyze_fail
    dispatch_missing_partitions -- success --> cycle_detected?
    cycle_detected? -- yes --> emit_cycle_detected --> emit_graph_analyze_fail
    cycle_detected? -- no --> remaining_missing_partitions?
    remaining_missing_partitions? -- yes --> dispatch_missing_partitions
    remaining_missing_partitions? -- no --> emit_job_graph --> success((success))

graph.build

Purpose: Analyze, then execute the resulting task graph.

trait DataBuildGraph {
  fn build(outputs: Vec<PartitionRef>);
}
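
A sketch of the execution loop, again with hypothetical helpers (take_ready, schedule, await_scheduled):

fn build(outputs: Vec<PartitionRef>) -> Result<(), BuildError> {
  let graph = analyze(outputs)?;
  let mut pending = graph.tasks();
  while !pending.is_empty() {
    // A job is ready once all of its input partitions are materialized.
    let ready = take_ready(&mut pending);
    if ready.is_empty() {
      return Err(BuildError::Unschedulable); // no job can make progress
    }
    for task in ready {
      schedule(task)?; // dispatch to the job.exec wrapper
    }
    // On any job failure: cancel running jobs and fail the build.
    await_scheduled()?;
  }
  Ok(())
}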

graph.build State Diagram

flowchart TD
    begin((begin)) --> graph_analyze
    emit_graph_build_fail --> fail((fail))
    graph_analyze -- fail --> emit_graph_build_fail
    graph_analyze -- success --> initialize_ready_jobs --> remaining_ready_jobs?
    remaining_ready_jobs? -- yes --> emit_remaining_jobs --> schedule_jobs
    remaining_ready_jobs? -- none schedulable --> emit_jobs_unschedulable --> emit_graph_build_fail
    schedule_jobs -- fail --> emit_job_schedule_fail --> emit_graph_build_fail
    schedule_jobs -- success --> emit_job_schedule_success --> await_jobs
    await_jobs -- job_failure --> emit_job_failure --> emit_job_cancels --> cancel_running_jobs
    cancel_running_jobs --> emit_graph_build_fail
    await_jobs -- N seconds since heartbeat --> emit_heartbeat --> await_jobs
    await_jobs -- job_success --> remaining_ready_jobs?
    remaining_ready_jobs? -- no ---------> emit_graph_build_success --> success((success))

Correctness Strategy

  • Core component interfaces are described in databuild.proto, a protobuf interface shared by all core components and all GSLs.
  • GSLs implement ergonomic graph, job, and partition helpers that make coupling explicit.
  • Graphs automatically detect and raise on non-unique job -> partition mappings (see the sketch after this list).
  • Graph and job processes are fully described by state diagrams, whose state transitions are logged to the build event log.
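
A sketch of that duplicate-producer check, assuming PartitionRef is hashable and using a stand-in Task type (the real task graph types come from databuild.proto):

use std::collections::HashMap;

type JobId = String;

struct Task {
  job_id: JobId,
  outputs: Vec<PartitionRef>,
}

// Every output partition must be claimed by exactly one job.
fn check_unique_producers(tasks: &[Task]) -> Result<(), String> {
  let mut producer: HashMap<PartitionRef, JobId> = HashMap::new();
  for task in tasks {
    for out in &task.outputs {
      if let Some(prev) = producer.insert(out.clone(), task.job_id.clone()) {
        return Err(format!(
          "partition {out:?} claimed by both {prev:?} and {:?}",
          task.job_id
        ));
      }
    }
  }
  Ok(())
}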

Partition Delegation

  • Sometimes a partition already exists, or another build request already plans to produce it.
  • A later build request will delegate to the already existing build request for that partition.
  • The later build request writes an event to the build event log referencing the ID of the delegate, allowing traceability and visualization (a sketch follows this list).
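
A sketch of what the delegation event might carry (names are illustrative; the real event shape would live in databuild.proto):

type BuildRequestId = String;

// Recorded by the later build request instead of scheduling a duplicate
// job for the same partition.
struct PartitionDelegated {
  partition: PartitionRef,      // the partition this request wanted
  delegated_to: BuildRequestId, // the earlier request already producing it
}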

Heartbeats / Health Checks

  • Which strategy do we use?
  • If we launch tasks to an environment we can't health check, how could they heartbeat?