# Core Build
Purpose: Centralize the build logic and semantics in a performant, correct core.
## Architecture

- Jobs depend on input partitions and produce output partitions.
- Graphs compose jobs to fully plan and execute builds of requested partitions.
- Both jobs and graphs emit events via the build event log to update build state.
- A common interface for executing job and graph build actions is shared by all clients (e.g., the CLI and the service).
- Jobs and graphs use wrappers to implement configuration and observability.
- Graph-based composition is the basis for databuild application deployment.
## Jobs

Jobs are the atomic unit of work in databuild, executed via a Rust-based wrapper that provides:

- Structured logging and telemetry collection
- Platform-agnostic execution across local, container, and cloud environments
- Zero-network-dependency operation via log-based communication (sketched below)
- Standardized error handling and exit code categorization
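As a sketch of what log-based communication can look like, the wrapper can serialize each state transition as a JSON line on stdout and let the surrounding platform ship process logs to the build event log, so no network calls are needed. The `BuildEvent` shape below is illustrative, not the `databuild.proto` definition:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative event shape; the real schema lives in databuild.proto.
struct BuildEvent {
    job_id: String,
    state: &'static str, // e.g. "emit_arg_validate_success"
    ts_millis: u128,
}

impl BuildEvent {
    fn new(job_id: &str, state: &'static str) -> Self {
        let ts_millis = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before epoch")
            .as_millis();
        Self { job_id: job_id.to_string(), state, ts_millis }
    }

    /// Emit one JSON line on stdout; the platform running the wrapper
    /// ships process logs to the build event log.
    fn emit(&self) {
        println!(
            "{{\"job_id\":\"{}\",\"state\":\"{}\",\"ts\":{}}}",
            self.job_id, self.state, self.ts_millis
        );
    }
}

fn main() {
    BuildEvent::new("job-123", "emit_arg_validate_success").emit();
}
```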
### job.config

Purpose: Enable planning of the execution graph. Executed in-process when possible for speed. For interface details, see `PartitionRef` and `JobConfig` in `databuild.proto`.

```rust
trait DataBuildJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}
```
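For illustration, a hypothetical job that rolls up a raw table might implement `config` as below. `DailySalesRollup`, `raw_sales`, and the struct fields are stand-ins, not the generated `databuild.proto` types:

```rust
// Stand-in types; the real definitions are generated from databuild.proto.
#[derive(Clone, Debug)]
struct PartitionRef {
    table: String,
    partition_key: String,
}

#[derive(Debug)]
struct JobConfig {
    inputs: Vec<PartitionRef>,
    outputs: Vec<PartitionRef>,
}

trait DataBuildJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}

/// Hypothetical job: each rollup partition depends on the matching
/// raw_sales partition.
struct DailySalesRollup;

impl DataBuildJob for DailySalesRollup {
    // Plan-time only: declare inputs for the requested outputs
    // without doing any work.
    fn config(outputs: Vec<PartitionRef>) -> JobConfig {
        let inputs = outputs
            .iter()
            .map(|out| PartitionRef {
                table: "raw_sales".to_string(),
                partition_key: out.partition_key.clone(),
            })
            .collect();
        JobConfig { inputs, outputs }
    }
}
```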
#### job.config State Diagram

```mermaid
flowchart TD
    begin((begin)) --> validate_args
    emit_job_config_fail --> fail((fail))
    validate_args -- fail --> emit_arg_validate_fail --> emit_job_config_fail
    validate_args -- success --> emit_arg_validate_success --> run_config
    run_config -- fail --> emit_config_fail --> emit_job_config_fail
    run_config -- success --> emit_config_success ---> success((success))
```
### job.exec

Purpose: Execute the job in the exec wrapper.

```rust
trait DataBuildJob {
    fn exec(config: JobConfig) -> PartitionManifest;
}
```
#### job.exec State Diagram

```mermaid
flowchart TD
    begin((begin)) --> wrapper_validate_config
    emit_job_exec_fail --> fail((fail))
    wrapper_validate_config -- fail --> emit_config_validate_fail --> emit_job_exec_fail
    wrapper_validate_config -- success --> emit_config_validate_success --> wrapper_launch_task
    wrapper_launch_task -- fail --> emit_task_launch_fail --> emit_job_exec_fail
    wrapper_launch_task -- success --> emit_task_launch_success --> wrapper_monitor_task
    wrapper_monitor_task -- heartbeat timer --> emit_heartbeat --> wrapper_monitor_task
    wrapper_monitor_task -- job stderr --> emit_log_entry --> wrapper_monitor_task
    wrapper_monitor_task -- job stdout --> emit_log_entry --> wrapper_monitor_task
    wrapper_monitor_task -- non-zero exit --> emit_task_failed --> emit_job_exec_fail
    wrapper_monitor_task -- zero exit --> emit_task_success --> emit_partition_manifest
    emit_partition_manifest --> success((success))
```
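The launch-and-monitor portion of the wrapper could look roughly like the following, assuming the task is an arbitrary child process whose output is re-emitted as log entries. Event names mirror the diagram; `emit` and `wrapper_exec` are illustrative, and a real wrapper would read stdout and stderr concurrently and run the heartbeat timer alongside:

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Stand-in for appending an event to the build event log.
fn emit(event: &str, detail: &str) {
    println!("{{\"event\":\"{event}\",\"detail\":{detail:?}}}");
}

fn wrapper_exec(program: &str, args: &[&str]) -> Result<(), String> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| {
            emit("emit_task_launch_fail", &e.to_string());
            e.to_string()
        })?;
    emit("emit_task_launch_success", program);

    // wrapper_monitor_task: forward child output as log entries.
    // (Sequential for brevity; a real wrapper reads both streams
    // concurrently and also runs the heartbeat timer.)
    if let Some(stdout) = child.stdout.take() {
        for line in BufReader::new(stdout).lines().flatten() {
            emit("emit_log_entry", &line);
        }
    }
    if let Some(stderr) = child.stderr.take() {
        for line in BufReader::new(stderr).lines().flatten() {
            emit("emit_log_entry", &line);
        }
    }

    // Categorize the exit: zero -> task success, non-zero -> task failed.
    let status = child.wait().map_err(|e| e.to_string())?;
    if status.success() {
        emit("emit_task_success", program);
        Ok(())
    } else {
        emit("emit_task_failed", &status.to_string());
        Err(format!("task exited with {status}"))
    }
}

fn main() {
    let _ = wrapper_exec("echo", &["hello"]);
}
```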
## Graphs

Graphs are the unit of composition. To analyze (plan) a build, a graph iteratively walks back from the requested output partitions, invoking job.config until no unresolved partitions remain; the result is a task graph (see JobGraph). To build partitions, the graph runs analyze and then iteratively executes the resulting task graph.
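That backward walk can be sketched as a worklist loop. The types below are stand-ins for the `databuild.proto` messages, `config_for` stands in for dispatching job.config for the job that produces a partition, and cycle detection (covered in the graph.analyze state diagram below) is omitted:

```rust
use std::collections::{HashSet, VecDeque};

// Stand-ins for the databuild.proto messages.
type PartitionRef = String;
struct JobConfig {
    inputs: Vec<PartitionRef>,
    outputs: Vec<PartitionRef>,
}
struct JobGraph {
    tasks: Vec<JobConfig>,
}

/// Worklist sketch of graph.analyze: keep dispatching job.config for
/// missing partitions until no unresolved partitions remain.
fn analyze(
    requested: Vec<PartitionRef>,
    partition_exists: impl Fn(&PartitionRef) -> bool,
    config_for: impl Fn(&PartitionRef) -> JobConfig,
) -> JobGraph {
    let mut tasks = Vec::new();
    let mut planned: HashSet<PartitionRef> = HashSet::new();
    let mut missing: VecDeque<PartitionRef> = requested.into();

    while let Some(part) = missing.pop_front() {
        if planned.contains(&part) || partition_exists(&part) {
            continue; // already planned, or already materialized
        }
        let config = config_for(&part);
        planned.extend(config.outputs.iter().cloned());
        missing.extend(config.inputs.iter().cloned());
        tasks.push(config);
    }
    JobGraph { tasks }
}
```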
### graph.analyze

Purpose: Produce a complete task graph to materialize a requested set of partitions.

```rust
trait DataBuildGraph {
    fn analyze(outputs: Vec<PartitionRef>) -> JobGraph;
}
```
#### graph.analyze State Diagram

```mermaid
flowchart TD
    begin((begin)) --> initialize_missing_partitions --> dispatch_missing_partitions
    emit_graph_analyze_fail --> fail((fail))
    dispatch_missing_partitions -- fail --> emit_partition_dispatch_fail --> emit_graph_analyze_fail
    dispatch_missing_partitions -- success --> cycle_detected?
    cycle_detected? -- yes --> emit_cycle_detected --> emit_graph_analyze_fail
    cycle_detected? -- no --> remaining_missing_partitions?
    remaining_missing_partitions? -- yes --> dispatch_missing_partitions
    remaining_missing_partitions? -- no --> emit_job_graph --> success((success))
```
### graph.build

Purpose: Analyze, then execute the resulting task graph.

```rust
trait DataBuildGraph {
    fn build(outputs: Vec<PartitionRef>);
}
```
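A sequential sketch of the execute phase, reusing the stand-in types from the analyze sketch; `done` is seeded with partitions that already exist. A real build schedules ready jobs concurrently and cancels in-flight jobs on failure, as the diagram below shows:

```rust
use std::collections::HashSet;

// Same stand-in types as the analyze sketch.
type PartitionRef = String;
struct JobConfig {
    inputs: Vec<PartitionRef>,
    outputs: Vec<PartitionRef>,
}
struct JobGraph {
    tasks: Vec<JobConfig>,
}

/// Sequential sketch of the execute phase. `done` is seeded with
/// partitions that already exist before this build.
fn execute(
    graph: JobGraph,
    mut done: HashSet<PartitionRef>,
    run_job: impl Fn(&JobConfig) -> Result<(), String>,
) -> Result<(), String> {
    let mut pending = graph.tasks;
    while !pending.is_empty() {
        // remaining_ready_jobs?: ready once all inputs are materialized.
        let idx = pending
            .iter()
            .position(|job| job.inputs.iter().all(|i| done.contains(i)))
            .ok_or("emit_jobs_unschedulable: no ready jobs remain")?;
        let job = pending.swap_remove(idx);
        // A real build would emit schedule events and, on failure,
        // cancel any in-flight jobs before failing the build.
        run_job(&job)?;
        done.extend(job.outputs.iter().cloned());
    }
    Ok(())
}
```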
#### graph.build State Diagram

```mermaid
flowchart TD
    begin((begin)) --> graph_analyze
    emit_graph_build_fail --> fail((fail))
    graph_analyze -- fail --> emit_graph_build_fail
    graph_analyze -- success --> initialize_ready_jobs --> remaining_ready_jobs?
    remaining_ready_jobs? -- yes --> emit_remaining_jobs --> schedule_jobs
    remaining_ready_jobs? -- none schedulable --> emit_jobs_unschedulable --> emit_graph_build_fail
    schedule_jobs -- fail --> emit_job_schedule_fail --> emit_graph_build_fail
    schedule_jobs -- success --> emit_job_schedule_success --> await_jobs
    await_jobs -- job_failure --> emit_job_failure --> emit_job_cancels --> cancel_running_jobs
    cancel_running_jobs --> emit_graph_build_fail
    await_jobs -- N seconds since heartbeat --> emit_heartbeat --> await_jobs
    await_jobs -- job_success --> remaining_ready_jobs?
    remaining_ready_jobs? -- no ---------> emit_graph_build_success --> success((success))
```
## Correctness Strategy

- Core component interfaces are described in databuild.proto, a protobuf interface shared by all core components and all GSLs.
- GSLs implement ergonomic graph, job, and partition helpers that make coupling explicit.
- Graphs automatically detect and raise on non-unique job -> partition mappings (see the sketch below).
- Graph and job processes are fully described by state diagrams, whose state transitions are logged to the build event log.
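One way to implement that uniqueness check, sketched over illustrative `JobId`/`PartitionRef` aliases rather than the real `databuild.proto` types:

```rust
use std::collections::HashMap;

// Illustrative aliases; the real refs come from databuild.proto.
type JobId = String;
type PartitionRef = String;

/// Fail analysis if two jobs claim to produce the same partition.
fn check_unique_producers(
    producers: &[(JobId, Vec<PartitionRef>)],
) -> Result<(), String> {
    let mut seen: HashMap<&PartitionRef, &JobId> = HashMap::new();
    for (job, outputs) in producers {
        for part in outputs {
            if let Some(prev) = seen.insert(part, job) {
                return Err(format!(
                    "partition {part} produced by both {prev} and {job}"
                ));
            }
        }
    }
    Ok(())
}
```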
## Partition Delegation

- Sometimes a partition already exists, or another build request is already planning to produce it.
- A later build request can delegate to an already existing build request for said partition.
- The later build request writes an event to the build event log referencing the ID of the delegate, allowing traceability and visualization (see the sketch below).
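Such an event might carry little more than the two build request IDs and the partition; the shape below is purely illustrative and would live in databuild.proto alongside the other build events:

```rust
/// Purely illustrative shape for a delegation event; the real message
/// would be defined in databuild.proto with the other build events.
struct PartitionDelegatedEvent {
    /// The later build request that chose to delegate.
    delegating_build_id: String,
    /// The earlier build request already producing the partition.
    delegate_build_id: String,
    /// The partition whose production is being delegated.
    partition_ref: String,
}
```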
## Heartbeats / Health Checks

Open questions:

- Which strategy do we use?
- If we launch tasks somewhere we cannot health-check, how could they heartbeat?