Core Build
Purpose: Centralize the build logic and semantics in a performant, correct core.
Architecture
- Jobs depend on input partitions and produce output partitions.
- Graphs compose jobs to fully plan and execute builds of requested partitions.
- Both jobs and graphs emit events via the build event log to update build state.
- A common interface for executing job and graph build actions is shared by all clients (e.g. CLI, service)
- Jobs and graphs use wrappers to implement configuration and observability
- Graph-based composition is the basis for databuild application deployment
Jobs
Jobs are the atomic unit of work in databuild.
- The job wrapper provides configuration, observability, and record keeping
job.config
Purpose: Enable planning of the execution graph. Executed in-process when possible for speed. For interface details, see PartitionRef and JobConfig in databuild.proto.
```rust
trait DataBuildJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}
```
job.config State Diagrams
```mermaid
flowchart TD
    begin((begin)) --> validate_args
    emit_job_config_fail --> fail((fail))
    validate_args -- fail --> emit_arg_validate_fail --> emit_job_config_fail
    validate_args -- success --> emit_arg_validate_success --> run_config
    run_config -- fail --> emit_config_fail --> emit_job_config_fail
    run_config -- success --> emit_config_success ---> success((success))
```
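As a sketch, a job's config implementation might look like the following. The PartitionRef and JobConfig structs here are simplified stand-ins for the databuild.proto messages, and DailySalesJob with its raw_/daily_ naming convention is hypothetical:

```rust
// Hypothetical simplified stand-ins for the PartitionRef and JobConfig
// messages in databuild.proto; field names here are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionRef {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct JobConfig {
    pub inputs: Vec<PartitionRef>,
    pub outputs: Vec<PartitionRef>,
}

pub trait DataBuildJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}

/// A hypothetical job whose config derives one input partition per
/// requested output (e.g. daily_sales/D depends on raw_sales/D).
pub struct DailySalesJob;

impl DataBuildJob for DailySalesJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig {
        let inputs = outputs
            .iter()
            .map(|o| PartitionRef { name: o.name.replace("daily_", "raw_") })
            .collect();
        JobConfig { inputs, outputs }
    }
}
```

Because config only declares inputs and outputs without doing work, the graph can invoke it in-process during planning.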
job.exec
Purpose: Execute the job inside the exec wrapper.
```rust
trait DataBuildJob {
    fn exec(config: JobConfig) -> PartitionManifest;
}
```
job.exec State Diagram
```mermaid
flowchart TD
    begin((begin)) --> validate_config
    emit_job_exec_fail --> fail((fail))
    validate_config -- fail --> emit_config_validate_fail --> emit_job_exec_fail
    validate_config -- success --> emit_config_validate_success --> launch_task
    launch_task -- fail --> emit_task_launch_fail --> emit_job_exec_fail
    launch_task -- success --> emit_task_launch_success --> await_task
    await_task -- waited N seconds --> emit_heartbeat --> await_task
    await_task -- non-zero exit code --> emit_task_failed --> emit_job_exec_fail
    await_task -- zero exit code --> emit_task_success --> calculate_metadata
    calculate_metadata -- fail --> emit_metadata_calculation_fail --> emit_job_exec_fail
    calculate_metadata -- success --> emit_metadata ---> success((success))
```
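The main paths of the diagram above can be sketched as follows. The task launch is abstracted as a closure returning an exit code, and JobConfig and PartitionManifest are simplified stand-ins for the databuild.proto types; heartbeats are omitted for brevity:

```rust
// Hypothetical sketch of the exec wrapper's main paths. The task launch is
// abstracted as a closure returning an exit code.
#[derive(Debug, Clone)]
pub struct JobConfig {
    pub outputs: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub struct PartitionManifest {
    pub partitions: Vec<String>,
}

pub fn exec(config: JobConfig, launch: impl Fn() -> i32) -> Result<PartitionManifest, String> {
    // validate_config: a config that produces nothing is rejected up front.
    if config.outputs.is_empty() {
        return Err("config validation failed: no output partitions".to_string());
    }
    // launch_task / await_task: a non-zero exit code maps to emit_task_failed.
    let exit_code = launch();
    if exit_code != 0 {
        return Err(format!("task failed with exit code {exit_code}"));
    }
    // calculate_metadata: record which partitions were produced.
    Ok(PartitionManifest { partitions: config.outputs })
}
```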
Graphs
Graphs are the unit of composition. To analyze (plan) task graphs (see JobGraph), they
iteratively walk back from the requested output partitions, invoking job.config until no unresolved partitions
remain. To build partitions, the graph runs analyze then iteratively executes the resulting task graph.
graph.analyze
Purpose: Produce a complete task graph to materialize a requested set of partitions.
```rust
trait DataBuildGraph {
    fn analyze(outputs: Vec<PartitionRef>) -> JobGraph;
}
```
graph.analyze State Diagram
```mermaid
flowchart TD
    begin((begin)) --> initialize_missing_partitions --> dispatch_missing_partitions
    emit_graph_analyze_fail --> fail((fail))
    dispatch_missing_partitions -- fail --> emit_partition_dispatch_fail --> emit_graph_analyze_fail
    dispatch_missing_partitions -- success --> cycle_detected?
    cycle_detected? -- yes --> emit_cycle_detected --> emit_graph_analyze_fail
    cycle_detected? -- no --> remaining_missing_partitions?
    remaining_missing_partitions? -- yes --> dispatch_missing_partitions
    remaining_missing_partitions? -- no --> emit_job_graph --> success((success))
```
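The walk-back with cycle detection can be sketched as below. Partition names stand in for PartitionRef, and the `config_for` closure stands in for invoking job.config: it returns the input partitions a producing job needs, or None when the partition already exists and needs no job. The recursive structure is an illustrative simplification of the iterative dispatch loop:

```rust
use std::collections::HashSet;

// Hypothetical simplified planner sketch for graph.analyze.
#[derive(Debug, PartialEq)]
pub struct Task {
    pub output: String,
    pub inputs: Vec<String>,
}

pub fn analyze(
    requested: &[String],
    config_for: &impl Fn(&str) -> Option<Vec<String>>,
) -> Result<Vec<Task>, String> {
    let mut plan = Vec::new();
    let mut in_progress = HashSet::new();
    let mut done = HashSet::new();
    for p in requested {
        visit(p, config_for, &mut in_progress, &mut done, &mut plan)?;
    }
    Ok(plan)
}

fn visit(
    p: &str,
    config_for: &impl Fn(&str) -> Option<Vec<String>>,
    in_progress: &mut HashSet<String>,
    done: &mut HashSet<String>,
    plan: &mut Vec<Task>,
) -> Result<(), String> {
    if done.contains(p) {
        return Ok(());
    }
    // Revisiting a partition already on the current path means a cycle.
    if !in_progress.insert(p.to_string()) {
        return Err(format!("cycle detected at partition {p}"));
    }
    if let Some(inputs) = config_for(p) {
        for input in &inputs {
            visit(input, config_for, in_progress, done, plan)?;
        }
        // Inputs are planned first, so `plan` ends up in dependency order.
        plan.push(Task { output: p.to_string(), inputs });
    }
    in_progress.remove(p);
    done.insert(p.to_string());
    Ok(())
}
```

The resulting task list is already in a valid execution order, which graph.build can consume directly.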
graph.build
Purpose: Analyze, then execute the resulting task graph.
```rust
trait DataBuildGraph {
    fn build(outputs: Vec<PartitionRef>);
}
```
graph.build State Diagram
```mermaid
flowchart TD
    begin((begin)) --> graph_analyze
    emit_graph_build_fail --> fail((fail))
    graph_analyze -- fail --> emit_graph_build_fail
    graph_analyze -- success --> initialize_ready_jobs --> remaining_ready_jobs?
    remaining_ready_jobs? -- yes --> emit_remaining_jobs --> schedule_jobs
    remaining_ready_jobs? -- none schedulable --> emit_jobs_unschedulable --> emit_graph_build_fail
    schedule_jobs -- fail --> emit_job_schedule_fail --> emit_graph_build_fail
    schedule_jobs -- success --> emit_job_schedule_success --> await_jobs
    await_jobs -- job_failure --> emit_job_failure --> emit_job_cancels --> cancel_running_jobs
    cancel_running_jobs --> emit_graph_build_fail
    await_jobs -- N seconds since heartbeat --> emit_heartbeat --> await_jobs
    await_jobs -- job_success --> remaining_ready_jobs?
    remaining_ready_jobs? -- no ---------> emit_graph_build_success --> success((success))
```
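The scheduling loop can be sketched as below, operating over a task list like the one analyze produces. Job launch and await are collapsed into synchronous in-loop completion for brevity; the failure branch when jobs remain but none are schedulable mirrors the emit_jobs_unschedulable path:

```rust
use std::collections::HashSet;

// Hypothetical sketch of the execution loop in graph.build.
#[derive(Debug, Clone)]
pub struct Task {
    pub output: String,
    pub inputs: Vec<String>,
}

pub fn build(tasks: Vec<Task>, preexisting: HashSet<String>) -> Result<Vec<String>, String> {
    let mut built = preexisting;
    let mut pending = tasks;
    let mut order = Vec::new();
    while !pending.is_empty() {
        // remaining_ready_jobs?: split pending into ready and not-yet-ready.
        let (ready, rest): (Vec<Task>, Vec<Task>) = pending
            .into_iter()
            .partition(|t| t.inputs.iter().all(|i| built.contains(i)));
        if ready.is_empty() {
            // emit_jobs_unschedulable: jobs remain but none can run.
            return Err("jobs remain but none are schedulable".to_string());
        }
        for task in ready {
            // A real graph would launch job.exec here and await the result.
            built.insert(task.output.clone());
            order.push(task.output);
        }
        pending = rest;
    }
    Ok(order)
}
```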
Correctness Strategy
- Core component interfaces are described in databuild.proto, a protobuf interface shared by all core components and all GSLs.
- GSLs implement ergonomic graph, job, and partition helpers that make coupling explicit
- Graphs automatically detect and raise on non-unique job -> partition mappings
- Graph and job processes are fully described by state diagrams, whose state transitions are logged to the build event log.
Partition Delegation
- Sometimes a partition already exists, or another build request is already planning to produce a partition
- A later build request will delegate to an already existing build request for said partition
- The later build request will write an event to the build event log referencing the ID of the delegate, allowing traceability and visualization
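The delegation check can be sketched as below. The Event variants and their fields are illustrative, not taken from databuild.proto: a later build request that finds an earlier build already planning a partition records a Delegated event referencing the earlier build's ID instead of re-planning the work:

```rust
// Hypothetical sketch of partition delegation against the build event log.
#[derive(Debug, PartialEq)]
pub enum Event {
    Planned { build_id: u64, partition: String },
    Delegated { build_id: u64, partition: String, delegate_build_id: u64 },
}

pub fn plan_partition(build_id: u64, partition: &str, log: &mut Vec<Event>) {
    // Look for an earlier build request that already planned this partition.
    let delegate = log.iter().find_map(|event| match event {
        Event::Planned { build_id: earlier, partition: p } if p.as_str() == partition => {
            Some(*earlier)
        }
        _ => None,
    });
    log.push(match delegate {
        Some(delegate_build_id) => Event::Delegated {
            build_id,
            partition: partition.to_string(),
            delegate_build_id,
        },
        None => Event::Planned { build_id, partition: partition.to_string() },
    });
}
```

Keeping the delegate's ID in the event preserves the chain from the later request back to the build doing the actual work.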
Heartbeats / Health Checks
- Which strategy do we use?
- If we are launching tasks to a place we can't health check, how could they heartbeat?