# Core Build
Purpose: Centralize the build logic and semantics in a performant, correct core.

## Architecture
- Jobs depend on input partitions and produce output partitions.
- Graphs compose jobs to fully plan and execute builds of requested partitions.
- Both jobs and graphs emit events via the [build event log](./build-event-log.md) to update build state.
- A common interface executes job and graph build actions, and different clients rely on it (e.g. CLI,
  service, etc.).
- Jobs and graphs use wrappers to implement configuration and [observability](./observability.md).
- Graph-based composition is the basis for databuild application [deployment](./deploy-strategies.md).

## Jobs
Jobs are the atomic unit of work in databuild, executed via a Rust-based wrapper that provides:
- Structured logging and telemetry collection
- Platform-agnostic execution across local, container, and cloud environments
- Zero-network-dependency operation via log-based communication
- Standardized error handling and exit code categorization

### `job.config`
Purpose: Enable planning of the execution graph. Executed in-process when possible for speed. For interface details, see
[`PartitionRef`](./glossary.md#partitionref) and [`JobConfig`](./glossary.md#jobconfig) in
[`databuild.proto`](../databuild/databuild.proto).

```rust
trait DataBuildJob {
    /// Plans the job for the requested output partitions.
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}
```

#### `job.config` State Diagram

```mermaid
flowchart TD
    begin((begin)) --> validate_args
    emit_job_config_fail --> fail((fail))
    validate_args -- fail --> emit_arg_validate_fail --> emit_job_config_fail
    validate_args -- success --> emit_arg_validate_success --> run_config
    run_config -- fail --> emit_config_fail --> emit_job_config_fail
    run_config -- success --> emit_config_success ---> success((success))
```

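The `job.config` diagram can be read as a small function that records each transition as an event. The following is only a sketch under assumptions: the `Event` enum, the `run_job_config` function, and its closure parameters are hypothetical names chosen for illustration, and an in-memory `Vec` stands in for the build event log. The real interface is defined in `databuild.proto`.

```rust
/// Events named after the nodes of the `job.config` diagram (hypothetical Rust type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    ArgValidateSuccess,
    ArgValidateFail,
    ConfigSuccess,
    ConfigFail,
    JobConfigFail,
}

/// Walks the diagram: validate the arguments, then run the config step.
/// `validate` and `config` stand in for the real validation and `job.config` logic.
fn run_job_config<T>(
    validate: impl FnOnce() -> Result<(), String>,
    config: impl FnOnce() -> Result<T, String>,
    log: &mut Vec<Event>,
) -> Option<T> {
    if validate().is_err() {
        log.push(Event::ArgValidateFail);
        log.push(Event::JobConfigFail);
        return None;
    }
    log.push(Event::ArgValidateSuccess);
    match config() {
        Ok(job_config) => {
            log.push(Event::ConfigSuccess);
            Some(job_config)
        }
        Err(_) => {
            log.push(Event::ConfigFail);
            log.push(Event::JobConfigFail);
            None
        }
    }
}
```

Every failure path ends by pushing `JobConfigFail`, mirroring how both failure branches in the diagram converge on `emit_job_config_fail`.
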
### `job.exec`
Purpose: Execute the job inside the exec wrapper.

```rust
trait DataBuildJob {
    /// Runs the job described by `config` and reports the partitions it produced.
    fn exec(config: JobConfig) -> PartitionManifest;
}
```

#### `job.exec` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> wrapper_validate_config
    emit_job_exec_fail --> fail((fail))
    wrapper_validate_config -- fail --> emit_config_validate_fail --> emit_job_exec_fail
    wrapper_validate_config -- success --> emit_config_validate_success --> wrapper_launch_task
    wrapper_launch_task -- fail --> emit_task_launch_fail --> emit_job_exec_fail
    wrapper_launch_task -- success --> emit_task_launch_success --> wrapper_monitor_task
    wrapper_monitor_task -- heartbeat timer --> emit_heartbeat --> wrapper_monitor_task
    wrapper_monitor_task -- job stderr --> emit_log_entry --> wrapper_monitor_task
    wrapper_monitor_task -- job stdout --> emit_log_entry --> wrapper_monitor_task
    wrapper_monitor_task -- non-zero exit --> emit_task_failed --> emit_job_exec_fail
    wrapper_monitor_task -- zero exit --> emit_task_success --> emit_partition_manifest
    emit_partition_manifest --> success((success))
```

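The `wrapper_monitor_task` loop in the `job.exec` diagram is the part with the most branches, so it may help to see it as a pure function over a stream of observations. This is a minimal sketch, not the wrapper's actual code: `TaskSignal`, `ExecEvent`, and `monitor_task` are hypothetical names, and the signals are supplied as an iterator instead of coming from a real child process and timer.

```rust
/// Signals the wrapper may observe while monitoring a task (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum TaskSignal {
    HeartbeatTimer,
    Stdout(String),
    Stderr(String),
    Exit(i32),
}

/// Events named after the monitoring nodes of the `job.exec` diagram.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ExecEvent {
    Heartbeat,
    LogEntry(String),
    TaskSuccess,
    TaskFailed(i32),
    PartitionManifest,
    JobExecFail,
}

/// Consumes signals until the task exits and returns the emitted events.
/// If the signal stream ends without an exit, the events so far are returned.
fn monitor_task(signals: impl IntoIterator<Item = TaskSignal>) -> Vec<ExecEvent> {
    let mut events = Vec::new();
    for signal in signals {
        match signal {
            TaskSignal::HeartbeatTimer => events.push(ExecEvent::Heartbeat),
            // stdout and stderr both become log entries, as in the diagram.
            TaskSignal::Stdout(line) | TaskSignal::Stderr(line) => {
                events.push(ExecEvent::LogEntry(line))
            }
            TaskSignal::Exit(0) => {
                events.push(ExecEvent::TaskSuccess);
                events.push(ExecEvent::PartitionManifest);
                break;
            }
            TaskSignal::Exit(code) => {
                events.push(ExecEvent::TaskFailed(code));
                events.push(ExecEvent::JobExecFail);
                break;
            }
        }
    }
    events
}
```

Keeping the loop free of I/O makes each transition easy to check in isolation; a real wrapper would feed it from the launched task's output streams and a heartbeat timer.
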
## Graphs
Graphs are the unit of composition. To `analyze` (plan) task graphs (see [`JobGraph`](./glossary.md#jobgraph)), they
iteratively walk back from the requested output partitions, invoking `job.config` until no unresolved partitions
remain. To `build` partitions, the graph runs `analyze` then iteratively executes the resulting task graph.

### `graph.analyze`
Purpose: Produce a complete task graph to materialize a requested set of partitions.

```rust
trait DataBuildGraph {
    /// Plans the task graph needed to materialize the requested partitions.
    fn analyze(outputs: Vec<PartitionRef>) -> JobGraph;
}
```

#### `graph.analyze` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> initialize_missing_partitions --> dispatch_missing_partitions
    emit_graph_analyze_fail --> fail((fail))
    dispatch_missing_partitions -- fail --> emit_partition_dispatch_fail --> emit_graph_analyze_fail
    dispatch_missing_partitions -- success --> cycle_detected?
    cycle_detected? -- yes --> emit_cycle_detected --> emit_graph_analyze_fail
    cycle_detected? -- no --> remaining_missing_partitions?
    remaining_missing_partitions? -- yes --> dispatch_missing_partitions
    remaining_missing_partitions? -- no --> emit_job_graph --> success((success))
```

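To make the walk-back concrete, here is a small sketch of planning with cycle detection. It is a depth-first variant and does not reproduce the round-by-round dispatch loop in the diagram, but it reaches the same three outcomes: dispatch failure, cycle detected, or a complete plan. All names are assumptions for illustration: partitions are plain strings, and the `Producers` map stands in for invoking `job.config` on each partition.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
enum AnalyzeError {
    /// No job is known to produce this partition.
    DispatchFailed(String),
    /// The partition transitively depends on itself.
    CycleDetected(String),
}

/// Stand-in for `job.config`: maps an output partition to the inputs its job needs.
type Producers = BTreeMap<String, Vec<String>>;

fn visit(
    partition: &str,
    producers: &Producers,
    in_progress: &mut BTreeSet<String>,
    order: &mut Vec<String>,
) -> Result<(), AnalyzeError> {
    if order.iter().any(|p| p == partition) {
        return Ok(()); // already planned
    }
    // Seeing a partition again while it is still being resolved means a cycle.
    if !in_progress.insert(partition.to_string()) {
        return Err(AnalyzeError::CycleDetected(partition.to_string()));
    }
    let inputs = producers
        .get(partition)
        .ok_or_else(|| AnalyzeError::DispatchFailed(partition.to_string()))?;
    for input in inputs {
        visit(input, producers, in_progress, order)?;
    }
    in_progress.remove(partition);
    order.push(partition.to_string());
    Ok(())
}

/// Returns the partitions to build, inputs before the partitions that need them.
fn analyze(outputs: &[&str], producers: &Producers) -> Result<Vec<String>, AnalyzeError> {
    let mut order = Vec::new();
    let mut in_progress = BTreeSet::new();
    for output in outputs {
        visit(output, producers, &mut in_progress, &mut order)?;
    }
    Ok(order)
}
```

In this sketch a partition with an empty input list is a leaf, and a partition missing from `Producers` is treated as a dispatch failure. Whether the real graph handles source partitions that way is not stated here.
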
### `graph.build`
Purpose: Analyze, then execute the resulting task graph.

```rust
trait DataBuildGraph {
    /// Analyzes, then builds the requested partitions.
    fn build(outputs: Vec<PartitionRef>);
}
```

#### `graph.build` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> graph_analyze
    emit_graph_build_fail --> fail((fail))
    graph_analyze -- fail --> emit_graph_build_fail
    graph_analyze -- success --> initialize_ready_jobs --> remaining_ready_jobs?
    remaining_ready_jobs? -- yes --> emit_remaining_jobs --> schedule_jobs
    remaining_ready_jobs? -- none schedulable --> emit_jobs_unschedulable --> emit_graph_build_fail
    schedule_jobs -- fail --> emit_job_schedule_fail --> emit_graph_build_fail
    schedule_jobs -- success --> emit_job_schedule_success --> await_jobs
    await_jobs -- job_failure --> emit_job_failure --> emit_job_cancels --> cancel_running_jobs
    cancel_running_jobs --> emit_graph_build_fail
    await_jobs -- N seconds since heartbeat --> emit_heartbeat --> await_jobs
    await_jobs -- job_success --> remaining_ready_jobs?
    remaining_ready_jobs? -- no ---------> emit_graph_build_success --> success((success))
```

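The scheduling loop in the `graph.build` diagram can be sketched as repeatedly running whatever jobs are ready. This is a deliberately simplified, sequential illustration: `PlannedJob`, `BuildError`, and the `run` callback are hypothetical, jobs run one at a time, so there are no concurrently running jobs to cancel, and heartbeats and event emission are left out.

```rust
use std::collections::BTreeSet;

/// A planned job: its name and the jobs that must finish first (hypothetical shape).
struct PlannedJob {
    name: &'static str,
    depends_on: Vec<&'static str>,
}

#[derive(Debug, PartialEq, Eq)]
enum BuildError {
    JobsUnschedulable,
    JobFailed(&'static str),
}

/// Runs ready jobs until none remain; `run` stands in for scheduling and awaiting a job.
fn build(
    jobs: &[PlannedJob],
    mut run: impl FnMut(&str) -> bool,
) -> Result<Vec<&'static str>, BuildError> {
    let mut done: BTreeSet<&'static str> = BTreeSet::new();
    let mut completed = Vec::new();
    while done.len() < jobs.len() {
        // remaining_ready_jobs?: unfinished jobs whose dependencies are all done.
        let ready: Vec<&PlannedJob> = jobs
            .iter()
            .filter(|j| !done.contains(&j.name) && j.depends_on.iter().all(|d| done.contains(d)))
            .collect();
        if ready.is_empty() {
            // Jobs remain but none can start: the "none schedulable" branch.
            return Err(BuildError::JobsUnschedulable);
        }
        for job in ready {
            if !run(job.name) {
                return Err(BuildError::JobFailed(job.name));
            }
            done.insert(job.name);
            completed.push(job.name);
        }
    }
    Ok(completed)
}
```

The loop distinguishes "nothing left to do" (success) from "work remains but nothing is ready" (failure), which is the same three-way split the diagram draws at `remaining_ready_jobs?`.
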
## Correctness Strategy
- Core component interfaces are described in [`databuild.proto`](../databuild/databuild.proto), a protobuf interface
  shared by all core components and all [GSLs](./graph-specification.md).
- [GSLs](./graph-specification.md) implement ergonomic graph, job, and partition helpers that make coupling explicit.
- Graphs automatically detect and raise on non-unique job -> partition mappings.
- Graph and job processes are fully described by state diagrams, whose state transitions are logged to the
  [build event log](./build-event-log.md).

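As an illustration of the non-unique mapping check, the sketch below groups claimed output partitions by name and reports any partition claimed by more than one job. The function name and the `(job, partition)` pair representation are assumptions made for this example, not the graph's actual data model.

```rust
use std::collections::BTreeMap;

/// Returns every partition claimed by more than one job, with the claiming jobs.
/// `claims` is a hypothetical list of (job name, output partition) pairs.
fn non_unique_mappings(claims: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut by_partition: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (job, partition) in claims {
        by_partition
            .entry(partition.to_string())
            .or_default()
            .push(job.to_string());
    }
    // Keep only the partitions with conflicting producers.
    by_partition.retain(|_, jobs| jobs.len() > 1);
    by_partition
}
```

An empty result means every partition has exactly one producing job; a non-empty result is what a graph would raise on.
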
## Partition Delegation
- Sometimes a partition already exists, or another build request is already planning to produce it.
- A later build request will delegate to the existing build request for that partition.
- The later build request writes an event to the [build event log](./build-event-log.md) referencing the ID
  of the delegate, allowing traceability and visualization.

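One way to picture delegation between build requests is a registry keyed by partition, where the first request to claim a partition owns it and later requests receive the owner's ID. This is a sketch under assumptions: `Claim`, `Delegations`, and numeric request IDs are hypothetical, and the case where the partition already exists on disk is not modeled.

```rust
use std::collections::BTreeMap;

/// Outcome of a build request asking for one partition (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Claim {
    /// This request will produce the partition itself.
    Owner,
    /// Another request already covers the partition; its ID is what would be logged.
    DelegatedTo(u64),
}

/// Tracks which build request is responsible for each partition.
#[derive(Default)]
struct Delegations {
    owners: BTreeMap<String, u64>,
}

impl Delegations {
    /// The first request to claim a partition owns it; later requests delegate to it.
    fn claim(&mut self, partition: &str, request_id: u64) -> Claim {
        let owner = *self.owners.entry(partition.to_string()).or_insert(request_id);
        if owner == request_id {
            Claim::Owner
        } else {
            Claim::DelegatedTo(owner)
        }
    }
}
```

The ID carried by `DelegatedTo` corresponds to the delegate reference that the later request would write to the build event log.
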
## Heartbeats / Health Checks
- Which strategy do we use?
- If we are launching tasks to a place we can't health check, how could they heartbeat?