Multi-Hop Dependency Example
This example demonstrates DataBuild's ability to handle multi-hop dependencies between jobs.
Overview
The example consists of two jobs:
- job_alpha: Produces the data/alpha partition
- job_beta: Depends on data/alpha and produces data/beta
When you request data/beta:
- Beta job runs and detects missing data/alpha dependency
- Orchestrator creates a want for data/alpha
- Alpha job runs and produces data/alpha
- Beta job runs again and succeeds, producing data/beta
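The retry flow above can be imitated with a toy shell script. This is a minimal sketch, not DataBuild's actual orchestrator: the functions `run_alpha`, `run_beta`, and `orchestrate`, the scratch directory of marker files, and the use of a non-zero exit status to signal a dependency miss are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy simulation of the multi-hop flow. Everything here is hypothetical:
# partitions are modeled as marker files in a scratch directory, and a
# job signals a dependency miss by returning a non-zero status.
set -u

STATE_DIR="$(mktemp -d)" || exit 1
trap 'rm -rf "$STATE_DIR"' EXIT
LOG=""

# Hypothetical alpha job: always succeeds and produces data/alpha.
run_alpha() {
  mkdir -p "$STATE_DIR/data"
  touch "$STATE_DIR/data/alpha"
}

# Hypothetical beta job: needs data/alpha before it can produce data/beta.
run_beta() {
  if [ ! -e "$STATE_DIR/data/alpha" ]; then
    return 2   # dependency miss
  fi
  touch "$STATE_DIR/data/beta"
}

# Toy orchestrator: run the job for the wanted partition. If beta fails,
# treat it as a dependency miss, build data/alpha, then retry beta.
orchestrate() {
  case "$1" in
    data/alpha)
      run_alpha
      LOG="$LOG alpha:ok"
      ;;
    data/beta)
      if run_beta; then
        LOG="$LOG beta:ok"
      else
        LOG="$LOG beta:miss"
        orchestrate data/alpha   # derivative want
        orchestrate data/beta    # retry now that data/alpha exists
      fi
      ;;
  esac
}

orchestrate data/beta
echo "run order:$LOG"
```

The recorded order is a beta miss, then alpha, then a successful beta run, which mirrors the four steps listed above.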
Running the Example
From the repository root:
# Build the CLI
bazel build //databuild:databuild_cli
# Clean up any previous state
rm -f /tmp/databuild_multihop*.db /tmp/databuild_multihop_alpha_complete
# Start the server with the multihop configuration
./bazel-bin/databuild/databuild_cli serve \
--port 3050 \
--database /tmp/databuild_multihop.db \
--config examples/multihop/config.json
In another terminal, create a want for data/beta:
# Create a want for data/beta (which will trigger the dependency chain)
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
want data/beta
# Watch the wants
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
wants list
# Watch the job runs
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
job-runs list
# Watch the partitions
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
partitions list
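Because the chain takes several orchestration steps to settle, it can be convenient to poll instead of re-running the list commands by hand. The helper below is a generic sketch: `poll_until` is a hypothetical function, not part of DataBuild, and it assumes only that the command you pass eventually prints the text you are looking for.

```shell
#!/bin/sh
# poll_until PATTERN ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until its output contains PATTERN
# (fixed-string match), giving up after ATTEMPTS tries.
poll_until() {
  pattern="$1"; attempts="$2"; delay="$3"
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Succeed as soon as the command's output mentions the pattern.
    if "$@" 2>/dev/null | grep -F -q -- "$pattern"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For this example it could be invoked as `poll_until data/beta 30 1 ./bazel-bin/databuild/databuild_cli --server http://localhost:3050 partitions list`. The output format of `partitions list` is not shown here, so a bare match on the partition name may succeed before the partition is actually Live; adjust the pattern to whatever the listing prints.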
Expected Behavior
- Initial want for data/beta is created
- Beta job runs, detects missing data/alpha, reports dependency miss
- Orchestrator creates derivative want for data/alpha
- Alpha job runs and succeeds
- Beta job runs again and succeeds
- Both partitions are now in Live state
Configuration Format
The example uses JSON format (config.json), but TOML is also supported. Here's the equivalent TOML:
[[jobs]]
label = "//examples/multihop:job_alpha"
entrypoint = "./examples/multihop/job_alpha.sh"
partition_patterns = ["data/alpha"]
[jobs.environment]
JOB_NAME = "alpha"
[[jobs]]
label = "//examples/multihop:job_beta"
entrypoint = "./examples/multihop/job_beta.sh"
partition_patterns = ["data/beta"]
[jobs.environment]
JOB_NAME = "beta"