# Multi-Hop Dependency Example

This example demonstrates how DataBuild resolves multi-hop dependencies between jobs: a job that finds an input partition missing triggers the upstream job and is then re-run.

## Overview

The example consists of two jobs:

- **job_alpha**: Produces the `data/alpha` partition
- **job_beta**: Depends on `data/alpha` and produces `data/beta`

When you request `data/beta`:

1. The beta job runs and detects that its `data/alpha` dependency is missing
2. The orchestrator creates a want for `data/alpha`
3. The alpha job runs and produces `data/alpha`
4. The beta job runs again and succeeds, producing `data/beta`
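
The retry loop in the steps above can be sketched as a small, self-contained Bash script. This is a toy model only, not DataBuild's implementation: the `DEPS` table, the `run_job` and `build` functions, and the `MISSING` variable are all hypothetical names introduced for illustration, and no DataBuild command is invoked. It requires Bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of multi-hop dependency resolution (illustrative only;
# none of these names come from DataBuild itself).
set -euo pipefail

# Partition -> space-separated list of partitions it depends on.
declare -A DEPS=(["data/alpha"]="" ["data/beta"]="data/alpha")
declare -A LIVE=()   # partitions that have been produced
RUN_LOG=()           # order in which jobs were run
MISSING=""           # dependency reported by the last failed run

# run_job PARTITION: succeeds only if every dependency is live.
# On a dependency miss, records the missing partition and returns 1.
run_job() {
  local partition="$1" dep
  RUN_LOG+=("$partition")
  for dep in ${DEPS[$partition]}; do
    if [[ -z "${LIVE[$dep]:-}" ]]; then
      MISSING="$dep"
      return 1
    fi
  done
  LIVE["$partition"]=1
}

# build PARTITION: run the job; on a dependency miss, build the missing
# partition first (a "derivative want"), then retry the original job.
build() {
  local want="$1"
  until run_job "$want"; do
    build "$MISSING"
  done
}

build "data/beta"
printf 'ran: %s\n' "${RUN_LOG[@]}"
```

Running the script shows the beta job attempted first, then alpha, then beta again, which mirrors the four steps listed above.
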

## Running the Example

From the repository root:

```bash
# Build the CLI
bazel build //databuild:databuild_cli

# Clean up any previous state
rm -f /tmp/databuild_multihop*.db /tmp/databuild_multihop_alpha_complete

# Start the server with the multihop configuration
./bazel-bin/databuild/databuild_cli serve \
    --port 3050 \
    --database /tmp/databuild_multihop.db \
    --config examples/multihop/config.json
```

In another terminal, create a want for `data/beta`, then inspect the resulting wants, job runs, and partitions:

```bash
# Create a want for data/beta (which will trigger the dependency chain)
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
    want data/beta

# List the wants
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
    wants list

# List the job runs
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
    job-runs list

# List the partitions
./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
    partitions list
```
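
Each list command above prints a single snapshot, so following the chain to completion means re-running it. A small polling helper can automate that. The sketch below is hypothetical: `wait_for_line` is not part of the DataBuild CLI, and the pattern in the usage comment assumes that `partitions list` prints the partition name and its state on the same line, which this README does not confirm.

```bash
#!/usr/bin/env bash
# wait_for_line PATTERN MAX_POLLS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until its output contains a line
# matching the extended regex PATTERN. Returns 0 on a match, or 1 after
# MAX_POLLS attempts without one. (Hypothetical helper, not a DataBuild tool.)
wait_for_line() {
  local pattern="$1" max_polls="$2" output i
  shift 2
  for ((i = 0; i < max_polls; i++)); do
    # Capture the output; a failing command simply yields no match.
    output="$("$@" 2>/dev/null)" || true
    if grep -Eq -- "$pattern" <<<"$output"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example usage (the output format matched by the pattern is an assumption):
# wait_for_line 'data/beta.*Live' 60 \
#     ./bazel-bin/databuild/databuild_cli --server http://localhost:3050 \
#     partitions list
```

Adjust the pattern to whatever `partitions list` actually prints in your build.
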

## Expected Behavior

1. The initial want for `data/beta` is created
2. The beta job runs, detects that `data/alpha` is missing, and reports a dependency miss
3. The orchestrator creates a derivative want for `data/alpha`
4. The alpha job runs and succeeds
5. The beta job runs again and succeeds
6. Both partitions are now in the `Live` state

## Configuration Format

The example uses JSON format (`config.json`), but TOML is also supported. Here's the equivalent TOML:

```toml
[[jobs]]
label = "//examples/multihop:job_alpha"
entrypoint = "./examples/multihop/job_alpha.sh"
partition_patterns = ["data/alpha"]

[jobs.environment]
JOB_NAME = "alpha"

[[jobs]]
label = "//examples/multihop:job_beta"
entrypoint = "./examples/multihop/job_beta.sh"
partition_patterns = ["data/beta"]

[jobs.environment]
JOB_NAME = "beta"
```