Agent Instructions
Project Overview
DataBuild is a Bazel-based data build system. Key files:
- DESIGN.md - Overall design of databuild
- databuild.proto - System interfaces
- Component designs - Design docs for specific aspects or components of databuild:
  - Core build - How the core semantics of databuild work and are implemented
  - Build event log - How the build event log works and is accessed
  - Service - How the databuild HTTP service and web app are designed
  - Glossary - Centralized description of key terms
  - Graph specification - The libraries that enable more succinct declaration of databuild applications than the core Bazel-based interface
  - Deploy strategies - Different strategies for deploying databuild applications
  - Wants - How triggering works in databuild applications
  - Why databuild? - Why to choose databuild over other, better-established orchestration solutions
Please consult these for any related work, as they reflect the project's key technical biases and direction.
Tenets
- Declarative over imperative wherever possible/reasonable.
- We are building for the future, and choose to do "the right thing" rather than taking shortcuts to get unstuck. If you get stuck, pause and ask for help/input.
- Do not add "unknown" results when parses or matches fail; these should always throw.
- Compile-time correctness is a superpower, and investing in it speeds up the flywheel of development and user value.
- CLI/Service Interchangeability: Both the CLI and service must produce identical artifacts (BEL events, logs, metrics, outputs) in the same locations. Users should be able to build with one interface and query/inspect results from the other seamlessly. This principle applies to all DataBuild operations, not just builds.
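To illustrate the "always throw" tenet, here is a minimal Python sketch of parsing a date-partitioned ref. The ref format `<dataset>/<YYYY-MM-DD>` and the names `PartitionRef` and `parse_partition_ref` are hypothetical; the point is only that a failed match raises instead of returning a placeholder such as "unknown".

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical ref format, for illustration only: "<dataset>/<YYYY-MM-DD>".
_REF_RE = re.compile(r"^(?P<dataset>[a-z_]+)/(?P<day>\d{4}-\d{2}-\d{2})$")


@dataclass(frozen=True)
class PartitionRef:
    dataset: str
    day: date


def parse_partition_ref(ref: str) -> PartitionRef:
    """Parse a ref, raising ValueError on any mismatch (never returns "unknown")."""
    m = _REF_RE.match(ref)
    if m is None:
        raise ValueError(f"Unparseable partition ref: {ref!r}")
    # date.fromisoformat also raises ValueError for impossible dates like 2024-02-30.
    return PartitionRef(dataset=m["dataset"], day=date.fromisoformat(m["day"]))
```

Callers then handle the error explicitly (or let it fail the build) rather than silently carrying a bogus value downstream.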
Build & Test
```bash
# Build all databuild components
bazel build //...
# Run databuild unit tests
bazel test //...
# Run end-to-end tests (validates CLI vs Service consistency)
./run_e2e_tests.sh
# Do not try to `bazel test //examples/basic_graph/...`, as this will not work.
```
Project Structure
- databuild/ - Core system (Rust/Proto)
- examples/ - Example implementations
- scripts/ - Build utilities
DataBuild Job Architecture
Job Target Structure
Each DataBuild job creates three Bazel targets:
- job_name.exec - Execution target (calls the binary with the "exec" subcommand)
- job_name - Main job target (pipes config output to exec input)
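The config-then-exec split can be sketched as a single job binary with two subcommands. This is a hypothetical Python illustration, not databuild's actual job interface (that is defined in databuild.proto): the "exec" subcommand name comes from the target description, while the "config" subcommand name, the JSON config shape, and the helper names are assumptions.

```python
import json
import sys

# Hypothetical job binary. "exec" matches the target description above;
# the "config" subcommand name and the JSON shape are assumptions.


def config(partition_refs: list[str]) -> dict:
    """Describe how to build the requested partitions (hypothetical format)."""
    return {"outputs": partition_refs}


def execute(cfg: dict) -> list[str]:
    """Build every partition named in the config and return the refs built."""
    built = []
    for ref in cfg["outputs"]:
        # A real job would compute and write the partition's data here.
        built.append(ref)
    return built


def run(argv: list[str], stdin_text: str = "") -> str:
    """Dispatch on the subcommand and return what the job writes to stdout."""
    if len(argv) < 2:
        raise ValueError("usage: job config <partition_ref>... | job exec")
    cmd = argv[1]
    if cmd == "config":
        return json.dumps(config(argv[2:]))
    if cmd == "exec":
        return "".join(f"built {ref}\n" for ref in execute(json.loads(stdin_text)))
    raise ValueError(f"Unknown subcommand: {cmd!r}")


if __name__ == "__main__":
    # Only `exec` reads stdin, which is what lets `job config ... | job exec` compose.
    stdin_text = sys.stdin.read() if sys.argv[1:2] == ["exec"] else ""
    sys.stdout.write(run(sys.argv, stdin_text))
```

Under these assumptions, `job config data/a data/b | job exec` mirrors what the main job_name target does, while job_name.exec corresponds to running `job exec` directly with a config on stdin.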
Graph Configuration
```python
databuild_graph(
    name = "my_graph",
    jobs = [":job1", ":job2"],  # Reference base job targets
    lookup = ":job_lookup",     # Binary that routes partition refs to jobs
)
```
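The lookup's role in the graph can be modeled in-process: each requested partition ref is routed to the base job target that builds it, and the refs are grouped per job. This is a hypothetical Python sketch; in databuild the lookup is a separate binary, and `group_by_job` and `example_lookup` (with its `raw/` and `agg/` prefixes) are illustrative names, not part of the project.

```python
from collections import defaultdict
from typing import Callable


def example_lookup(ref: str) -> str:
    """Hypothetical lookup mirroring :job_lookup for the jobs :job1 and :job2."""
    if ref.startswith("raw/"):
        return "//:job1"
    if ref.startswith("agg/"):
        return "//:job2"
    raise ValueError(f"No job found for: {ref}")


def group_by_job(
    partition_refs: list[str], lookup: Callable[[str], str]
) -> dict[str, list[str]]:
    """Route each requested partition to the job target that builds it."""
    plan: dict[str, list[str]] = defaultdict(list)
    for ref in partition_refs:
        # lookup raises if no job matches, so one unroutable ref fails the whole plan.
        plan[lookup(ref)].append(ref)
    return dict(plan)
```

For example, `group_by_job(["raw/a", "agg/b", "raw/c"], example_lookup)` groups `raw/a` and `raw/c` under `//:job1` and `agg/b` under `//:job2`.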
Job Lookup Pattern
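```python
import re

pattern = re.compile(r"^my_dataset/.+$")  # Hypothetical partition-ref pattern

def lookup_job_for_partition(partition_ref: str) -> str:
    if pattern.match(partition_ref):
        return "//:job_name"  # Return base job target
    raise ValueError(f"No job found for: {partition_ref}")
```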
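With several jobs, the same pattern can be kept table-driven. A minimal sketch, assuming the hypothetical ref formats and job targets below (`//:ingest_reviews`, `//:daily_summary`): routes are checked in order, the first match wins, and anything unmatched raises rather than falling back to a default job.

```python
import re

# Hypothetical routing table: (ref pattern, base job target), checked in order.
ROUTES = [
    (re.compile(r"^raw/reviews/\d{4}-\d{2}-\d{2}$"), "//:ingest_reviews"),
    (re.compile(r"^agg/daily_summary/\d{4}-\d{2}-\d{2}$"), "//:daily_summary"),
]


def lookup_job_for_partition(partition_ref: str) -> str:
    for pattern, job in ROUTES:
        if pattern.match(partition_ref):
            return job  # Base job target, not the .exec target
    raise ValueError(f"No job found for: {partition_ref}")
```

First-match routing keeps the table easy to read; if overlapping patterns are a concern, raising when more than one route matches is a stricter alternative in the spirit of the tenets.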
Notes / Tips
- Rust dependencies are managed via rules_rust, so new dependencies should be added in the MODULE.bazel file.