databuild/AGENTS.md
Stuart Axelbrooke ea83610d35
A lot of refactoring
2025-09-27 15:29:22 -07:00

Agent Instructions

Project Overview

DataBuild is a Bazel-based data build system. Key files:

  • DESIGN.md - Overall design of databuild
  • databuild.proto - System interfaces
  • Component designs - design docs for specific aspects or components of databuild:
    • Core build - How the core semantics of databuild work and are implemented
    • Build event log - How the build event log works and is accessed
    • Service - How the databuild HTTP service and web app are designed.
    • Glossary - Centralized description of key terms.
    • Graph specification - Describes the libraries that enable more succinct declaration of databuild applications than the core Bazel-based interface.
    • Deploy strategies - Different strategies for deploying databuild applications.
    • Wants - How triggering works in databuild applications.
    • Why databuild? - Why to choose databuild over other, better-established orchestration solutions.

Please reference these for any related work, as they indicate the project's key technical biases and direction.

Tenets

  • Declarative over imperative wherever possible/reasonable.
  • We are building for the future, and choose to do "the right thing" rather than taking shortcuts to get unstuck. If you get stuck, pause and ask for help/input.
  • Do not add "unknown" results when parses or matches fail - these should always throw.
  • Compile-time correctness is a superpower, and investing in it speeds up the flywheel for development and user value.
  • CLI/Service Interchangeability: Both the CLI and service must produce identical artifacts (BEL events, logs, metrics, outputs) in the same locations. Users should be able to build with one interface and query/inspect results from the other seamlessly. This principle applies to all DataBuild operations, not just builds.
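
The "no unknown results" tenet can be illustrated with a small sketch. It is written in Python for brevity (the core system is Rust), and `JobState` and `parse_job_state` are hypothetical names chosen for illustration, not part of databuild's API.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical states for illustration; note there is no UNKNOWN member.
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


def parse_job_state(raw: str) -> JobState:
    """Parse a state string, raising instead of returning a catch-all value."""
    try:
        return JobState(raw.strip().lower())
    except ValueError:
        # Fail loudly so bad input surfaces where it occurs.
        raise ValueError(f"Unrecognized job state: {raw!r}") from None
```

Because the enum has no fallback member, callers cannot silently carry an unparsed value forward; they either get a valid state or an exception.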

Build & Test

# Build all databuild components
bazel build //...

# Run databuild unit tests
bazel test //...

# Run end-to-end tests (validates CLI vs Service consistency)
./run_e2e_tests.sh

# Do not try to `bazel test //examples/basic_graph/...`, as this will not work.

Project Structure

  • databuild/ - Core system (Rust/Proto)
  • examples/ - Example implementations
  • scripts/ - Build utilities

DataBuild Job Architecture

Job Target Structure

Each DataBuild job creates three Bazel targets. Besides a configuration target that produces the config output, these are:

  • job_name.exec - Execution target (calls binary with "exec" subcommand)
  • job_name - Main job target (pipes config output to exec input)
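
A rough sketch of the piping the main job target performs, assuming the job binary writes its configuration to stdout and the "exec" subcommand reads it from stdin. The helper `run_job` is hypothetical, the subcommand name "config" is inferred from "config output", and passing partition refs as arguments to it is an assumption; the real targets are generated by Bazel rules rather than written by hand.

```python
import subprocess


def run_job(job_cmd: list[str], partition_refs: list[str]) -> str:
    """Run the config step, then feed its stdout to the exec step."""
    # Assumption: partition refs are passed as arguments to "config".
    config = subprocess.run(
        job_cmd + ["config", *partition_refs],
        check=True,           # a failing config step raises CalledProcessError
        capture_output=True,
        text=True,
    )
    # Pipe the captured config output into the "exec" subcommand's stdin.
    result = subprocess.run(
        job_cmd + ["exec"],
        input=config.stdout,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With `check=True`, a failure in either step raises rather than producing partial output, which matches the tenet that failures should always throw.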

Graph Configuration

databuild_graph(
    name = "my_graph",
    jobs = [":job1", ":job2"],  # Reference base job targets
    lookup = ":job_lookup",     # Binary that routes partition refs to jobs
)

Job Lookup Pattern

def lookup_job_for_partition(partition_ref: str) -> str:
    if pattern.match(partition_ref):
        return "//:job_name"  # Return base job target
    raise ValueError(f"No job found for: {partition_ref}")
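
The snippet above leaves `pattern` undefined. A fuller sketch, assuming a regex-based routing table, might look like the following. The partition-ref patterns and the job labels `//:ingest_job` and `//:report_job` are hypothetical placeholders.

```python
import re

# Hypothetical routing table: partition-ref pattern -> base job target label.
ROUTES = [
    (re.compile(r"^raw/.+$"), "//:ingest_job"),
    (re.compile(r"^reports/.+$"), "//:report_job"),
]


def lookup_job_for_partition(partition_ref: str) -> str:
    """Return the base job target for a partition ref, or raise."""
    for pattern, job_label in ROUTES:
        if pattern.match(partition_ref):
            return job_label
    # No "unknown" fallback: an unroutable ref is an error.
    raise ValueError(f"No job found for: {partition_ref}")
```

The first matching pattern wins, so more specific patterns should be listed before broader ones.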

Notes / Tips

  • Rust dependencies are managed via rules_rust, so new dependencies should be added in the MODULE.bazel file.