DataBuild
DataBuild is a trivially-deployable, partition-oriented, declarative data build system.
DataBuild is for teams at data-driven orgs who need reliable, flexible, and correct data pipelines and are tired of manually orchestrating complex dependency graphs. You define Jobs (which take input data partitions and produce output partitions) and compose them into Graphs (partition dependency networks); DataBuild handles the rest. Ask it to build a partition, and DataBuild resolves the jobs that need to run, plans the execution order, runs builds concurrently, and tracks and exposes build progress. Instead of writing orchestration code that breaks when dependencies change, you focus on the data transformations while DataBuild keeps your pipelines correct, observable, and reliable.
For important context, check out DESIGN.md, along with designs in design/. Also, check out databuild.proto for key system interfaces.
Key features:
- Declarative dependencies - Ask for data, get data. Define partition dependencies and DataBuild automatically plans what jobs to run and when.
- Partition-first design - Build only what's needed. Late data arrivals and partial rebuilds work seamlessly with atomic data partitions.
- Deploy anywhere - One binary, any platform. Bazel-based builds create hermetic applications that run locally, in containers, or in the cloud.
- Concurrent by design - Multiple teams, zero conflicts. Event-sourced coordination enables parallel builds without stepping on each other.
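To make the "ask for data, get data" idea concrete, here is a minimal sketch of how a requested partition could be resolved into an ordered list of jobs. It is not DataBuild's API: the `Job` shape, the `planBuild` function, and the partition names are all hypothetical, chosen only to illustrate depth-first dependency resolution over declared inputs and outputs.

```typescript
// Hypothetical types for illustration; these are not DataBuild's actual interfaces.
type PartitionRef = string;

interface Job {
  name: string;
  outputs: PartitionRef[]; // partitions this job produces
  inputs: PartitionRef[]; // partitions this job consumes
}

// Returns job names ordered so that every job runs after the jobs that
// produce its inputs. Partitions listed in `existing` count as already built.
function planBuild(
  jobs: Job[],
  wanted: PartitionRef,
  existing: Set<PartitionRef> = new Set<PartitionRef>(),
): string[] {
  // Index: partition -> the job that produces it.
  const producer = new Map<PartitionRef, Job>();
  for (const job of jobs) {
    for (const out of job.outputs) producer.set(out, job);
  }

  const order: string[] = [];
  const done = new Set<string>(); // jobs already placed in the plan
  const visiting = new Set<string>(); // jobs on the current path (cycle detection)

  const visit = (partition: PartitionRef): void => {
    if (existing.has(partition)) return; // nothing to build
    const job = producer.get(partition);
    if (!job) throw new Error(`no job produces partition ${partition}`);
    if (done.has(job.name)) return;
    if (visiting.has(job.name)) throw new Error(`dependency cycle at job ${job.name}`);
    visiting.add(job.name);
    for (const input of job.inputs) visit(input); // upstream jobs first
    visiting.delete(job.name);
    done.add(job.name);
    order.push(job.name);
  };

  visit(wanted);
  return order;
}

// Made-up jobs and partition names, loosely themed on the podcast example.
const jobs: Job[] = [
  { name: "extract", outputs: ["reviews/raw/2024-01-01"], inputs: [] },
  { name: "clean", outputs: ["reviews/clean/2024-01-01"], inputs: ["reviews/raw/2024-01-01"] },
  { name: "summarize", outputs: ["reviews/summary/2024-01-01"], inputs: ["reviews/clean/2024-01-01"] },
];

const fullPlan = planBuild(jobs, "reviews/summary/2024-01-01");
console.log(fullPlan.join(" -> ")); // prints "extract -> clean -> summarize"
```

Passing a set of already-built partitions as the third argument prunes the plan, which is the "build only what's needed" behavior in miniature. Concurrency and event-sourced coordination are left out of this sketch.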
Usage
See the podcast example BUILD file.
Development
IntelliJ
Run these so that IntelliJ can understand the Rust source:
# Generate a Cargo.toml file so IntelliJ can link the Rust sources
python3 scripts/generate_cargo_toml.py
# Generate a gitignore'd Rust file representing the protobuf interfaces
scripts/generate_proto_for_ide.sh
Compiling
bazel build //...
Bullet-proof compile-time correctness is essential for production reliability. Backend protobuf changes must cause predictable frontend compilation failures, preventing runtime errors. Our three-pronged approach ensures this:
- Complete Type Chain: Proto → Rust → OpenAPI → TypeScript → Components
  - Each step uses generated types, maintaining accuracy across the entire pipeline
  - Breaking changes at any layer cause compilation failures in dependent layers
- Consistent Data Transformation: Service boundary layer transforms API responses to dashboard types
  - Canonical frontend interfaces isolated from backend implementation details
  - Transformations handle protobuf nullability and normalize data shapes
  - Components never directly access generated API types
- Strict TypeScript Configuration: Enforces explicit null handling and prevents implicit any types
  - strictNullChecks catches undefined property access patterns
  - noImplicitAny surfaces type safety gaps
  - Runtime type errors become compile-time failures
This system guarantees that backend interface changes are caught during TypeScript compilation, not in production.
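As an illustration of the service boundary layer described above, the sketch below converts a generated-style API type into a canonical dashboard type. The interfaces `ApiBuildSummary` and `BuildSummary`, their fields, and the function `toBuildSummary` are assumptions made up for this example; the real generated types and dashboard interfaces in this repository may look quite different.

```typescript
// Stand-in for a type generated from the OpenAPI spec (hypothetical shape).
// Proto-derived fields tend to be optional or nullable on the wire.
interface ApiBuildSummary {
  build_id?: string;
  status?: { name?: string } | null;
  started_at?: number | null; // assumed to be epoch milliseconds
}

// Canonical dashboard type: nullability appears only where absence is meaningful.
interface BuildSummary {
  buildId: string;
  status: string;
  startedAt: Date | null;
}

// The only place that touches the generated type. Components consume BuildSummary.
function toBuildSummary(api: ApiBuildSummary): BuildSummary {
  if (!api.build_id) {
    throw new Error("API response is missing build_id");
  }
  return {
    buildId: api.build_id,
    // With strictNullChecks enabled, writing `api.status.name` here would not
    // compile, because `status` may be null or undefined.
    status: api.status?.name ?? "UNKNOWN",
    startedAt: api.started_at != null ? new Date(api.started_at) : null,
  };
}

const summary = toBuildSummary({ build_id: "b-123", status: null });
console.log(summary.status); // prints "UNKNOWN"
```

If a backend change renamed or removed `build_id` in the generated type, `toBuildSummary` would stop compiling, and the breakage would stay confined to this one function instead of spreading across components.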
Testing
DataBuild core testing:
bazel test //...
End to end testing:
./run_e2e_tests.sh