
DataBuild Protobuf Interfaces

This directory contains the protobuf interfaces for DataBuild, implemented as a hermetic Bazel-native solution.

Architecture

Hermetic Build Approach

Instead of relying on external Cargo dependencies or complex protoc toolchains, we use a hermetic Bazel genrule that generates Rust code directly from the protobuf specification. This ensures:

  • Full Hermeticity: No external dependencies beyond what's in the Bazel workspace
  • Consistency: Same generated code across all environments
  • Performance: Builds skip Cargo dependency resolution and the compilation of third-party crates
  • Simplicity: A pure Bazel solution that other targets consume like any other dependency

Generated Code Structure

The build generates Rust structs that mirror the protobuf specification in databuild.proto:

// Core types
pub struct PartitionRef { pub str: String }
pub struct JobConfig { /* ... */ }
pub struct JobGraph { /* ... */ }
// ... and all other protobuf messages
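
To make the generation step more concrete, here is a minimal sketch of how a generator could render a struct definition from a message description. The `Field` type and `render_struct` function are hypothetical names chosen for illustration; the actual generator in this directory may be organized quite differently.

```rust
/// Hypothetical, simplified description of one message field.
/// (An assumption for this sketch, not taken from the real generator.)
pub struct Field {
    pub name: &'static str,
    pub rust_type: &'static str,
}

/// Render the source text of a Rust struct with one public field per
/// message field. Illustrative only: no nesting, enums, or repeated fields.
pub fn render_struct(message: &str, fields: &[Field]) -> String {
    let mut out = format!("pub struct {} {{\n", message);
    for field in fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.rust_type));
    }
    out.push_str("}\n");
    out
}
```

Calling `render_struct("PartitionRef", &[Field { name: "str", rust_type: "String" }])` returns the text of a struct with a single `pub str: String` field, the same shape as the `PartitionRef` listed above.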

Custom Serialization

Because the build pulls in no external crates, the generated types provide their own JSON serialization instead of relying on serde:

let partition = PartitionRef::new("my-partition");
let json = partition.to_json(); // {"str":"my-partition"}
let parsed = PartitionRef::from_json(&json).unwrap();
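
For readers curious what serde-free serialization can look like, the following is a rough sketch of `to_json` and `from_json` for a single-field message. The struct here is a stand-in written for this example, and the escaping rules and error type are assumptions; the generated implementation may handle more cases or report errors differently.

```rust
/// Stand-in for the generated type (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionRef {
    pub str: String,
}

impl PartitionRef {
    pub fn new(s: &str) -> Self {
        PartitionRef { str: s.to_string() }
    }

    /// Serialize to a JSON object, escaping quotes, backslashes,
    /// and control characters.
    pub fn to_json(&self) -> String {
        let mut out = String::from("{\"str\":\"");
        for c in self.str.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                '\n' => out.push_str("\\n"),
                c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
                c => out.push(c),
            }
        }
        out.push_str("\"}");
        out
    }

    /// Parse the exact shape produced by `to_json`.
    /// This is deliberately not a general JSON parser.
    pub fn from_json(json: &str) -> Result<Self, String> {
        let body = json
            .trim()
            .strip_prefix("{\"str\":\"")
            .and_then(|rest| rest.strip_suffix("\"}"))
            .ok_or_else(|| "expected {\"str\":\"...\"}".to_string())?;
        let mut value = String::new();
        let mut chars = body.chars();
        while let Some(c) = chars.next() {
            if c == '"' {
                return Err("unescaped quote inside string".to_string());
            }
            if c != '\\' {
                value.push(c);
                continue;
            }
            // Decode the escape sequences that `to_json` can emit.
            match chars.next() {
                Some('"') => value.push('"'),
                Some('\\') => value.push('\\'),
                Some('n') => value.push('\n'),
                Some('u') => {
                    let hex: String = chars.by_ref().take(4).collect();
                    let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                    value.push(char::from_u32(code).ok_or("invalid code point")?);
                }
                other => return Err(format!("unsupported escape: {:?}", other)),
            }
        }
        Ok(PartitionRef { str: value })
    }
}
```

The parser only accepts the compact form the serializer writes, which keeps the code small at the cost of rejecting otherwise valid JSON such as objects with extra whitespace between tokens.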

Usage

In BUILD.bazel files:

rust_library(
    name = "my_service",
    deps = ["//databuild:databuild"],
    # ... 
)

In Rust code:

use std::collections::HashMap;

use databuild::*;

// Describe a job that produces a single output partition.
let partition = PartitionRef::new("my-partition");
let job_config = JobConfig {
    outputs: vec![partition],
    inputs: vec![],
    args: vec!["process".to_string()],
    env: HashMap::new(),
};

Build Targets

  • //databuild:databuild - Main library with generated protobuf types
  • //databuild:databuild_test - Tests for the generated code
  • //databuild:databuild_proto - The protobuf library definition
  • //databuild:structs - Legacy manually written structs (deprecated)

Testing

bazel test //databuild:...

Benefits of This Approach

  1. No External Dependencies: Eliminates prost, tonic-build, and complex protoc setups
  2. Bazel Native: Fully integrated with Bazel's dependency graph
  3. Fast Builds: No compilation of external crates or complex build scripts
  4. Hermetic: The same inputs produce the same generated code on every machine
  5. Maintainable: Simple genrule that's easy to understand and modify
  6. Extensible: Easy to add custom methods and serialization logic

Future Enhancements

  • Add wire-format serialization if needed
  • Generate service stubs for gRPC-like communication
  • Add validation methods for message types
  • Extend custom serialization to support more formats
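
As a rough idea of what the wire-format item could involve, the sketch below encodes a single string field using the standard protobuf rules: a varint tag of `(field_number << 3) | wire_type`, followed by a varint length and the UTF-8 bytes. The helper names are hypothetical, and using field number 1 for `PartitionRef.str` is an assumption, since the numbering in databuild.proto is not shown here.

```rust
/// Append `value` as a protobuf base-128 varint (7 bits per byte,
/// least significant group first, high bit set on all but the last byte).
pub fn put_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value as u8) & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Append a length-delimited (wire type 2) string field.
pub fn put_string_field(field_number: u32, s: &str, out: &mut Vec<u8>) {
    put_varint(((field_number as u64) << 3) | 2, out);
    put_varint(s.len() as u64, out);
    out.extend_from_slice(s.as_bytes());
}
```

With field number 1 assumed, `put_string_field(1, "ab", &mut buf)` appends the bytes `0x0a 0x02 0x61 0x62`. A full implementation would also need decoding and the other wire types.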