DataBuild Protobuf Interfaces

This directory contains the protobuf interfaces for DataBuild, implemented as a hermetic Bazel-native solution.

Architecture

Hermetic Build Approach

Instead of relying on external Cargo dependencies or complex protoc toolchains, we use a hermetic Bazel genrule that generates Rust code directly from the protobuf specification. This ensures:

  • Full Hermeticity: No external dependencies beyond what's in the Bazel workspace
  • Consistency: Same generated code across all environments
  • Performance: Fast builds without complex dependency resolution
  • Simplicity: Pure Bazel solution that integrates seamlessly

Generated Code Structure

The build generates Rust structs that mirror the protobuf specification in databuild.proto:

// Core types
pub struct PartitionRef { pub str: String }
pub struct JobConfig { /* ... */ }
pub struct JobGraph { /* ... */ }
// ... and all other protobuf messages
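
To make the generation step more concrete, here is a minimal sketch of how a generator might render one struct definition as Rust source text. The repository's actual generator is not shown in this README, so the `Field` type and the `emit_struct` function are hypothetical names used only for illustration.

```rust
/// Hypothetical description of one message field (illustration only,
/// not a type taken from the DataBuild code base).
pub struct Field {
    pub name: &'static str,
    pub rust_type: &'static str,
}

/// Render a Rust struct definition for a single protobuf message.
/// Every field is emitted as a public member, matching the style of
/// the generated structs listed above.
pub fn emit_struct(message: &str, fields: &[Field]) -> String {
    let mut out = format!("pub struct {} {{\n", message);
    for field in fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.rust_type));
    }
    out.push_str("}\n");
    out
}
```

Calling `emit_struct("PartitionRef", &[Field { name: "str", rust_type: "String" }])` yields the source for a struct with a single public `str: String` field. A real generator would also need to parse the .proto file and map protobuf types to Rust types, which this sketch leaves out.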

Custom Serialization

To avoid pulling in serde as an external crate, the generated types provide their own JSON serialization:

let partition = PartitionRef::new("my-partition");
let json = partition.to_json(); // {"str":"my-partition"}
let parsed = PartitionRef::from_json(&json).unwrap();
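
For readers curious how serde-free serialization could work, the following is a minimal sketch for a single-field message, assuming `to_json` returns a `String` and `from_json` returns a `Result`. The real generated code may differ. The `escape` and `unescape` helpers are hypothetical and cover only quotes, backslashes, and newlines, so this is not a complete JSON implementation.

```rust
/// Stand-in for the generated type; the real one comes from databuild.proto.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionRef {
    pub str: String,
}

/// Escape the characters this sketch supports inside a JSON string.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Reverse `escape`; rejects bare quotes and unsupported escapes.
fn unescape(s: &str) -> Result<String, String> {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Err("unescaped quote inside string".to_string()),
            '\\' => match chars.next() {
                Some('"') => out.push('"'),
                Some('\\') => out.push('\\'),
                Some('n') => out.push('\n'),
                other => return Err(format!("unsupported escape: {:?}", other)),
            },
            other => out.push(other),
        }
    }
    Ok(out)
}

impl PartitionRef {
    pub fn new(s: &str) -> Self {
        PartitionRef { str: s.to_string() }
    }

    /// Serialize as a one-key JSON object: {"str":"..."}.
    pub fn to_json(&self) -> String {
        format!("{{\"str\":\"{}\"}}", escape(&self.str))
    }

    /// Parse the exact shape produced by `to_json`.
    pub fn from_json(json: &str) -> Result<Self, String> {
        let body = json
            .trim()
            .strip_prefix('{')
            .and_then(|rest| rest.strip_suffix('}'))
            .ok_or("expected a JSON object")?;
        let (key, value) = body.split_once(':').ok_or("expected a key/value pair")?;
        if key.trim() != "\"str\"" {
            return Err(format!("unexpected key: {}", key.trim()));
        }
        let inner = value
            .trim()
            .strip_prefix('"')
            .and_then(|rest| rest.strip_suffix('"'))
            .ok_or("expected a string value")?;
        Ok(PartitionRef { str: unescape(inner)? })
    }
}
```

The parser accepts only the shape that `to_json` emits. That keeps the code small, but it means hand-written JSON with extra keys or other escape sequences would be rejected.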

Usage

In BUILD.bazel files:

rust_library(
    name = "my_service",
    deps = ["//databuild:databuild"],
    # ... 
)

In Rust code:

use std::collections::HashMap;

use databuild::*;

// Describe a job that produces one partition and takes no inputs.
let partition = PartitionRef::new("my-partition");
let job_config = JobConfig {
    outputs: vec![partition],
    inputs: vec![],
    args: vec!["process".to_string()],
    env: HashMap::new(),
};

Build Targets

  • //databuild:databuild - Main library with generated protobuf types
  • //databuild:databuild_test - Tests for the generated code
  • //databuild:databuild_proto - The protobuf library definition
  • //databuild:structs - Legacy manually written structs (deprecated)

Testing

# Run the tests for the generated code
bazel test //databuild:databuild_test
# Build the main library on its own
bazel build //databuild:databuild

Benefits of This Approach

  1. No External Dependencies: Eliminates prost, tonic-build, and complex protoc setups
  2. Bazel Native: Fully integrated with Bazel's dependency graph
  3. Fast Builds: No compilation of external crates or complex build scripts
  4. Hermetic: The same inputs produce the same generated code in every environment
  5. Maintainable: Simple genrule that's easy to understand and modify
  6. Extensible: Easy to add custom methods and serialization logic

Future Enhancements

  • Add wire-format serialization if needed
  • Generate service stubs for gRPC-like communication
  • Add validation methods for message types
  • Extend custom serialization to support more formats