databuild/databuild/README.md
Stuart Axelbrooke 1f76470ac4
Refactor databuild tests
2025-07-06 13:37:38 -07:00

# DataBuild Protobuf Interfaces
This directory contains the protobuf interfaces for DataBuild, implemented as a hermetic Bazel-native solution.
## Architecture
### Hermetic Build Approach
Instead of relying on external Cargo dependencies or complex protoc toolchains, we use a **hermetic Bazel genrule** that generates Rust code directly from the protobuf specification. This ensures:
- **Full Hermeticity**: No external dependencies beyond what's in the Bazel workspace
- **Consistency**: Same generated code across all environments
- **Performance**: Fast builds without complex dependency resolution
- **Simplicity**: Pure Bazel solution that integrates seamlessly
### Generated Code Structure
The build generates Rust structs that mirror the protobuf specification in `databuild.proto`:
```rust
// Core types
pub struct PartitionRef { pub str: String }
pub struct JobConfig { /* ... */ }
pub struct JobGraph { /* ... */ }
// ... and all other protobuf messages
```
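
To make the mapping from message to struct concrete, here is a minimal sketch of the kind of text emission a generator could perform. It is not the actual genrule used in this directory: `Field` and `emit_struct` are hypothetical names, and the sketch assumes the message has already been parsed into a name plus a list of fields with their Rust types.

```rust
/// One field of a hypothetical, already-parsed message description.
/// (Assumption: the real generator may represent fields differently.)
pub struct Field {
    pub name: &'static str,
    pub rust_type: &'static str,
}

/// Emit the source text of a Rust struct with one public field per message field.
pub fn emit_struct(message: &str, fields: &[Field]) -> String {
    let mut out = format!("pub struct {} {{\n", message);
    for field in fields {
        out.push_str(&format!("    pub {}: {},\n", field.name, field.rust_type));
    }
    out.push_str("}\n");
    out
}
```

Calling `emit_struct("PartitionRef", &[Field { name: "str", rust_type: "String" }])` yields a struct definition equivalent to the `PartitionRef` shown above, spread over several lines.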
### Custom Serialization
Because the build avoids external crates, the generated code ships its own JSON serialization instead of relying on serde:
```rust
let partition = PartitionRef::new("my-partition");
let json = partition.to_json(); // {"str":"my-partition"}
let parsed = PartitionRef::from_json(&json).unwrap();
```
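
For readers curious what serde-free JSON handling can look like, the following is a minimal sketch for a single-field message. It is an illustration under stated assumptions, not the generated implementation: the `escape_json` and `unescape_json` helpers are hypothetical, the parser accepts only the one-key object that `to_json` produces, and `\u` escapes for surrogate pairs are not supported.

```rust
// Stand-in for the generated type; the real generated code may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionRef {
    pub str: String,
}

impl PartitionRef {
    pub fn new(s: &str) -> Self {
        PartitionRef { str: s.to_string() }
    }

    /// Serialize to a compact JSON object, escaping what JSON requires.
    pub fn to_json(&self) -> String {
        format!("{{\"str\":\"{}\"}}", escape_json(&self.str))
    }

    /// Parse the single-field object produced by `to_json`.
    pub fn from_json(json: &str) -> Result<Self, String> {
        let body = json
            .trim()
            .strip_prefix('{')
            .and_then(|s| s.strip_suffix('}'))
            .ok_or("expected a JSON object")?;
        // The key contains no ':', so the first ':' separates key from value.
        let (key, value) = body.split_once(':').ok_or("expected a key/value pair")?;
        if key.trim() != "\"str\"" {
            return Err(format!("unexpected key: {}", key.trim()));
        }
        let quoted = value
            .trim()
            .strip_prefix('"')
            .and_then(|s| s.strip_suffix('"'))
            .ok_or("expected a string value")?;
        Ok(PartitionRef { str: unescape_json(quoted)? })
    }
}

// Hypothetical helper: escape quotes, backslashes, and control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Hypothetical helper: reverse `escape_json`, rejecting malformed input.
fn unescape_json(s: &str) -> Result<String, String> {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Err("unescaped quote inside string".to_string()),
            '\\' => match chars.next() {
                Some('"') => out.push('"'),
                Some('\\') => out.push('\\'),
                Some('/') => out.push('/'),
                Some('n') => out.push('\n'),
                Some('r') => out.push('\r'),
                Some('t') => out.push('\t'),
                Some('u') => {
                    let hex: String = chars.by_ref().take(4).collect();
                    if hex.len() != 4 || !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                        return Err("invalid \\u escape".to_string());
                    }
                    let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                    out.push(char::from_u32(code).ok_or("invalid \\u escape")?);
                }
                other => return Err(format!("unsupported escape: {:?}", other)),
            },
            c => out.push(c),
        }
    }
    Ok(out)
}
```

The trade-off in a hand-rolled approach like this is that every message type needs its own escaping and parsing logic kept in sync, which is manageable when the code is generated rather than written by hand.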
## Usage
### In BUILD.bazel files:
```starlark
rust_library(
    name = "my_service",
    deps = ["//databuild:databuild"],
    # ...
)
```
### In Rust code:
```rust
use std::collections::HashMap;

use databuild::*;

let partition = PartitionRef::new("my-partition");
let job_config = JobConfig {
    outputs: vec![partition],
    inputs: vec![],
    args: vec!["process".to_string()],
    env: HashMap::new(),
};
```
## Build Targets
- `//databuild:databuild` - Main library with generated protobuf types
- `//databuild:databuild_test` - Tests for the generated code
- `//databuild:databuild_proto` - The protobuf library definition
- `//databuild:structs` - Legacy manually-written structs (deprecated)
## Testing
```bash
bazel test //databuild:...
```
## Benefits of This Approach
1. **No External Dependencies**: Eliminates prost, tonic-build, and complex protoc setups
2. **Bazel Native**: Fully integrated with Bazel's dependency graph
3. **Fast Builds**: No compilation of external crates or complex build scripts
4. **Hermetic**: The same inputs produce the same generated code on every machine
5. **Maintainable**: Simple genrule that's easy to understand and modify
6. **Extensible**: Easy to add custom methods and serialization logic
## Future Enhancements
- Add wire-format serialization if needed
- Generate service stubs for gRPC-like communication
- Add validation methods for message types
- Extend custom serialization to support more formats
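
As one illustration of the validation item above, a validation method could look roughly like the sketch below. Nothing here exists in the library today: `ValidationError`, `validate`, and the rule that a partition reference must be non-blank are all assumptions chosen for the example.

```rust
// Stand-in for the generated type; the real generated code may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionRef {
    pub str: String,
}

/// Hypothetical error type for failed validation.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyPartitionRef,
}

impl PartitionRef {
    /// Assumed rule: a partition reference must contain non-whitespace text.
    pub fn validate(&self) -> Result<(), ValidationError> {
        if self.str.trim().is_empty() {
            Err(ValidationError::EmptyPartitionRef)
        } else {
            Ok(())
        }
    }
}
```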