CLI-Service Build Unification
Problem Statement
The current DataBuild architecture has significant duplication and architectural inconsistencies between CLI and Service build orchestration:
Current Duplication Issues
- Event Emission Logic: Service HTTP handlers and CLI binaries contain duplicate orchestration event emission code
- Mode Detection: Analysis and execution binaries (analyze.rs and execute.rs) use the DATABUILD_CLI_MODE environment variable to conditionally emit different events
- Test Complexity: End-to-end tests must account for different event patterns between CLI and Service for identical logical operations
Specific Code References
- CLI Mode Detection in Analysis: databuild/graph/analyze.rs:555-587 - Emits "Build request received" and "Starting build planning" events only in CLI mode
- CLI Mode Detection in Execution: databuild/graph/execute.rs:413-428 and execute.rs:753-779 - Emits execution start/completion events only in CLI mode
- Service Orchestration: databuild/service/handlers.rs - HTTP handlers emit orchestration events independently
Architectural Problems
- Single Responsibility Violation: Analysis and execution binaries serve dual purposes as both shared library functions and CLI entry points
- Consistency Risk: Separate implementations of orchestration logic create risk of drift between CLI and Service behavior
- Maintenance Burden: Changes to orchestration requirements must be implemented in multiple places
Current Architecture Analysis
Service Flow
HTTP Request → Service Handler → Orchestration Events → Analysis → Execution → Completion Events
The Service has a natural coordination point in the HTTP handler that manages the entire build lifecycle and emits appropriate orchestration events.
CLI Flow
Shell Script → Analysis Binary (CLI mode) → Execution Binary (CLI mode) → Orchestration Events
The CLI lacks a natural coordination point, forcing the shared analysis/execution binaries to detect CLI mode and emit orchestration events themselves.
Event Flow Comparison
Service Events (coordinated):
- Build request received
- Starting build planning
- Analysis events (partitions scheduled, jobs configured)
- Starting build execution
- Execution events (jobs scheduled/completed, partitions available)
- Build request completed
CLI Events (mode-dependent):
- Same events as Service, but emitted conditionally based on DATABUILD_CLI_MODE
- Creates awkward coupling between orchestration concerns and domain logic
Proposed Shared Library Design
Core Orchestrator API
pub struct BuildOrchestrator {
    event_log: Box<dyn BuildEventLog>,
    build_request_id: String,
    requested_partitions: Vec<PartitionRef>,
}
impl BuildOrchestrator {
    pub fn new(
        event_log: Box<dyn BuildEventLog>,
        build_request_id: String,
        requested_partitions: Vec<PartitionRef>
    ) -> Self;
    // Lifecycle events
    pub async fn start_build(&self) -> Result<(), Error>;
    pub async fn start_planning(&self) -> Result<(), Error>;
    pub async fn start_execution(&self) -> Result<(), Error>;
    pub async fn complete_build(&self, result: BuildResult) -> Result<(), Error>;
    // Domain events (pass-through to existing logic)
    pub async fn emit_partition_scheduled(&self, partition: &PartitionRef) -> Result<(), Error>;
    pub async fn emit_job_scheduled(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_job_completed(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_partition_available(&self, partition: &PartitionEvent) -> Result<(), Error>;
    pub async fn emit_delegation(&self, partition: &str, target_build: &str, message: &str) -> Result<(), Error>;
}
pub enum BuildResult {
    Success { jobs_completed: usize },
    Failed { jobs_completed: usize, jobs_failed: usize },
    FailFast { trigger_job: String },
}
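To make the lifecycle calls concrete, here is a minimal synchronous sketch of how the orchestrator might wrap an event log. It is an illustration under stated assumptions, not the planned implementation: the BuildEvent fields, the synchronous BuildEventLog::append method, the ChannelEventLog type, the String error type, and the detail strings are all hypothetical, and partitions are plain strings rather than PartitionRef. Only the four lifecycle messages come from the event list earlier in this document.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event record; the real event type is not shown in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct BuildEvent {
    pub build_request_id: String,
    pub message: String,
    pub detail: String,
}

/// Hypothetical synchronous stand-in for the async event log interface.
pub trait BuildEventLog {
    fn append(&self, event: BuildEvent) -> Result<(), String>;
}

/// Event log that forwards every event to a channel so a caller can read them back.
pub struct ChannelEventLog {
    tx: Sender<BuildEvent>,
}

impl ChannelEventLog {
    pub fn new() -> (Self, Receiver<BuildEvent>) {
        let (tx, rx) = channel();
        (Self { tx }, rx)
    }
}

impl BuildEventLog for ChannelEventLog {
    fn append(&self, event: BuildEvent) -> Result<(), String> {
        self.tx.send(event).map_err(|e| e.to_string())
    }
}

pub enum BuildResult {
    Success { jobs_completed: usize },
    Failed { jobs_completed: usize, jobs_failed: usize },
    FailFast { trigger_job: String },
}

pub struct BuildOrchestrator {
    event_log: Box<dyn BuildEventLog>,
    build_request_id: String,
    requested_partitions: Vec<String>,
}

impl BuildOrchestrator {
    pub fn new(
        event_log: Box<dyn BuildEventLog>,
        build_request_id: String,
        requested_partitions: Vec<String>,
    ) -> Self {
        Self { event_log, build_request_id, requested_partitions }
    }

    /// Single choke point: every event carries the same build request id.
    fn emit(&self, message: &str, detail: String) -> Result<(), String> {
        self.event_log.append(BuildEvent {
            build_request_id: self.build_request_id.clone(),
            message: message.to_string(),
            detail,
        })
    }

    pub fn start_build(&self) -> Result<(), String> {
        self.emit("Build request received", self.requested_partitions.join(", "))
    }

    pub fn start_planning(&self) -> Result<(), String> {
        self.emit("Starting build planning", String::new())
    }

    pub fn start_execution(&self) -> Result<(), String> {
        self.emit("Starting build execution", String::new())
    }

    pub fn complete_build(&self, result: BuildResult) -> Result<(), String> {
        // Detail strings are illustrative only.
        let detail = match result {
            BuildResult::Success { jobs_completed } => format!("{} jobs completed", jobs_completed),
            BuildResult::Failed { jobs_completed, jobs_failed } => {
                format!("{} jobs completed, {} failed", jobs_completed, jobs_failed)
            }
            BuildResult::FailFast { trigger_job } => format!("fail-fast triggered by {}", trigger_job),
        };
        self.emit("Build request completed", detail)
    }
}
```

Because both the Service handler and the CLI wrapper would call the same four methods, the event messages and their order would be defined in exactly one place.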
Event Emission Strategy
The orchestrator will emit standardized events at specific lifecycle points:
- Build Lifecycle Events: High-level orchestration (received, planning, executing, completed)
- Domain Events: Pass-through wrapper for existing analysis/execution events
- Consistent Timing: All events emitted through orchestrator ensure proper sequencing
Error Handling
#[derive(Debug, thiserror::Error)]
pub enum OrchestrationError {
    #[error("Event log error: {0}")]
    EventLog(#[from] databuild::event_log::Error),
    #[error("Build coordination error: {0}")]
    Coordination(String),
    #[error("Invalid build state transition: {current} -> {requested}")]
    InvalidStateTransition { current: String, requested: String },
}
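The InvalidStateTransition variant implies that the orchestrator tracks where a build is in its lifecycle. A minimal sketch of such a check follows, using only the standard library in place of thiserror. The BuildState enum, the transition function, and the table of allowed transitions are assumptions made for illustration; the document does not specify them.

```rust
use std::fmt;

/// Hypothetical lifecycle states, mirroring the lifecycle events listed earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildState {
    Created,
    Received,
    Planning,
    Executing,
    Completed,
}

/// Reduced version of the error enum, with Display written by hand.
#[derive(Debug, PartialEq)]
pub enum OrchestrationError {
    Coordination(String),
    InvalidStateTransition { current: String, requested: String },
}

impl fmt::Display for OrchestrationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Coordination(msg) => write!(f, "Build coordination error: {}", msg),
            Self::InvalidStateTransition { current, requested } => {
                write!(f, "Invalid build state transition: {} -> {}", current, requested)
            }
        }
    }
}

impl std::error::Error for OrchestrationError {}

/// Returns the new state if the move is allowed, otherwise an error naming both states.
/// Assumption: the lifecycle is strictly linear, with no skipping or going back.
pub fn transition(
    current: BuildState,
    requested: BuildState,
) -> Result<BuildState, OrchestrationError> {
    use BuildState::*;
    let allowed = matches!(
        (current, requested),
        (Created, Received) | (Received, Planning) | (Planning, Executing) | (Executing, Completed)
    );
    if allowed {
        Ok(requested)
    } else {
        Err(OrchestrationError::InvalidStateTransition {
            current: format!("{:?}", current),
            requested: format!("{:?}", requested),
        })
    }
}
```

A real implementation would probably also allow a move to a terminal state from Planning, so that a failed analysis can still be recorded; the strictly linear table here keeps the sketch short.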
Testing Interface
#[cfg(test)]
impl BuildOrchestrator {
    pub fn with_mock_event_log(build_request_id: String) -> (Self, MockEventLog);
    pub fn emitted_events(&self) -> &[BuildEvent];
}
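One way the mock constructor could work is to hand the orchestrator one handle to a shared buffer and return a second handle to the test. The sketch below assumes a synchronous, infallible BuildEventLog::append and a two-field BuildEvent, both hypothetical. It also places emitted_events on the mock and returns an owned Vec, because a borrowed slice cannot outlive the mutex guard that protects the buffer; the &[BuildEvent] signature above would need the orchestrator to own the events directly.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical event record.
#[derive(Debug, Clone, PartialEq)]
pub struct BuildEvent {
    pub build_request_id: String,
    pub message: String,
}

/// Hypothetical synchronous event log interface.
pub trait BuildEventLog {
    fn append(&self, event: BuildEvent);
}

/// Cloning the mock clones the handle, not the buffer, so all clones see the same events.
#[derive(Clone, Default)]
pub struct MockEventLog {
    events: Arc<Mutex<Vec<BuildEvent>>>,
}

impl MockEventLog {
    /// Returns a copy of everything appended so far.
    pub fn emitted_events(&self) -> Vec<BuildEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl BuildEventLog for MockEventLog {
    fn append(&self, event: BuildEvent) {
        self.events.lock().unwrap().push(event);
    }
}

pub struct BuildOrchestrator {
    event_log: Box<dyn BuildEventLog>,
    build_request_id: String,
}

impl BuildOrchestrator {
    /// The orchestrator keeps one handle; the caller gets the other for assertions.
    pub fn with_mock_event_log(build_request_id: String) -> (Self, MockEventLog) {
        let mock = MockEventLog::default();
        let orchestrator = Self {
            event_log: Box::new(mock.clone()),
            build_request_id,
        };
        (orchestrator, mock)
    }

    pub fn start_build(&self) {
        self.event_log.append(BuildEvent {
            build_request_id: self.build_request_id.clone(),
            message: "Build request received".to_string(),
        });
    }
}
```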
Implementation Phases
Phase 1: Create Shared Orchestration Library
Files to Create:
- databuild/orchestration/mod.rs - Core orchestrator implementation
- databuild/orchestration/events.rs - Event type definitions and helpers
- databuild/orchestration/error.rs - Error types
- databuild/orchestration/tests.rs - Unit tests for orchestrator
Key Implementation Points:
- Extract common event emission patterns from Service and CLI
- Ensure orchestrator is async-compatible with existing event log interface
- Design for testability with dependency injection
Phase 2: Refactor Service to Use Orchestrator
Files to Modify:
- databuild/service/handlers.rs - Replace direct event emission with orchestrator calls
- databuild/service/mod.rs - Integration with orchestrator lifecycle
Implementation:
- Replace existing event emission code directly with orchestrator calls
- Ensure proper error handling and async integration
Phase 3: Create New CLI Wrapper
Files to Create:
- databuild/cli/main.rs - New CLI entry point using orchestrator
- databuild/cli/error.rs - CLI-specific error handling
Implementation:
// databuild/cli/main.rs
#[tokio::main]
async fn main() -> Result<(), CliError> {
    let args = parse_cli_args();
    let event_log = create_build_event_log(&args.event_log_uri).await?;
    let build_request_id = args.build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string());
    let orchestrator = BuildOrchestrator::new(event_log, build_request_id, args.partitions.clone());
    // Emit orchestration events
    orchestrator.start_build().await?;
    orchestrator.start_planning().await?;
    // Run analysis
    let graph = run_analysis(&args.partitions, &orchestrator).await?;
    orchestrator.start_execution().await?;
    // Run execution
    let result = run_execution(graph, &orchestrator).await?;
    orchestrator.complete_build(result).await?;
    Ok(())
}
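The wrapper above propagates errors with ?, so a failure in analysis or execution would return before complete_build runs, and the event log would end without a terminal event. One possible way to close that gap is sketched below, synchronously and with a plain Vec<String> standing in for the orchestrator. The run_build helper and the "Build request failed" message are hypothetical and not part of the proposed API.

```rust
/// Runs the two phases and always records a terminal event, even on failure.
/// `analyze` and `execute` are stand-ins for the analysis and execution steps.
pub fn run_build<A, E>(events: &mut Vec<String>, analyze: A, execute: E) -> Result<usize, String>
where
    A: FnOnce() -> Result<Vec<String>, String>,
    E: FnOnce(Vec<String>) -> Result<usize, String>,
{
    events.push("Build request received".to_string());
    events.push("Starting build planning".to_string());

    // Execution only starts (and only emits its start event) if analysis succeeded.
    let outcome = analyze().and_then(|graph| {
        events.push("Starting build execution".to_string());
        execute(graph)
    });

    // Record the terminal event before handing the result back to the caller.
    match &outcome {
        Ok(_) => events.push("Build request completed".to_string()),
        Err(reason) => events.push(format!("Build request failed: {}", reason)),
    }
    outcome
}
```

Collecting the phase results into one Result before matching keeps the early-return convenience of ? inside each phase while guaranteeing that the final event is written on every path.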
Phase 4: Remove CLI Mode Detection
Files to Modify:
- databuild/graph/analyze.rs - Remove lines 555-587 (CLI mode orchestration events)
- databuild/graph/execute.rs - Remove lines 413-428 and 753-779 (CLI mode orchestration events)
Verification:
- Analysis and execution binaries become pure domain functions
- No more environment variable mode detection
- All orchestration handled by wrapper/service
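As a rough picture of what "pure domain function" could mean here, the sketch below shows an analysis step that receives its event callback from the caller and never inspects the environment. The analyze function, its deduplication logic, and the callback shape are hypothetical stand-ins; the real analysis code in analyze.rs is not shown in this document.

```rust
use std::collections::BTreeSet;

/// Hypothetical analysis step: plans each requested partition once and reports
/// it through the supplied callback. It emits no lifecycle events and reads no
/// environment variables; the caller decides where events go.
pub fn analyze(
    requested: &[&str],
    emit_partition_scheduled: &mut dyn FnMut(&str),
) -> Vec<String> {
    let mut seen = BTreeSet::new();
    let mut plan = Vec::new();
    for partition in requested {
        // `insert` returns false for duplicates, so each partition is planned once.
        if seen.insert(*partition) {
            emit_partition_scheduled(*partition);
            plan.push((*partition).to_string());
        }
    }
    plan
}
```

With this shape, the CLI wrapper and the Service handler would both pass a callback that forwards to the orchestrator, and the function behaves identically under either caller.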
Phase 5: Update Bazel Rules
Files to Modify:
- databuild/rules.bzl - Update _databuild_graph_build_impl to use new CLI wrapper instead of direct analysis/execution pipeline
Before:
$(rlocation _main/{analyze_path}) $@ | $(rlocation _main/{exec_path})
After:
$(rlocation _main/{cli_wrapper_path}) $@
Phase 6: Update Tests
Files to Modify:
- tests/end_to_end/simple_test.sh - Remove separate CLI/Service event validation
- tests/end_to_end/podcast_simple_test.sh - Same simplification
- All tests expect identical event patterns from CLI and Service
Migration Strategy
Direct Replacement Approach
Since we don't need backwards compatibility, we can implement a direct replacement:
- Replace existing CLI mode detection immediately
- Refactor Service handlers to use orchestrator directly
- Update Bazel rules to use new CLI wrapper
- Update tests to expect unified behavior
Testing Strategy
- Unit Tests: Comprehensive orchestrator testing with mock event logs
- Integration Tests: Existing end-to-end tests pass with unified implementation
- Event Verification: Ensure orchestrator produces expected events for all scenarios
File Changes Required
New Files
- databuild/orchestration/mod.rs - 200+ lines, core orchestrator
- databuild/orchestration/events.rs - 100+ lines, event helpers
- databuild/orchestration/error.rs - 50+ lines, error types
- databuild/orchestration/tests.rs - 300+ lines, comprehensive tests
- databuild/cli/main.rs - 150+ lines, CLI wrapper
- databuild/cli/error.rs - 50+ lines, CLI error handling
Modified Files
- databuild/service/handlers.rs - Replace ~50 lines of event emission with orchestrator calls
- databuild/graph/analyze.rs - Remove ~30 lines of CLI mode detection
- databuild/graph/execute.rs - Remove ~60 lines of CLI mode detection
- databuild/rules.bzl - Update ~10 lines for new CLI wrapper
- tests/end_to_end/simple_test.sh - Simplify ~20 lines of event validation
- tests/end_to_end/podcast_simple_test.sh - Same simplification
Build Configuration
- Update databuild/BUILD.bazel to include orchestration module
- Update databuild/cli/BUILD.bazel for new CLI binary
- Modify example graphs to use new CLI wrapper
Benefits & Risk Analysis
Benefits
- Maintainability: Single source of truth for orchestration logic eliminates duplication
- Consistency: Guaranteed identical events across CLI and Service interfaces
- Extensibility: Foundation for SDK, additional CLI commands, monitoring integration
- Testing: Simplified test expectations, better unit test coverage of orchestration
- Architecture: Clean separation between orchestration and domain logic
Implementation Risks
- Regression: Changes to critical path could introduce subtle bugs
- Performance: Additional abstraction layer could impact latency
- Integration: Bazel build changes could break example workflows
Risk Mitigation
- Phased Implementation: Implement in stages with verification at each step
- Comprehensive Testing: Thorough unit and integration testing
- Event Verification: Ensure identical event patterns to current behavior
Future Architecture Extensions
SDK Integration
The unified orchestrator provides a natural integration point for external SDKs:
// Future SDK usage
let databuild_client = DatabuildClient::new(endpoint);
let orchestrator = databuild_client.create_orchestrator(partitions).await?;
orchestrator.start_build().await?;
let result = databuild_client.execute_build(orchestrator).await?;
Additional CLI Commands
Orchestrator enables consistent event emission across CLI commands:
databuild validate --partitions "data/users" --dry-run
databuild status --build-id "abc123"
databuild retry --build-id "abc123" --failed-jobs-only
Monitoring Integration
Standardized events provide foundation for observability:
impl BuildOrchestrator {
    pub fn with_tracing_span(&self, span: tracing::Span) -> Self;
    pub fn emit_otel_metrics(&self) -> Result<(), Error>;
}
CI/CD Pipeline Integration
Orchestrator events enable standardized build reporting across environments:
# GitHub Actions integration
- name: DataBuild
  uses: databuild/github-action@v1
  with:
    partitions: "data/daily_reports"
    event-log: "${{ env.DATABUILD_EVENT_LOG }}"
  # Automatic event collection for build status reporting
Conclusion
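This unification addresses fundamental architectural inconsistencies while providing a foundation for future extensibility. The phased implementation approach limits risk by verifying behavior at each step. Because backward compatibility is not required, the existing CLI mode detection can be replaced outright rather than maintained alongside the new path.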
The shared orchestrator eliminates the current awkward CLI mode detection pattern and establishes DataBuild as a platform that can support multiple interfaces with guaranteed consistency.