databuild/plans/07-cli-service-build-unification.md

# CLI-Service Build Unification
## Problem Statement
The current DataBuild architecture duplicates build orchestration logic between the CLI and the Service, and the two paths are architecturally inconsistent:
### Current Duplication Issues
1. **Event Emission Logic**: Service HTTP handlers and CLI binaries contain duplicate orchestration event emission code
2. **Mode Detection**: Analysis and execution binaries (`analyze.rs` and `execute.rs`) use the `DATABUILD_CLI_MODE` environment variable to conditionally emit different events
3. **Test Complexity**: End-to-end tests must account for different event patterns between CLI and Service for identical logical operations
### Specific Code References
- **CLI Mode Detection in Analysis**: `databuild/graph/analyze.rs:555-587` - Emits "Build request received" and "Starting build planning" events only in CLI mode
- **CLI Mode Detection in Execution**: `databuild/graph/execute.rs:413-428` and `execute.rs:753-779` - Emits execution start/completion events only in CLI mode
- **Service Orchestration**: `databuild/service/handlers.rs` - HTTP handlers emit orchestration events independently
### Architectural Problems
1. **Single Responsibility Violation**: Analysis and execution binaries serve dual purposes as both shared library functions and CLI entry points
2. **Consistency Risk**: Separate implementations of orchestration logic create risk of drift between CLI and Service behavior
3. **Maintenance Burden**: Changes to orchestration requirements must be implemented in multiple places
## Current Architecture Analysis
### Service Flow
```
HTTP Request → Service Handler → Orchestration Events → Analysis → Execution → Completion Events
```
The Service has a natural coordination point in the HTTP handler that manages the entire build lifecycle and emits appropriate orchestration events.
### CLI Flow
```
Shell Script → Analysis Binary (CLI mode) → Execution Binary (CLI mode) → Orchestration Events
```
The CLI lacks a natural coordination point, forcing the shared analysis/execution binaries to detect CLI mode and emit orchestration events themselves.
### Event Flow Comparison
**Service Events** (coordinated):
1. Build request received
2. Starting build planning
3. Analysis events (partitions scheduled, jobs configured)
4. Starting build execution
5. Execution events (jobs scheduled/completed, partitions available)
6. Build request completed
**CLI Events** (mode-dependent):
- Same events as Service, but emitted conditionally based on `DATABUILD_CLI_MODE`
- Creates awkward coupling between orchestration concerns and domain logic
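The coupling described above can be illustrated with a small sketch. This is not the code in `analyze.rs`; the `Event` enum, `cli_mode_from_env`, and `analyze` are hypothetical names, and treating "variable is set" as CLI mode is an assumption, since the plan does not say how the variable is interpreted.

```rust
/// Hypothetical event record; the real DataBuild event types are richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    BuildRequestReceived,
    PlanningStarted,
    PartitionScheduled(String),
}

/// Reads the mode flag from the environment.
/// Assumption: any value counts as "CLI mode"; the plan does not specify this.
pub fn cli_mode_from_env() -> bool {
    std::env::var("DATABUILD_CLI_MODE").is_ok()
}

/// Analysis step that also takes on orchestration duties when `cli_mode` is true.
pub fn analyze(partitions: &[&str], cli_mode: bool) -> Vec<Event> {
    let mut events = Vec::new();
    if cli_mode {
        // Orchestration concern leaking into domain logic.
        events.push(Event::BuildRequestReceived);
        events.push(Event::PlanningStarted);
    }
    // Domain events are emitted in both modes.
    for p in partitions {
        events.push(Event::PartitionScheduled(p.to_string()));
    }
    events
}
```

A binary would call `analyze(&partitions, cli_mode_from_env())`, so the same function produces two different event streams depending on process environment, which is the behavior the tests currently have to account for.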
## Proposed Shared Library Design
### Core Orchestrator API
```rust
pub struct BuildOrchestrator {
    event_log: Box<dyn BuildEventLog>,
    build_request_id: String,
    requested_partitions: Vec<PartitionRef>,
}

impl BuildOrchestrator {
    pub fn new(
        event_log: Box<dyn BuildEventLog>,
        build_request_id: String,
        requested_partitions: Vec<PartitionRef>,
    ) -> Self;

    // Lifecycle events
    pub async fn start_build(&self) -> Result<(), Error>;
    pub async fn start_planning(&self) -> Result<(), Error>;
    pub async fn start_execution(&self) -> Result<(), Error>;
    pub async fn complete_build(&self, result: BuildResult) -> Result<(), Error>;

    // Domain events (pass-through to existing logic)
    pub async fn emit_partition_scheduled(&self, partition: &PartitionRef) -> Result<(), Error>;
    pub async fn emit_job_scheduled(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_job_completed(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_partition_available(&self, partition: &PartitionEvent) -> Result<(), Error>;
    pub async fn emit_delegation(&self, partition: &str, target_build: &str, message: &str) -> Result<(), Error>;
}

pub enum BuildResult {
    Success { jobs_completed: usize },
    Failed { jobs_completed: usize, jobs_failed: usize },
    FailFast { trigger_job: String },
}
```
### Event Emission Strategy
The orchestrator will emit standardized events at specific lifecycle points:
1. **Build Lifecycle Events**: High-level orchestration (received, planning, executing, completed)
2. **Domain Events**: Pass-through wrapper for existing analysis/execution events
3. **Consistent Timing**: Routing all events through the orchestrator ensures proper sequencing
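As a rough illustration of the strategy above, the following synchronous sketch routes both lifecycle and domain events through a single `emit` method so that ordering is decided in one place. The `EventLog` trait, `MemoryLog`, the string event names, and `run_example_build` are all assumptions made for the example; the proposed API is async and uses the project's own event types.

```rust
/// Hypothetical, synchronous stand-in for the event log interface.
pub trait EventLog {
    fn append(&mut self, event: String);
}

/// In-memory log that records events in arrival order.
#[derive(Default)]
pub struct MemoryLog {
    pub events: Vec<String>,
}

impl EventLog for MemoryLog {
    fn append(&mut self, event: String) {
        self.events.push(event);
    }
}

/// Minimal orchestrator: every event, lifecycle or domain, passes through `emit`.
pub struct Orchestrator<L: EventLog> {
    log: L,
    build_request_id: String,
}

impl<L: EventLog> Orchestrator<L> {
    pub fn new(log: L, build_request_id: &str) -> Self {
        Self { log, build_request_id: build_request_id.to_string() }
    }

    /// Single choke point: tags each event with the build request id.
    fn emit(&mut self, kind: &str) {
        let line = format!("{}:{}", self.build_request_id, kind);
        self.log.append(line);
    }

    // Lifecycle events
    pub fn start_build(&mut self) { self.emit("build_request_received"); }
    pub fn start_planning(&mut self) { self.emit("planning_started"); }
    pub fn start_execution(&mut self) { self.emit("execution_started"); }
    pub fn complete_build(&mut self) { self.emit("build_request_completed"); }

    // Domain event pass-through
    pub fn emit_partition_scheduled(&mut self, partition: &str) {
        self.emit(&format!("partition_scheduled({partition})"));
    }

    /// Hands the log back so callers can inspect what was written.
    pub fn into_log(self) -> L { self.log }
}

/// Drives one build through the lifecycle and returns the recorded events.
pub fn run_example_build() -> Vec<String> {
    let mut orch = Orchestrator::new(MemoryLog::default(), "b1");
    orch.start_build();
    orch.start_planning();
    orch.emit_partition_scheduled("data/users");
    orch.start_execution();
    orch.complete_build();
    orch.into_log().events
}
```

Because the CLI wrapper and the Service handler would both drive the same methods, the event sequence no longer depends on which entry point started the build.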
### Error Handling
```rust
#[derive(Debug, thiserror::Error)]
pub enum OrchestrationError {
    #[error("Event log error: {0}")]
    EventLog(#[from] databuild::event_log::Error),

    #[error("Build coordination error: {0}")]
    Coordination(String),

    #[error("Invalid build state transition: {current} -> {requested}")]
    InvalidStateTransition { current: String, requested: String },
}
```
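One way the `InvalidStateTransition` variant might be produced is a small state check like the sketch below. The `BuildState` enum, the `transition` function, and the set of legal transitions (received, then planning, then executing, then completed) are assumptions inferred from the lifecycle order in this plan. The sketch uses only the standard library, so `Display` is written by hand rather than derived with `thiserror`.

```rust
use std::fmt;

/// Hypothetical lifecycle states, mirroring the order of the lifecycle events.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BuildState {
    Received,
    Planning,
    Executing,
    Completed,
}

/// Stand-in for the transition variant of the proposed error enum.
#[derive(Debug, PartialEq)]
pub enum TransitionError {
    InvalidStateTransition { current: BuildState, requested: BuildState },
}

impl fmt::Display for TransitionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransitionError::InvalidStateTransition { current, requested } => write!(
                f,
                "Invalid build state transition: {:?} -> {:?}",
                current, requested
            ),
        }
    }
}

impl std::error::Error for TransitionError {}

/// Accepts only the next state in the lifecycle; anything else is an error.
pub fn transition(
    current: BuildState,
    requested: BuildState,
) -> Result<BuildState, TransitionError> {
    use BuildState::*;
    match (current, requested) {
        (Received, Planning) | (Planning, Executing) | (Executing, Completed) => Ok(requested),
        _ => Err(TransitionError::InvalidStateTransition { current, requested }),
    }
}
```

Using a typed state instead of the `String` fields shown in the proposal is a design choice of this sketch; it lets the compiler check the match arms, at the cost of a conversion when the error is logged.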
### Testing Interface
```rust
#[cfg(test)]
impl BuildOrchestrator {
    pub fn with_mock_event_log(build_request_id: String) -> (Self, MockEventLog);
    pub fn emitted_events(&self) -> &[BuildEvent];
}
```
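A mock along these lines could back the testing interface. This is a sketch under assumptions: `MockEventLog` here stores plain strings behind an `Arc<Mutex<..>>` so the test keeps a handle to the same buffer the orchestrator writes to, and `TestOrchestrator` is a hypothetical, synchronous stand-in for `BuildOrchestrator`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical mock: clones share one underlying event buffer.
#[derive(Clone, Default)]
pub struct MockEventLog {
    events: Arc<Mutex<Vec<String>>>,
}

impl MockEventLog {
    /// Appends an event to the shared buffer.
    pub fn record(&self, event: &str) {
        self.events.lock().unwrap().push(event.to_string());
    }

    /// Returns a snapshot of everything recorded so far.
    pub fn emitted_events(&self) -> Vec<String> {
        self.events.lock().unwrap().clone()
    }
}

/// Simplified orchestrator used only to show the test wiring.
pub struct TestOrchestrator {
    log: MockEventLog,
}

impl TestOrchestrator {
    /// Returns the orchestrator plus a handle that observes the same buffer.
    pub fn with_mock_event_log() -> (Self, MockEventLog) {
        let log = MockEventLog::default();
        (Self { log: log.clone() }, log)
    }

    pub fn start_build(&self) {
        self.log.record("build_request_received");
    }

    pub fn start_planning(&self) {
        self.log.record("planning_started");
    }
}
```

A unit test would call `with_mock_event_log()`, drive the orchestrator, and then assert on `emitted_events()` from the returned handle, without touching a real event log backend.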
## Implementation Phases
### Phase 1: Create Shared Orchestration Library
**Files to Create**:
- `databuild/orchestration/mod.rs` - Core orchestrator implementation
- `databuild/orchestration/events.rs` - Event type definitions and helpers
- `databuild/orchestration/error.rs` - Error types
- `databuild/orchestration/tests.rs` - Unit tests for orchestrator
**Key Implementation Points**:
- Extract common event emission patterns from Service and CLI
- Ensure orchestrator is async-compatible with existing event log interface
- Design for testability with dependency injection
### Phase 2: Refactor Service to Use Orchestrator
**Files to Modify**:
- `databuild/service/handlers.rs` - Replace direct event emission with orchestrator calls
- `databuild/service/mod.rs` - Integration with orchestrator lifecycle
**Implementation**:
- Replace existing event emission code directly with orchestrator calls
- Ensure proper error handling and async integration
### Phase 3: Create New CLI Wrapper
**Files to Create**:
- `databuild/cli/main.rs` - New CLI entry point using orchestrator
- `databuild/cli/error.rs` - CLI-specific error handling
**Implementation**:
```rust
// databuild/cli/main.rs
#[tokio::main]
async fn main() -> Result<(), CliError> {
    let args = parse_cli_args();
    let event_log = create_build_event_log(&args.event_log_uri).await?;
    let build_request_id = args.build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string());
    let orchestrator = BuildOrchestrator::new(event_log, build_request_id, args.partitions.clone());

    // Emit orchestration events
    orchestrator.start_build().await?;
    orchestrator.start_planning().await?;

    // Run analysis
    let graph = run_analysis(&args.partitions, &orchestrator).await?;
    orchestrator.start_execution().await?;

    // Run execution
    let result = run_execution(graph, &orchestrator).await?;
    orchestrator.complete_build(result).await?;
    Ok(())
}
```
### Phase 4: Remove CLI Mode Detection
**Files to Modify**:
- `databuild/graph/analyze.rs` - Remove lines 555-587 (CLI mode orchestration events)
- `databuild/graph/execute.rs` - Remove lines 413-428 and 753-779 (CLI mode orchestration events)
**Verification**:
- Analysis and execution binaries become pure domain functions
- No more environment variable mode detection
- All orchestration handled by wrapper/service
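After this phase, an analysis step might look roughly like the sketch below: it receives an event sink from its caller and never inspects the environment. `DomainEventSink`, `RecordingSink`, and this simplified `run_analysis` signature are hypothetical; in the plan the sink role is played by `BuildOrchestrator` and the function is async.

```rust
/// Hypothetical sink for domain events; the orchestrator would implement it.
pub trait DomainEventSink {
    fn partition_scheduled(&mut self, partition: &str);
}

/// Collects events in memory, standing in for a real orchestrator.
#[derive(Default)]
pub struct RecordingSink {
    pub scheduled: Vec<String>,
}

impl DomainEventSink for RecordingSink {
    fn partition_scheduled(&mut self, partition: &str) {
        self.scheduled.push(partition.to_string());
    }
}

/// Pure analysis step: no environment lookups and no lifecycle events.
/// Returns the number of partitions it scheduled.
pub fn run_analysis(partitions: &[&str], sink: &mut dyn DomainEventSink) -> usize {
    for p in partitions {
        sink.partition_scheduled(p);
    }
    partitions.len()
}
```

The caller, whether the CLI wrapper or a Service handler, decides which lifecycle events surround this call, so the function behaves identically from both entry points.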
### Phase 5: Update Bazel Rules
**Files to Modify**:
- `databuild/rules.bzl` - Update `_databuild_graph_build_impl` to use new CLI wrapper instead of direct analysis/execution pipeline
**Before**:
```bash
$(rlocation _main/{analyze_path}) $@ | $(rlocation _main/{exec_path})
```
**After**:
```bash
$(rlocation _main/{cli_wrapper_path}) $@
```
### Phase 6: Update Tests
**Files to Modify**:
- `tests/end_to_end/simple_test.sh` - Remove separate CLI/Service event validation
- `tests/end_to_end/podcast_simple_test.sh` - Same simplification
- All tests expect identical event patterns from CLI and Service
## Migration Strategy
### Direct Replacement Approach
Since we don't need backwards compatibility, we can implement a direct replacement:
- Replace existing CLI mode detection immediately
- Refactor Service handlers to use orchestrator directly
- Update Bazel rules to use new CLI wrapper
- Update tests to expect unified behavior
### Testing Strategy
1. **Unit Tests**: Comprehensive orchestrator testing with mock event logs
2. **Integration Tests**: Existing end-to-end tests pass with unified implementation
3. **Event Verification**: Ensure orchestrator produces expected events for all scenarios
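Event verification could be sketched as a comparison of the ordered event types from a CLI run and a Service run, ignoring fields that legitimately differ between runs. The `LoggedEvent` tuple shape and both helper functions are assumptions for illustration; real events carry more fields than a build request id and a type name.

```rust
/// Hypothetical flattened event: (build_request_id, event_type).
pub type LoggedEvent = (String, String);

/// Drops per-run fields so that only the ordered event types remain.
pub fn event_types(log: &[LoggedEvent]) -> Vec<&str> {
    log.iter().map(|(_, kind)| kind.as_str()).collect()
}

/// Returns the index of the first position where the two type sequences
/// differ, or `None` when they are identical.
pub fn first_divergence(cli: &[LoggedEvent], service: &[LoggedEvent]) -> Option<usize> {
    let (a, b) = (event_types(cli), event_types(service));
    if a == b {
        return None;
    }
    // Length of the common prefix is the first differing index.
    let common = a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();
    Some(common)
}
```

Reporting the first diverging index, rather than a plain boolean, tends to make a failing end-to-end test easier to diagnose.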
## File Changes Required
### New Files
- `databuild/orchestration/mod.rs` - 200+ lines, core orchestrator
- `databuild/orchestration/events.rs` - 100+ lines, event helpers
- `databuild/orchestration/error.rs` - 50+ lines, error types
- `databuild/orchestration/tests.rs` - 300+ lines, comprehensive tests
- `databuild/cli/main.rs` - 150+ lines, CLI wrapper
- `databuild/cli/error.rs` - 50+ lines, CLI error handling
### Modified Files
- `databuild/service/handlers.rs` - Replace ~50 lines of event emission with orchestrator calls
- `databuild/graph/analyze.rs` - Remove ~30 lines of CLI mode detection
- `databuild/graph/execute.rs` - Remove ~60 lines of CLI mode detection
- `databuild/rules.bzl` - Update ~10 lines for new CLI wrapper
- `tests/end_to_end/simple_test.sh` - Simplify ~20 lines of event validation
- `tests/end_to_end/podcast_simple_test.sh` - Same simplification
### Build Configuration
- Update `databuild/BUILD.bazel` to include orchestration module
- Update `databuild/cli/BUILD.bazel` for new CLI binary
- Modify example graphs to use new CLI wrapper
## Benefits & Risk Analysis
### Benefits
1. **Maintainability**: Single source of truth for orchestration logic eliminates duplication
2. **Consistency**: Guaranteed identical events across CLI and Service interfaces
3. **Extensibility**: Foundation for SDK, additional CLI commands, monitoring integration
4. **Testing**: Simplified test expectations, better unit test coverage of orchestration
5. **Architecture**: Clean separation between orchestration and domain logic
### Implementation Risks
1. **Regression**: Changes to critical path could introduce subtle bugs
2. **Performance**: Additional abstraction layer could impact latency
3. **Integration**: Bazel build changes could break example workflows
### Risk Mitigation
1. **Phased Implementation**: Implement in stages with verification at each step
2. **Comprehensive Testing**: Thorough unit and integration testing
3. **Event Verification**: Ensure identical event patterns to current behavior
## Future Architecture Extensions
### SDK Integration
The unified orchestrator provides a natural integration point for external SDKs:
```rust
// Future SDK usage
let databuild_client = DatabuildClient::new(endpoint);
let orchestrator = databuild_client.create_orchestrator(partitions).await?;
orchestrator.start_build().await?;
let result = databuild_client.execute_build(orchestrator).await?;
```
### Additional CLI Commands
The orchestrator enables consistent event emission across CLI commands:
```bash
databuild validate --partitions "data/users" --dry-run
databuild status --build-id "abc123"
databuild retry --build-id "abc123" --failed-jobs-only
```
### Monitoring Integration
Standardized events provide a foundation for observability:
```rust
impl BuildOrchestrator {
    pub fn with_tracing_span(&self, span: tracing::Span) -> Self;
    pub fn emit_otel_metrics(&self) -> Result<(), Error>;
}
```
### CI/CD Pipeline Integration
Orchestrator events enable standardized build reporting across environments:
```yaml
# GitHub Actions integration
- name: DataBuild
uses: databuild/github-action@v1
with:
partitions: "data/daily_reports"
event-log: "${{ env.DATABUILD_EVENT_LOG }}"
# Automatic event collection for build status reporting
```
## Conclusion
This unification addresses fundamental architectural inconsistencies while providing a foundation for future extensibility. The phased implementation, with verification at each step, limits risk; because backward compatibility is not required, each phase can replace the old code path directly.
The shared orchestrator eliminates the current awkward CLI mode detection pattern and establishes DataBuild as a platform that can support multiple interfaces with guaranteed consistency.