# CLI-Service Build Unification

## Problem Statement

The current DataBuild architecture has significant duplication and architectural inconsistencies between CLI and Service build orchestration:

### Current Duplication Issues

1. **Event Emission Logic**: Service HTTP handlers and CLI binaries contain duplicate orchestration event emission code
2. **Mode Detection**: Analysis and execution binaries (`analyze.rs` and `execute.rs`) use the `DATABUILD_CLI_MODE` environment variable to conditionally emit different events
3. **Test Complexity**: End-to-end tests must account for different event patterns between CLI and Service for identical logical operations

### Specific Code References

- **CLI Mode Detection in Analysis**: `databuild/graph/analyze.rs:555-587` - Emits "Build request received" and "Starting build planning" events only in CLI mode
- **CLI Mode Detection in Execution**: `databuild/graph/execute.rs:413-428` and `execute.rs:753-779` - Emits execution start/completion events only in CLI mode
- **Service Orchestration**: `databuild/service/handlers.rs` - HTTP handlers emit orchestration events independently

### Architectural Problems

1. **Single Responsibility Violation**: Analysis and execution binaries serve dual purposes as both shared library functions and CLI entry points
2. **Consistency Risk**: Separate implementations of orchestration logic create risk of drift between CLI and Service behavior
3. **Maintenance Burden**: Changes to orchestration requirements must be implemented in multiple places

## Current Architecture Analysis

### Service Flow
```
HTTP Request → Service Handler → Orchestration Events → Analysis → Execution → Completion Events
```

The Service has a natural coordination point in the HTTP handler that manages the entire build lifecycle and emits appropriate orchestration events.

### CLI Flow
```
Shell Script → Analysis Binary (CLI mode) → Execution Binary (CLI mode) → Orchestration Events
```

The CLI lacks a natural coordination point, forcing the shared analysis/execution binaries to detect CLI mode and emit orchestration events themselves.

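The coupling this creates can be pictured with a minimal sketch. The helper names (`cli_mode_enabled`, `read_cli_mode`, `analysis_events`) are hypothetical, and the sketch assumes that any set value of `DATABUILD_CLI_MODE` enables CLI mode; the actual checks in `analyze.rs` and `execute.rs` may differ.

```rust
/// Assumption: any present value of DATABUILD_CLI_MODE counts as "on".
fn cli_mode_enabled(env_value: Option<&str>) -> bool {
    env_value.is_some()
}

/// Reads the real environment variable and applies the rule above.
fn read_cli_mode() -> bool {
    cli_mode_enabled(std::env::var("DATABUILD_CLI_MODE").ok().as_deref())
}

/// Events an analysis step would emit under the mode-detection pattern.
fn analysis_events(cli_mode: bool) -> Vec<&'static str> {
    let mut events = Vec::new();
    if cli_mode {
        // Orchestration concerns leaking into domain code.
        events.push("Build request received");
        events.push("Starting build planning");
    }
    // Domain event, emitted regardless of mode.
    events.push("Partitions scheduled");
    events
}
```

The same function yields two different event sequences depending on process environment, which is the behavior the end-to-end tests currently have to special-case.
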
### Event Flow Comparison

**Service Events** (coordinated):
1. Build request received
2. Starting build planning
3. Analysis events (partitions scheduled, jobs configured)
4. Starting build execution
5. Execution events (jobs scheduled/completed, partitions available)
6. Build request completed

**CLI Events** (mode-dependent):
- Same events as Service, but emitted conditionally based on `DATABUILD_CLI_MODE`
- Creates awkward coupling between orchestration concerns and domain logic

## Proposed Shared Library Design

### Core Orchestrator API

```rust
// Signatures only; method bodies are omitted in this design outline.
// `Error` is a placeholder here, presumably the `OrchestrationError`
// type described under Error Handling.
pub struct BuildOrchestrator {
    event_log: Box<dyn BuildEventLog>,
    build_request_id: String,
    requested_partitions: Vec<PartitionRef>,
}

impl BuildOrchestrator {
    pub fn new(
        event_log: Box<dyn BuildEventLog>,
        build_request_id: String,
        requested_partitions: Vec<PartitionRef>
    ) -> Self;

    // Lifecycle events
    pub async fn start_build(&self) -> Result<(), Error>;
    pub async fn start_planning(&self) -> Result<(), Error>;
    pub async fn start_execution(&self) -> Result<(), Error>;
    pub async fn complete_build(&self, result: BuildResult) -> Result<(), Error>;

    // Domain events (pass-through to existing logic)
    pub async fn emit_partition_scheduled(&self, partition: &PartitionRef) -> Result<(), Error>;
    pub async fn emit_job_scheduled(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_job_completed(&self, job: &JobEvent) -> Result<(), Error>;
    pub async fn emit_partition_available(&self, partition: &PartitionEvent) -> Result<(), Error>;
    pub async fn emit_delegation(&self, partition: &str, target_build: &str, message: &str) -> Result<(), Error>;
}

pub enum BuildResult {
    Success { jobs_completed: usize },
    Failed { jobs_completed: usize, jobs_failed: usize },
    FailFast { trigger_job: String },
}
```

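As an illustration of how a caller might consume `BuildResult`, the sketch below maps each variant to a process exit code and a summary line. The `exit_code` and `summarize` helpers, the exit-code values, and the message wording are assumptions made for this example, not part of the proposed API.

```rust
pub enum BuildResult {
    Success { jobs_completed: usize },
    Failed { jobs_completed: usize, jobs_failed: usize },
    FailFast { trigger_job: String },
}

/// Hypothetical mapping: 0 for success, 1 for any failure.
pub fn exit_code(result: &BuildResult) -> i32 {
    match result {
        BuildResult::Success { .. } => 0,
        BuildResult::Failed { .. } | BuildResult::FailFast { .. } => 1,
    }
}

/// Hypothetical one-line summary suitable for CLI output.
pub fn summarize(result: &BuildResult) -> String {
    match result {
        BuildResult::Success { jobs_completed } => {
            format!("build succeeded: {} jobs completed", jobs_completed)
        }
        BuildResult::Failed { jobs_completed, jobs_failed } => {
            format!("build failed: {} completed, {} failed", jobs_completed, jobs_failed)
        }
        BuildResult::FailFast { trigger_job } => {
            format!("build aborted: fail-fast triggered by {}", trigger_job)
        }
    }
}
```
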
### Event Emission Strategy

The orchestrator will emit standardized events at specific lifecycle points:

1. **Build Lifecycle Events**: High-level orchestration (received, planning, executing, completed)
2. **Domain Events**: Pass-through wrapper for existing analysis/execution events
3. **Consistent Timing**: All events emitted through orchestrator ensure proper sequencing

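The single-emission-point idea can be sketched with a simplified, synchronous stand-in. `EventSink`, `SketchOrchestrator`, and the message format are hypothetical and exist only for this example; the real orchestrator is async and writes to a `BuildEventLog`.

```rust
/// Hypothetical synchronous stand-in for the event log.
pub trait EventSink {
    fn record(&mut self, message: String);
}

impl EventSink for Vec<String> {
    fn record(&mut self, message: String) {
        self.push(message);
    }
}

pub struct SketchOrchestrator<S: EventSink> {
    sink: S,
    build_request_id: String,
}

impl<S: EventSink> SketchOrchestrator<S> {
    pub fn new(sink: S, build_request_id: &str) -> Self {
        Self { sink, build_request_id: build_request_id.to_string() }
    }

    /// Every event, lifecycle or domain, passes through this one method,
    /// so the sink sees them in call order with a uniform prefix.
    fn emit(&mut self, message: &str) {
        let line = format!("[{}] {}", self.build_request_id, message);
        self.sink.record(line);
    }

    // Lifecycle events
    pub fn start_build(&mut self) {
        self.emit("Build request received");
    }

    pub fn start_planning(&mut self) {
        self.emit("Starting build planning");
    }

    // Domain event (pass-through)
    pub fn emit_partition_scheduled(&mut self, partition: &str) {
        self.emit(&format!("Partition scheduled: {}", partition));
    }

    /// Hands the sink back so callers can inspect what was recorded.
    pub fn into_sink(self) -> S {
        self.sink
    }
}
```
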
### Error Handling

```rust
#[derive(Debug, thiserror::Error)]
pub enum OrchestrationError {
    #[error("Event log error: {0}")]
    EventLog(#[from] databuild::event_log::Error),

    #[error("Build coordination error: {0}")]
    Coordination(String),

    #[error("Invalid build state transition: {current} -> {requested}")]
    InvalidStateTransition { current: String, requested: String },
}
```

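The `InvalidStateTransition` variant implies some form of lifecycle tracking. A minimal sketch of such a check follows, using only the standard library. The `Phase` enum, the `advance` function, and the strictly linear transition table are assumptions; the design does not specify which transitions are legal (for example, whether a build may complete directly from planning on failure).

```rust
use std::fmt;

/// Hypothetical lifecycle phases, mirroring the lifecycle events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Created,
    Received,
    Planning,
    Executing,
    Completed,
}

/// Standalone stand-in for the `InvalidStateTransition` variant.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidStateTransition {
    pub current: String,
    pub requested: String,
}

impl fmt::Display for InvalidStateTransition {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Invalid build state transition: {} -> {}",
            self.current, self.requested
        )
    }
}

impl std::error::Error for InvalidStateTransition {}

/// Assumption: phases advance strictly one step at a time.
pub fn advance(current: Phase, requested: Phase) -> Result<Phase, InvalidStateTransition> {
    let allowed = matches!(
        (current, requested),
        (Phase::Created, Phase::Received)
            | (Phase::Received, Phase::Planning)
            | (Phase::Planning, Phase::Executing)
            | (Phase::Executing, Phase::Completed)
    );
    if allowed {
        Ok(requested)
    } else {
        Err(InvalidStateTransition {
            current: format!("{:?}", current),
            requested: format!("{:?}", requested),
        })
    }
}
```
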
### Testing Interface

```rust
#[cfg(test)]
impl BuildOrchestrator {
    pub fn with_mock_event_log(build_request_id: String) -> (Self, MockEventLog);
    pub fn emitted_events(&self) -> &[BuildEvent];
}
```

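Returning `(Self, MockEventLog)` suggests the test keeps a handle to the same log the orchestrator writes to. One way this could work is a mock that shares its storage behind `Arc<Mutex<…>>`, sketched below. `TestOrchestrator`, the string-based events, and the mock's method names are hypothetical simplifications; the real `MockEventLog` and `BuildEvent` types are not defined in this document.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical mock: clones share the same underlying event list.
#[derive(Clone, Default)]
pub struct MockEventLog {
    events: Arc<Mutex<Vec<String>>>,
}

impl MockEventLog {
    pub fn append(&self, event: &str) {
        self.events.lock().unwrap().push(event.to_string());
    }

    /// Snapshot of everything recorded so far.
    pub fn emitted_events(&self) -> Vec<String> {
        self.events.lock().unwrap().clone()
    }
}

/// Simplified synchronous stand-in for `BuildOrchestrator`.
pub struct TestOrchestrator {
    log: MockEventLog,
    build_request_id: String,
}

impl TestOrchestrator {
    /// Returns the orchestrator plus a handle to the log it writes to.
    pub fn with_mock_event_log(build_request_id: &str) -> (Self, MockEventLog) {
        let log = MockEventLog::default();
        let orchestrator = TestOrchestrator {
            log: log.clone(),
            build_request_id: build_request_id.to_string(),
        };
        (orchestrator, log)
    }

    pub fn start_build(&self) {
        self.log
            .append(&format!("{}: Build request received", self.build_request_id));
    }
}
```
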
## Implementation Phases

### Phase 1: Create Shared Orchestration Library

**Files to Create**:
- `databuild/orchestration/mod.rs` - Core orchestrator implementation
- `databuild/orchestration/events.rs` - Event type definitions and helpers
- `databuild/orchestration/error.rs` - Error types
- `databuild/orchestration/tests.rs` - Unit tests for orchestrator

**Key Implementation Points**:
- Extract common event emission patterns from Service and CLI
- Ensure orchestrator is async-compatible with existing event log interface
- Design for testability with dependency injection

### Phase 2: Refactor Service to Use Orchestrator

**Files to Modify**:
- `databuild/service/handlers.rs` - Replace direct event emission with orchestrator calls
- `databuild/service/mod.rs` - Integration with orchestrator lifecycle

**Implementation**:
- Replace existing event emission code directly with orchestrator calls
- Ensure proper error handling and async integration

### Phase 3: Create New CLI Wrapper

**Files to Create**:
- `databuild/cli/main.rs` - New CLI entry point using orchestrator
- `databuild/cli/error.rs` - CLI-specific error handling

**Implementation**:
```rust
// databuild/cli/main.rs
use uuid::Uuid;

#[tokio::main]
async fn main() -> Result<(), CliError> {
    let args = parse_cli_args();
    let event_log = create_build_event_log(&args.event_log_uri).await?;
    // Generate a request ID when the caller did not supply one
    let build_request_id = args.build_request_id.unwrap_or_else(|| Uuid::new_v4().to_string());

    let orchestrator = BuildOrchestrator::new(event_log, build_request_id, args.partitions.clone());

    // Emit orchestration events
    orchestrator.start_build().await?;
    orchestrator.start_planning().await?;

    // Run analysis
    let graph = run_analysis(&args.partitions, &orchestrator).await?;

    orchestrator.start_execution().await?;

    // Run execution
    let result = run_execution(graph, &orchestrator).await?;

    orchestrator.complete_build(result).await?;

    Ok(())
}
```

### Phase 4: Remove CLI Mode Detection

**Files to Modify**:
- `databuild/graph/analyze.rs` - Remove lines 555-587 (CLI mode orchestration events)
- `databuild/graph/execute.rs` - Remove lines 413-428 and 753-779 (CLI mode orchestration events)

**Verification**:
- Analysis and execution binaries become pure domain functions
- No more environment variable mode detection
- All orchestration handled by wrapper/service

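To illustrate what "pure domain function" could mean here, the sketch below shows an analysis step that receives its event emitter from the caller and performs no environment lookups. The function name, the closure-based emitter, and the event text are hypothetical; in the proposed design the caller would pass a `BuildOrchestrator` instead.

```rust
/// Hypothetical domain-only analysis step: it emits domain events through
/// the supplied callback and never inspects the process environment.
/// Returns the number of partitions it scheduled.
pub fn analyze_partitions(partitions: &[&str], emit: &mut dyn FnMut(String)) -> usize {
    for partition in partitions {
        emit(format!("Partition scheduled: {}", partition));
    }
    partitions.len()
}
```

Because the emitter is injected, the CLI wrapper and the Service handler can call the same function and obtain the same domain events.
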
### Phase 5: Update Bazel Rules

**Files to Modify**:
- `databuild/rules.bzl` - Update `_databuild_graph_build_impl` to use new CLI wrapper instead of direct analysis/execution pipeline

**Before**:
```bash
$(rlocation _main/{analyze_path}) $@ | $(rlocation _main/{exec_path})
```

**After**:
```bash
$(rlocation _main/{cli_wrapper_path}) $@
```

### Phase 6: Update Tests

**Files to Modify**:
- `tests/end_to_end/simple_test.sh` - Remove separate CLI/Service event validation
- `tests/end_to_end/podcast_simple_test.sh` - Same simplification

After these changes, all tests expect identical event patterns from CLI and Service.

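A check for "identical event patterns" has to ignore fields that legitimately differ between runs. The sketch below compares only the ordered event messages. The `RecordedEvent` struct, its fields, and the choice to ignore request ID and timestamp are assumptions for illustration; the real event schema is not shown in this document.

```rust
/// Hypothetical flattened view of a logged event.
pub struct RecordedEvent {
    pub build_request_id: String,
    pub timestamp_ms: u64,
    pub message: String,
}

impl RecordedEvent {
    pub fn new(build_request_id: &str, timestamp_ms: u64, message: &str) -> Self {
        RecordedEvent {
            build_request_id: build_request_id.to_string(),
            timestamp_ms,
            message: message.to_string(),
        }
    }
}

/// Reduces a run to its ordered messages, dropping run-specific fields.
pub fn event_pattern(events: &[RecordedEvent]) -> Vec<&str> {
    events.iter().map(|e| e.message.as_str()).collect()
}

/// True when two runs emitted the same messages in the same order.
pub fn same_pattern(a: &[RecordedEvent], b: &[RecordedEvent]) -> bool {
    event_pattern(a) == event_pattern(b)
}
```
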
## Migration Strategy

### Direct Replacement Approach

Since we don't need backwards compatibility, we can implement a direct replacement:
- Replace existing CLI mode detection immediately
- Refactor Service handlers to use orchestrator directly
- Update Bazel rules to use new CLI wrapper
- Update tests to expect unified behavior

### Testing Strategy

1. **Unit Tests**: Comprehensive orchestrator testing with mock event logs
2. **Integration Tests**: Existing end-to-end tests pass with unified implementation
3. **Event Verification**: Ensure orchestrator produces expected events for all scenarios

## File Changes Required

### New Files
- `databuild/orchestration/mod.rs` - 200+ lines, core orchestrator
- `databuild/orchestration/events.rs` - 100+ lines, event helpers
- `databuild/orchestration/error.rs` - 50+ lines, error types
- `databuild/orchestration/tests.rs` - 300+ lines, comprehensive tests
- `databuild/cli/main.rs` - 150+ lines, CLI wrapper
- `databuild/cli/error.rs` - 50+ lines, CLI error handling

### Modified Files
- `databuild/service/handlers.rs` - Replace ~50 lines of event emission with orchestrator calls
- `databuild/graph/analyze.rs` - Remove ~30 lines of CLI mode detection
- `databuild/graph/execute.rs` - Remove ~60 lines of CLI mode detection
- `databuild/rules.bzl` - Update ~10 lines for new CLI wrapper
- `tests/end_to_end/simple_test.sh` - Simplify ~20 lines of event validation
- `tests/end_to_end/podcast_simple_test.sh` - Same simplification

### Build Configuration
- Update `databuild/BUILD.bazel` to include orchestration module
- Update `databuild/cli/BUILD.bazel` for new CLI binary
- Modify example graphs to use new CLI wrapper

## Benefits & Risk Analysis

### Benefits

1. **Maintainability**: Single source of truth for orchestration logic eliminates duplication
2. **Consistency**: Guaranteed identical events across CLI and Service interfaces
3. **Extensibility**: Foundation for SDK, additional CLI commands, monitoring integration
4. **Testing**: Simplified test expectations, better unit test coverage of orchestration
5. **Architecture**: Clean separation between orchestration and domain logic

### Implementation Risks

1. **Regression**: Changes to critical path could introduce subtle bugs
2. **Performance**: Additional abstraction layer could impact latency
3. **Integration**: Bazel build changes could break example workflows

### Risk Mitigation

1. **Phased Implementation**: Implement in stages with verification at each step
2. **Comprehensive Testing**: Thorough unit and integration testing
3. **Event Verification**: Ensure identical event patterns to current behavior

## Future Architecture Extensions

### SDK Integration

The unified orchestrator provides a natural integration point for external SDKs:

```rust
// Future SDK usage
let databuild_client = DatabuildClient::new(endpoint);
let orchestrator = databuild_client.create_orchestrator(partitions).await?;

orchestrator.start_build().await?;
let result = databuild_client.execute_build(orchestrator).await?;
```

### Additional CLI Commands

Orchestrator enables consistent event emission across CLI commands:

```bash
databuild validate --partitions "data/users" --dry-run
databuild status --build-id "abc123"
databuild retry --build-id "abc123" --failed-jobs-only
```

### Monitoring Integration

Standardized events provide foundation for observability:

```rust
impl BuildOrchestrator {
    pub fn with_tracing_span(&self, span: tracing::Span) -> Self;
    pub fn emit_otel_metrics(&self) -> Result<(), Error>;
}
```

### CI/CD Pipeline Integration

Orchestrator events enable standardized build reporting across environments:

```yaml
# GitHub Actions integration
- name: DataBuild
  uses: databuild/github-action@v1
  with:
    partitions: "data/daily_reports"
    event-log: "${{ env.DATABUILD_EVENT_LOG }}"
  # Automatic event collection for build status reporting
```

## Conclusion

This unification addresses fundamental architectural inconsistencies while providing a foundation for future extensibility. The phased implementation approach minimizes risk by verifying behavior at each step; because backward compatibility is not required, each phase replaces the old code path directly.

The shared orchestrator eliminates the current awkward CLI mode detection pattern and establishes DataBuild as a platform that can support multiple interfaces with guaranteed consistency.