
# Client-Server CLI Architecture
## Overview
This plan transforms DataBuild's CLI from a monolithic in-process execution model to a Bazel-style client-server architecture. The CLI becomes a thin client that delegates all operations to a persistent service process, enabling better resource management and build coordination.
## Current State Analysis
The current CLI (`databuild/cli/main.rs`) directly:
- Creates build event log (BEL) connections
- Runs analysis and execution in-process
- Spawns bazel processes

This creates several limitations:
- No coordination between concurrent CLI invocations or the builds they start
- Concurrent CLI calls open multiple BEL connections
- Each CLI process spawns its own bazel executions
- No shared execution environment for builds
## Target Architecture
### Bazel-Style Client-Server Model
**CLI (Thin Client)**:
- Auto-starts service if not running
- Delegates all operations to service via HTTP
- Streams progress back to user
- Relies on the service's idle auto-shutdown instead of stopping the service itself

**Service (Persistent Process)**:
- Maintains single BEL connection
- Coordinates builds across multiple CLI calls
- Manages bazel execution processes
- Auto-shuts down after idle timeout
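
The service-side idle shutdown reduces to a small piece of state. The sketch below is a minimal Rust illustration, assuming a hypothetical `IdleTracker` that the service's HTTP handlers would update; it treats the service as idle only when no build is running and no API call has arrived within the timeout. Passing `now` in explicitly, rather than reading the clock inside, is a design choice that keeps the logic deterministic and easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical idle-tracking state owned by the service.
/// The service is idle only when no builds are running AND no API call
/// has arrived within `idle_timeout`.
struct IdleTracker {
    idle_timeout: Duration,
    last_activity: Instant,
    active_builds: usize,
}

impl IdleTracker {
    fn new(idle_timeout: Duration, now: Instant) -> Self {
        IdleTracker { idle_timeout, last_activity: now, active_builds: 0 }
    }

    /// Record any API call (build submission, progress stream, repository query).
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    fn build_started(&mut self, now: Instant) {
        self.active_builds += 1;
        self.touch(now);
    }

    fn build_finished(&mut self, now: Instant) {
        self.active_builds = self.active_builds.saturating_sub(1);
        // The idle clock restarts when a build finishes.
        self.touch(now);
    }

    /// True once the service may shut down gracefully.
    fn should_shut_down(&self, now: Instant) -> bool {
        self.active_builds == 0
            && now.saturating_duration_since(self.last_activity) >= self.idle_timeout
    }
}

fn main() {
    let t0 = Instant::now();
    // 5 minutes, matching the default proposed in Phase 1.
    let mut tracker = IdleTracker::new(Duration::from_secs(5 * 60), t0);

    tracker.build_started(t0);
    // A long-running build keeps the service alive past the timeout.
    assert!(!tracker.should_shut_down(t0 + Duration::from_secs(600)));

    tracker.build_finished(t0 + Duration::from_secs(600));
    assert!(!tracker.should_shut_down(t0 + Duration::from_secs(700)));
    assert!(tracker.should_shut_down(t0 + Duration::from_secs(900)));
    println!("idle tracking behaves as expected");
}
```

In the service, a background task could call `should_shut_down(Instant::now())` periodically and, once it returns true, stop accepting connections, close the BEL connection, and remove its lockfile.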
## Implementation Plan
### Phase 1: Service Foundation
1. **Extend Current Service for CLI Operations**
   - Add new endpoints to handle CLI build requests
   - Move analysis and execution logic from the CLI to the service
   - Service maintains orchestrator state and coordinates builds
2. **Add CLI-Specific API Endpoints**
   - `/api/v1/cli/build` - Handle build requests from the CLI
   - `/api/v1/cli/builds/{id}/progress` - Stream build progress via Server-Sent Events
   - Request/response types for CLI build operations
   - Support for both background and foreground builds
3. **Add Service Auto-Management**
   - Service tracks a last-activity timestamp
   - Configurable auto-shutdown timeout (default: 5 minutes)
   - Service monitors for idle state and shuts down gracefully
   - Activity tracking covers both API calls and active builds: the service is idle only when no build is running and no API call has arrived within the timeout
4. **Service Port Management**
   - Service attempts to bind to a preferred port (e.g., 8080)
   - If that port is unavailable, it tries the next port in the range
   - Service writes the actual port to a lockfile/pidfile for CLI discovery
   - CLI reads the port from the lockfile to connect to the running service
   - Service removes the lockfile on shutdown
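
Port fallback and lockfile discovery might look like the following sketch. The function names, the `<pid> <port>` lockfile format, and the 20-port search span are assumptions for illustration, not the service's actual design. Writing to a temporary file and then renaming it means that, on POSIX filesystems, a CLI reading concurrently sees either no lockfile or a complete one.

```rust
use std::fs;
use std::io;
use std::net::TcpListener;
use std::path::Path;

/// Try `preferred`, then `preferred + 1`, ... up to `span` ports in total.
/// Returns the listener together with the port actually bound.
fn bind_with_fallback(host: &str, preferred: u16, span: u16) -> io::Result<(TcpListener, u16)> {
    let mut last_err = None;
    for offset in 0..span {
        let Some(port) = preferred.checked_add(offset) else { break };
        match TcpListener::bind((host, port)) {
            Ok(listener) => {
                let bound = listener.local_addr()?.port();
                return Ok((listener, bound));
            }
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.unwrap_or_else(|| io::Error::new(io::ErrorKind::AddrInUse, "no free port in range")))
}

/// Hypothetical lockfile format: "<pid> <port>\n".
fn write_lockfile(path: &Path, port: u16) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, format!("{} {}\n", std::process::id(), port))?;
    // On POSIX, a rename within one directory is atomic.
    fs::rename(&tmp, path)
}

/// CLI side: returns the advertised port, or None if no service is registered.
fn read_lockfile_port(path: &Path) -> Option<u16> {
    let contents = fs::read_to_string(path).ok()?;
    contents.split_whitespace().nth(1)?.parse().ok()
}

fn main() -> io::Result<()> {
    // Occupy a port so that the fallback path is exercised.
    let blocker = TcpListener::bind(("127.0.0.1", 0))?;
    let taken = blocker.local_addr()?.port();

    let (_listener, port) = bind_with_fallback("127.0.0.1", taken, 20)?;
    assert_ne!(port, taken);

    let lock = std::env::temp_dir().join("databuild-service-example.lock");
    write_lockfile(&lock, port)?;
    assert_eq!(read_lockfile_port(&lock), Some(port));

    // On shutdown the service removes its lockfile.
    fs::remove_file(&lock)?;
    assert_eq!(read_lockfile_port(&lock), None);
    println!("service bound to fallback port {}", port);
    Ok(())
}
```

Storing the PID alongside the port would also let the CLI detect a stale lockfile left behind by a service that crashed before cleaning up.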
### Phase 2: Thin CLI Implementation
1. **New CLI Main Function**
   - Replace existing main with service delegation logic
   - Parse arguments and determine target service operation
   - Handle service connection and auto-start logic
   - Preserve existing CLI interface and help text
2. **Service Client Implementation**
   - HTTP client for communicating with service
   - Auto-start service if not already running
   - Health check and connection retry logic
   - Progress streaming for real-time build feedback
3. **Build Command via Service**
   - Parse build arguments and create service request
   - Submit build request to service endpoint
   - Stream progress updates for foreground builds
   - Return immediately for background builds with build ID
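
For foreground builds, the CLI has to turn the progress endpoint's Server-Sent Events stream into events it can print. A minimal Rust reader could look like the sketch below, assuming the service emits named events such as `progress` and `complete` (hypothetical names; the JSON payloads shown are made up, since this plan does not specify a format). It covers only the subset of SSE the CLI needs: `event:` and `data:` fields, `:` comment lines used as keep-alives, and the blank line that dispatches a buffered event.

```rust
use std::io::{self, BufRead};

/// One Server-Sent Event from the progress stream.
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String, // "message" when the server sends no `event:` field
    data: String,
}

/// Minimal SSE reader: handles `event:` and `data:` fields, `:` comment lines,
/// and blank-line dispatch. `id:` and `retry:` are ignored for brevity, and an
/// incomplete event at end of stream is discarded, as the SSE spec requires.
fn read_events<R: BufRead>(reader: R, mut on_event: impl FnMut(SseEvent)) -> io::Result<()> {
    let mut event = String::new();
    let mut data: Vec<String> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line.is_empty() {
            if !data.is_empty() {
                let name = if event.is_empty() { "message".to_string() } else { event.clone() };
                on_event(SseEvent { event: name, data: data.join("\n") });
            }
            event.clear();
            data.clear();
            continue;
        }
        if line.starts_with(':') {
            continue; // comment, often used as a keep-alive
        }
        let (field, value) = match line.split_once(':') {
            // A single space after the colon is not part of the value.
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line.as_str(), ""),
        };
        match field {
            "event" => event = value.to_string(),
            "data" => data.push(value.to_string()),
            _ => {} // id, retry, and unknown fields
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Canned stream standing in for the HTTP response body.
    let body = concat!(
        ": keep-alive\n",
        "event: progress\n",
        "data: {\"jobs_done\": 1, \"jobs_total\": 3}\n",
        "\n",
        "event: complete\n",
        "data: {\"status\": \"ok\"}\n",
        "\n"
    );
    let mut events = Vec::new();
    read_events(io::Cursor::new(body), |e| events.push(e))?;
    for e in &events {
        println!("[{}] {}", e.event, e.data);
    }
    assert_eq!(events.len(), 2);
    assert_eq!(events[1].event, "complete");
    Ok(())
}
```

In the real client the reader would wrap the HTTP response body (for example in a `std::io::BufReader`), and the CLI would exit once it sees the terminal event. Background builds skip the stream and print the build ID returned by `/api/v1/cli/build`.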
### Phase 3: Repository Commands via Service
1. **Delegate Repository Commands to Service**
   - Partition, build, job, and task commands go through the service
   - Use existing service API endpoints where available
   - Maintain the same output formats (table, JSON) as the current CLI
   - Preserve all existing functionality and options
2. **Service Client Repository Methods**
   - Client methods for each repository operation
   - Handle pagination, filtering, and formatting options
   - Error handling that maps HTTP status codes to clear CLI errors
   - URL encoding for partition references and other parameters
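
Partition references often contain characters such as `/` and `=` that would otherwise change the meaning of a URL path. A minimal encoder that passes through only RFC 3986 unreserved characters could look like this; the `partition_url` helper and its `/api/v1/partitions/...` path are hypothetical, not a confirmed service endpoint.

```rust
/// Percent-encode a value for use as one URL path segment or query value.
/// Only RFC 3986 unreserved characters pass through unchanged; every other
/// byte (including '/', '=', spaces, and non-ASCII UTF-8 bytes) becomes %XX.
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for &byte in value.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical partition-lookup URL; the real endpoint path may differ.
fn partition_url(base: &str, partition_ref: &str) -> String {
    format!("{}/api/v1/partitions/{}", base.trim_end_matches('/'), percent_encode(partition_ref))
}

fn main() {
    let url = partition_url("http://127.0.0.1:8080/", "reviews/date=2024-01-01");
    assert_eq!(url, "http://127.0.0.1:8080/api/v1/partitions/reviews%2Fdate%3D2024-01-01");
    println!("{}", url);
}
```

Encoding `/` as `%2F` keeps the whole reference in one path segment, so the service's router must decode that segment rather than split on it. Some HTTP frameworks and proxies decode `%2F` before routing, in which case passing references as query parameters is the safer option.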
### Phase 4: Complete Migration
1. **Remove Old CLI Implementation**
   - Delete existing `databuild/cli/main.rs` implementation
   - Remove in-process analysis and execution logic
   - Clean up CLI-specific dependencies that are no longer needed
   - Update build configuration to use new thin client only
2. **Service Integration Testing**
   - End-to-end testing of CLI-to-service communication
   - Verify all existing CLI functionality works through service
   - Performance testing to ensure no regression
   - Error handling validation for various failure modes
### Phase 5: Integration and Testing
1. **Environment Variable Support**
   - `DATABUILD_SERVICE_URL` for custom service locations
   - `DATABUILD_SERVICE_TIMEOUT` for the idle auto-shutdown timeout
   - Existing BEL environment variables are passed through to the service
   - Clear precedence rules for configuration sources
2. **Error Handling and User Experience**
   - Service startup timeout with clear error messages
   - Connection failure handling with fallback suggestions
   - Health check logic to verify service readiness
   - Graceful handling of service unavailability
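
The startup timeout and readiness check can be combined into one polling loop. This sketch assumes a hypothetical `wait_until_ready` helper that takes the health probe as a closure (in the CLI it would connect to the port read from the lockfile and call a health endpoint), backs off exponentially between probes, and, on timeout, returns an error whose message points the user at `DATABUILD_SERVICE_URL`. The backoff values are illustrative.

```rust
use std::fmt;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error for the thin client's startup path.
#[derive(Debug)]
enum StartupError {
    NotReady { waited: Duration, attempts: u32 },
}

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StartupError::NotReady { waited, attempts } => write!(
                f,
                "databuild service was not ready after {:.1}s ({} health checks); \
                 if a service is already running elsewhere, set DATABUILD_SERVICE_URL to point at it",
                waited.as_secs_f64(),
                attempts
            ),
        }
    }
}

/// Poll `is_ready` with capped exponential backoff until it succeeds or
/// `timeout` elapses. Returns the number of probes that were needed.
fn wait_until_ready(
    mut is_ready: impl FnMut() -> bool,
    timeout: Duration,
    initial_delay: Duration,
) -> Result<u32, StartupError> {
    let start = Instant::now();
    let mut delay = initial_delay;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if is_ready() {
            return Ok(attempts);
        }
        let elapsed = start.elapsed();
        if elapsed >= timeout {
            return Err(StartupError::NotReady { waited: elapsed, attempts });
        }
        // Never sleep past the deadline; cap the backoff at 500 ms.
        thread::sleep(delay.min(timeout - elapsed));
        delay = (delay * 2).min(Duration::from_millis(500));
    }
}

fn main() {
    // Simulated service that becomes healthy on the third probe.
    let mut probes = 0;
    let attempts = wait_until_ready(
        || {
            probes += 1;
            probes >= 3
        },
        Duration::from_secs(2),
        Duration::from_millis(10),
    )
    .expect("service should become ready");
    assert_eq!(attempts, 3);

    // A service that never comes up produces an actionable message.
    let err = wait_until_ready(|| false, Duration::from_millis(50), Duration::from_millis(10))
        .unwrap_err();
    eprintln!("error: {}", err);
}
```

Taking the probe as a closure keeps the retry policy independent of the HTTP client, so the same loop can be tested without a running service.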
## Benefits of Client-Server Architecture
### ✅ **Build Coordination**
- Multiple CLI calls share the same service instance
- Coordination between concurrent builds
- Single BEL connection eliminates connection conflicts
### ✅ **Resource Management**
- Auto-shutdown prevents resource leaks
- Service manages persistent connections
- Better isolation between CLI and build execution
- Shared bazel execution environment
### ✅ **Improved User Experience**
- Background builds with the `--background` flag
- Real-time progress streaming
- Consistent build execution environment
### ✅ **Simplified Architecture**
- Single execution path through service
- Cleaner separation of concerns
- Reduced code duplication
### ✅ **Future-Ready Foundation**
- Service architecture prepared for additional coordination features
- HTTP API foundation for programmatic access
- Clear separation of concerns between client and execution
## Success Criteria
### Phase 1-2: Service Foundation
- [ ] Service can handle CLI build requests
- [ ] Service auto-shutdown works correctly
- [ ] Service port management and discovery work
- [ ] New CLI can start and connect to service
- [ ] Build requests execute with same functionality as current CLI
### Phase 3-4: Complete Migration
- [ ] All CLI commands work via service delegation
- [ ] Repository commands (partitions, builds, etc.) work via HTTP API
- [ ] Old CLI implementation completely removed
- [ ] Error handling provides clear user feedback
### Phase 5: Polish
- [ ] Multiple concurrent CLI calls work correctly
- [ ] Background builds work as expected
- [ ] Performance meets or exceeds current CLI
- [ ] Service management is reliable and transparent
## Risk Mitigation
1. **Thorough Testing**: Comprehensive testing before removing old CLI
2. **Feature Parity**: Ensure all existing functionality works via service
3. **Performance Validation**: Benchmark new implementation against current performance
4. **Simple Protocol**: Use HTTP/JSON for service communication (not gRPC initially)
5. **Clear Error Messages**: Service startup and connection failures should be obvious to users