# Wants System Implementation
## Overview
This plan implements the wants system described in [design/wants.md](../design/wants.md). It moves DataBuild from direct build requests to a declarative, want-based model with cross-graph coordination and SLA tracking. It builds on the 3-tier BEL (build event log) architecture and the client-server CLI established in Phases 18 and 19.
## Prerequisites
This plan assumes completion of:
- **Phase 18**: 3-tier BEL architecture with storage/query/client layers
- **Phase 19**: Client-server CLI architecture with service delegation
## Implementation Phases
### Phase 1: Extend BEL Storage for Wants
1. **Add PartitionWantEvent to databuild.proto**
- Want event schema as defined in design/wants.md
- Want source tracking (CLI, dashboard, scheduled, API)
- TTL and SLA timestamp fields
- External dependency specifications
2. **Extend BELStorage Interface**
- Add `append_want()` method for want events
- Extend `EventFilter` to support want filtering
- Add want-specific query capabilities to storage layer
3. **Implement in SQLite Storage Backend**
- Add wants table with appropriate indexes
- Implement want filtering in `list_events()`
- Schema migration logic for existing databases
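
To make the storage steps above concrete, here is a minimal sketch of a wants table with indexes, an append operation, and TTL-aware filtering. It uses Python's standard `sqlite3` module purely for illustration. The column names, the `open_store` and `list_active_wants` helpers, and the `append_want` signature are assumptions, not the schema from design/wants.md or the actual BELStorage interface.

```python
import sqlite3
import time
import uuid

# Hypothetical schema: column names are illustrative, not taken from design/wants.md.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wants (
    want_id       TEXT PRIMARY KEY,
    partition_ref TEXT NOT NULL,
    source        TEXT NOT NULL,     -- 'cli', 'dashboard', 'scheduled' or 'api'
    created_at    INTEGER NOT NULL,  -- unix seconds
    sla_at        INTEGER,           -- when the partition is expected
    ttl_at        INTEGER            -- when the want stops being actionable
);
CREATE INDEX IF NOT EXISTS idx_wants_partition ON wants (partition_ref);
CREATE INDEX IF NOT EXISTS idx_wants_ttl ON wants (ttl_at);
"""


def open_store(path=":memory:"):
    """Open the database and apply the idempotent schema migration."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_want(conn, partition_ref, source, sla_seconds=None, ttl_seconds=None, now=None):
    """Insert one want row and return its generated ID."""
    now = int(time.time()) if now is None else now
    want_id = str(uuid.uuid4())
    sla_at = None if sla_seconds is None else now + sla_seconds
    ttl_at = None if ttl_seconds is None else now + ttl_seconds
    conn.execute(
        "INSERT INTO wants (want_id, partition_ref, source, created_at, sla_at, ttl_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (want_id, partition_ref, source, now, sla_at, ttl_at),
    )
    conn.commit()
    return want_id


def list_active_wants(conn, now, source=None):
    """Return (want_id, partition_ref) for wants whose TTL has not passed."""
    sql = "SELECT want_id, partition_ref FROM wants WHERE (ttl_at IS NULL OR ttl_at > ?)"
    params = [now]
    if source is not None:
        sql += " AND source = ?"
        params.append(source)
    sql += " ORDER BY rowid"  # insertion order
    return conn.execute(sql, params).fetchall()
```

Because the schema uses `CREATE ... IF NOT EXISTS`, running it against an existing database is harmless, which is one simple way to approach the migration step. A real migration would likely also need versioning.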
### Phase 2: Basic Want API in Service
1. **Implement Want Management in Service**
- Service methods for creating and querying wants
- Want lifecycle management (creation, expiration, satisfaction)
- Integration with existing service auto-management
2. **Add Want HTTP Endpoints**
- `POST /api/v1/wants` - Create new want
- `GET /api/v1/wants` - List active wants with filtering
- `GET /api/v1/wants/{id}` - Get want details
- `DELETE /api/v1/wants/{id}` - Cancel want
3. **CLI Want Commands**
- `./bazel-bin/my_graph.build want create <partition-ref>` with SLA/TTL options
- `./bazel-bin/my_graph.build want list` with filtering options
- `./bazel-bin/my_graph.build want status <partition-ref>` to show the status of wants for a partition
- Modify build commands to create wants via service
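
The want lifecycle (creation, expiration, satisfaction, cancellation) can be sketched as a small in-memory registry whose methods line up with the four HTTP endpoints listed above. This is an illustration under assumptions: the `WantStatus` values, the `Want` fields, and the `WantRegistry` class are hypothetical names, and the real service would persist wants as events in the BEL rather than in a dictionary.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class WantStatus(Enum):
    ACTIVE = "active"
    SATISFIED = "satisfied"
    EXPIRED = "expired"
    CANCELLED = "cancelled"


@dataclass
class Want:
    want_id: str
    partition_ref: str
    created_at: int
    ttl_at: Optional[int] = None
    satisfied_at: Optional[int] = None
    cancelled_at: Optional[int] = None

    def status(self, now: int) -> WantStatus:
        """Derive the lifecycle state from timestamps instead of storing it."""
        if self.cancelled_at is not None:
            return WantStatus.CANCELLED
        if self.satisfied_at is not None:
            return WantStatus.SATISFIED
        if self.ttl_at is not None and now >= self.ttl_at:
            return WantStatus.EXPIRED
        return WantStatus.ACTIVE


class WantRegistry:
    """Hypothetical in-memory stand-in for the service's want management."""

    def __init__(self) -> None:
        self._wants: Dict[str, Want] = {}
        self._ids = itertools.count(1)

    def create(self, partition_ref: str, now: int, ttl_seconds: Optional[int] = None) -> Want:
        # Would back POST /api/v1/wants
        ttl_at = None if ttl_seconds is None else now + ttl_seconds
        want = Want(f"want-{next(self._ids)}", partition_ref, now, ttl_at)
        self._wants[want.want_id] = want
        return want

    def list_active(self, now: int) -> List[Want]:
        # Would back GET /api/v1/wants
        return [w for w in self._wants.values() if w.status(now) is WantStatus.ACTIVE]

    def get(self, want_id: str) -> Optional[Want]:
        # Would back GET /api/v1/wants/{id}
        return self._wants.get(want_id)

    def cancel(self, want_id: str, now: int) -> bool:
        # Would back DELETE /api/v1/wants/{id}; only active wants can be cancelled
        want = self._wants.get(want_id)
        if want is None or want.status(now) is not WantStatus.ACTIVE:
            return False
        want.cancelled_at = now
        return True

    def mark_satisfied(self, partition_ref: str, now: int) -> int:
        """Mark every active want for a partition as satisfied; return the count."""
        hits = [w for w in self.list_active(now) if w.partition_ref == partition_ref]
        for want in hits:
            want.satisfied_at = now
        return len(hits)
```

Deriving status from timestamps keeps expiration passive: nothing has to run when a TTL passes, and a want simply stops appearing in `list_active`.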
### Phase 3: Want-Driven Build Evaluation
1. **Implement Build Evaluator in Service**
- Continuous evaluation loop that checks for buildable wants
- External dependency satisfaction checking
- TTL expiration filtering for active wants
2. **Replace Build Request Handling**
- Graph build commands create wants instead of direct build requests
- Service background loop evaluates wants and triggers builds
- Maintain atomic build semantics while satisfying multiple wants
3. **Build Coordination Logic**
- Aggregate wants that can be satisfied by the same build
- Priority handling for urgent wants (short SLA)
- Resource coordination across concurrent want evaluation
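
One pass of the evaluation loop described above might look like the following sketch. It drops expired wants, skips wants whose external dependencies are not yet available, groups the remaining wants by partition, and orders the result by the earliest SLA. The `Want` fields and the `plan_builds` function are hypothetical, and grouping by identical partition ref is a simplification of "satisfied by the same build".

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Want:
    want_id: str
    partition_ref: str
    sla_at: int                         # unix seconds; earlier means more urgent
    ttl_at: int                         # unix seconds; want is ignored once reached
    external_deps: Tuple[str, ...] = ()  # upstream partitions that must exist first


def plan_builds(wants: List[Want], available: Set[str], now: int) -> List[Tuple[str, List[str]]]:
    """Return (partition_ref, want_ids) pairs to build, most urgent first."""
    by_partition: Dict[str, List[Want]] = {}
    for want in wants:
        if now >= want.ttl_at:
            continue  # TTL expired: no longer actionable
        if want.partition_ref in available:
            continue  # already satisfied by an existing partition
        if not all(dep in available for dep in want.external_deps):
            continue  # blocked on an external dependency
        by_partition.setdefault(want.partition_ref, []).append(want)

    # Urgency of a build is the earliest SLA among the wants it would satisfy.
    ranked = sorted(
        (min(w.sla_at for w in group), ref, [w.want_id for w in group])
        for ref, group in by_partition.items()
    )
    return [(ref, want_ids) for _, ref, want_ids in ranked]
```

In a service, this function would be called from the background loop on want creation and on upstream events, with each returned pair turned into one atomic build request.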
### Phase 4: Cross-Graph Coordination
1. **Implement GraphService API**
- HTTP API for cross-graph event streaming as defined in design/wants.md
- Event filtering for efficient partition pattern subscriptions
- Service-to-service communication for upstream dependencies
2. **Upstream Dependency Configuration**
- Service configuration for upstream DataBuild instances
- Partition pattern subscriptions to upstream graphs
- Automatic want evaluation when upstream partitions become available
3. **Cross-Graph Event Sync**
- Background sync process for upstream events
- Triggering local build evaluation on upstream availability
- Reliable HTTP-based coordination to avoid message loss
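
A common way to avoid message loss over plain HTTP is cursor-based polling: the downstream service remembers the index of the last upstream event it handled and asks only for newer ones. The sketch below assumes that model. The `fetch_events` callable stands in for an HTTP call to the upstream GraphService, the event shape `(index, partition_ref)` is hypothetical, and shell-style glob matching via `fnmatch` is only one possible pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List, Tuple

Event = Tuple[int, str]  # (monotonically increasing index, partition_ref); assumed shape


def matches_any(partition_ref: str, patterns: Iterable[str]) -> bool:
    """True if the partition ref matches at least one subscribed glob pattern."""
    return any(fnmatchcase(partition_ref, pattern) for pattern in patterns)


def sync_once(
    fetch_events: Callable[[int], List[Event]],
    cursor: int,
    patterns: List[str],
    on_available: Callable[[str], None],
) -> int:
    """Process upstream events newer than `cursor` and return the new cursor.

    `fetch_events(since)` must return events with index > since in ascending
    order. The cursor advances only after an event has been handled, so a
    failure in `on_available` leaves the caller's cursor unchanged and the
    event is seen again on the next call (at-least-once delivery).
    """
    for index, partition_ref in fetch_events(cursor):
        if matches_any(partition_ref, patterns):
            on_available(partition_ref)  # e.g. trigger local want evaluation
        cursor = index
    return cursor
```

Because delivery is at-least-once, `on_available` should be idempotent. Re-evaluating wants for a partition that is already known to be available satisfies that naturally.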
### Phase 5: SLA Monitoring and Dashboard Integration
1. **SLA Violation Tracking**
- External monitoring endpoints for SLA violations
- Want timeline and status tracking
- Integration with existing dashboard for want visualization
2. **Want Dashboard Features**
- Want creation and monitoring UI
- Cross-graph dependency visualization
- SLA violation dashboard and alerting
3. **Migration from Direct Builds**
- All build requests go through want system
- Remove direct build request pathways
- Update documentation for new build model
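
As a rough illustration of what an SLA violation endpoint could compute, the sketch below flags wants that were satisfied after their SLA time or are still unsatisfied past it. The `TrackedWant` fields, the `sla_violations` function, and the report keys are assumptions. How expired or cancelled wants should count is left open here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrackedWant:
    want_id: str
    partition_ref: str
    sla_at: int                         # unix seconds by which the partition was expected
    satisfied_at: Optional[int] = None  # None while the want is still open


def sla_violations(wants: List[TrackedWant], now: int) -> List[dict]:
    """Return one report entry per late want, worst offender first."""
    report = []
    for want in wants:
        # Open wants are measured against the current time, closed ones
        # against the moment they were satisfied.
        reference = now if want.satisfied_at is None else want.satisfied_at
        late_by = reference - want.sla_at
        if late_by > 0:
            report.append({
                "want_id": want.want_id,
                "partition_ref": want.partition_ref,
                "late_by_seconds": late_by,
                "resolved": want.satisfied_at is not None,
            })
    report.sort(key=lambda entry: entry["late_by_seconds"], reverse=True)
    return report
```

The returned list is plain dictionaries, so it could be serialized to JSON for an external monitoring system without further conversion.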
## Benefits of Want-Based Architecture
### ✅ **Unified Build Model**
- All builds (manual, scheduled, triggered) use the same want mechanism
- Complete audit trail in build event log
- Consistent SLA tracking across all build types
### ✅ **Event-Driven Efficiency**
- Builds are triggered only when dependencies change
- Cross-graph coordination via efficient event streaming
- No polling for task readiness within builds
### ✅ **Atomic Build Semantics Preserved**
- Individual build requests remain all-or-nothing
- Fast failure provides immediate feedback
- Partial progress via multiple build requests over time
### ✅ **Flexible SLA Management**
- Separate business expectations (SLA) from operational limits (TTL)
- External monitoring with clear blame assignment
- Automatic cleanup of stale wants
### ✅ **Cross-Graph Scalability**
- Reliable HTTP-based coordination
- Efficient filtering via partition patterns
- Decentralized architecture with clear boundaries
## Success Criteria
### Phase 1: Storage Foundation
- [ ] Want events can be stored and queried in BEL storage
- [ ] `EventFilter` supports want-specific filtering
- [ ] SQLite backend handles want operations efficiently
### Phase 2: Basic Want API
- [ ] Service can create and query wants via HTTP API
- [ ] CLI `want create`, `want list`, and `want status` commands work against the service
- [ ] Build commands create wants instead of direct builds
### Phase 3: Want-Driven Builds
- [ ] Service background loop evaluates wants continuously
- [ ] Build evaluation triggers on want creation and external events
- [ ] TTL expiration and external dependency checking work correctly
### Phase 4: Cross-Graph Coordination
- [ ] GraphService API returns filtered events for cross-graph coordination
- [ ] Upstream partition availability triggers downstream want evaluation
- [ ] Service-to-service communication is reliable and efficient
### Phase 5: Complete Migration
- [ ] All builds go through want system
- [ ] Dashboard supports want creation and monitoring
- [ ] SLA violation endpoints provide monitoring integration
- [ ] Documentation reflects new want-based build model
## Risk Mitigation
1. **Incremental Migration**: Implement wants alongside existing build system initially
2. **Performance Validation**: Ensure want evaluation doesn't introduce significant latency
3. **Backwards Compatibility**: Maintain existing build semantics during transition
4. **Monitoring Integration**: Provide clear observability into want lifecycle and performance