# Wants System Purpose: Enable declarative specification of data requirements with SLA tracking, cross-graph coordination, and efficient build triggering while maintaining atomic build semantics. ## Overview The wants system unifies all build requests (manual, scheduled, triggered) under a single declarative model where: - **Wants declare intent** via events in the [build event log](./build-event-log.md) - **Builds reactively satisfy** what's currently possible with atomic semantics - **Monitoring identifies gaps** between declared wants and delivered partitions - **Cross-graph coordination** happens via the `GraphService` API ## Architecture ### Core Components 1. **PartitionWantEvent**: Declarative specification of data requirements 2. **Build Evaluation**: Reactive logic that attempts to satisfy wants when possible 3. **SLA Monitoring**: External system that queries for expired wants 4. **Cross-Graph Coordination**: Event-driven dependency management across DataBuild instances ### Want Event Schema Defined in `databuild.proto`: ```protobuf message PartitionWantEvent { string partition_ref = 1; // Partition being requested int64 created_at = 2; // Server time when want registered int64 data_timestamp = 3; // Business time this partition represents optional uint64 ttl_seconds = 4; // Give up after this long (from created_at) optional uint64 sla_seconds = 5; // SLA violation after this long (from data_timestamp) repeated string external_dependencies = 6; // Cross-graph dependencies string want_id = 7; // Unique identifier WantSource source = 8; // How this want was created } message WantSource { oneof source_type { CliManual cli_manual = 1; // Manual CLI request DashboardManual dashboard_manual = 2; // Manual dashboard request Scheduled scheduled = 3; // Scheduled/triggered job ApiRequest api_request = 4; // External API call } } ``` ## Want Lifecycle ### 1. Want Registration All build requests become wants: ```rust // CLI: databuild build data/users/2024-01-01 PartitionWantEvent { partition_ref: "data/users/2024-01-01", created_at: now(), data_timestamp: None, // These must be set explicitly in the request ttl_seconds: None, sla_seconds: None, external_dependencies: vec![], // no externally sourced data necessary want_id: generate_uuid(), source: WantSource { ... }, } // Scheduled pipeline: Daily analytics PartitionWantEvent { partition_ref: "analytics/daily/2024-01-01", created_at: now(), data_timestamp: parse_date("2024-01-01"), ttl_seconds: Some(365 * 24 * 3600), // Keep trying for 1 year sla_seconds: Some(9 * 3600), // Expected by 9am (9hrs after data_timestamp) external_dependencies: vec!["data/users/2024-01-01"], want_id: "daily-analytics-2024-01-01", source: WantSource { ... }, } ``` ### 2. Build Evaluation DataBuild continuously evaluates build opportunities: ```rust async fn evaluate_build_opportunities(&self) -> Result> { let now = current_timestamp_nanos(); // Get wants that haven't exceeded TTL let active_wants = self.get_non_expired_wants(now).await?; // Filter to wants where external dependencies are satisfied let buildable_partitions = active_wants.into_iter() .filter(|want| self.external_dependencies_satisfied(want)) .map(|want| want.partition_ref) .collect(); if buildable_partitions.is_empty() { return Ok(None); } // Create atomic build request for all currently buildable partitions Ok(Some(BuildRequest { requested_partitions: buildable_partitions, reason: "satisfying_active_wants".to_string(), })) } ``` ### 3. Build Triggers Builds are triggered on: - **New want registration**: Check if newly wanted partitions are immediately buildable - **External partition availability**: Check if any blocked wants are now unblocked - **Manual trigger**: Force re-evaluation (for debugging) ## Cross-Graph Coordination ### GraphService API Graphs expose events for cross-graph coordination: ```rust trait GraphService { async fn list_events(&self, since_idx: i64, filter: EventFilter) -> Result; } ``` Where `EventFilter` supports partition patterns for efficient subscriptions: ```protobuf message EventFilter { repeated string partition_refs = 1; // Exact partition matches repeated string partition_patterns = 2; // Glob patterns like "data/users/*" repeated string job_labels = 3; // Job-specific events repeated string task_ids = 4; // Task run events repeated string build_request_ids = 5; // Build-specific events } ``` ### Upstream Dependencies Downstream graphs subscribe to upstream events: ```rust struct UpstreamDependency { service_url: String, // "https://upstream-databuild.corp.com" partition_patterns: Vec, // ["data/users/*", "ml/models/prod/*"] last_sync_idx: i64, } // Periodic sync of upstream events async fn sync_upstream_events(upstream: &UpstreamDependency) -> Result<()> { let client = GraphServiceClient::new(&upstream.service_url); let filter = EventFilter { partition_patterns: upstream.partition_patterns.clone(), ..Default::default() }; let events = client.list_events(upstream.last_sync_idx, filter).await?; // Process partition availability events for event in events.events { if let EventType::PartitionEvent(pe) = event.event_type { if pe.status_code == PartitionStatus::PartitionAvailable { // Trigger local build evaluation trigger_build_evaluation().await?; } } } upstream.last_sync_idx = events.next_idx; Ok(()) } ``` ## SLA Monitoring and TTL Management ### SLA Violations External monitoring systems query for SLA violations: ```sql -- Find SLA violations (for alerting) SELECT * FROM partition_want_events w WHERE w.sla_seconds IS NOT NULL AND (w.data_timestamp + (w.sla_seconds * 1000000000)) < ? -- now AND NOT EXISTS ( SELECT 1 FROM partition_events p WHERE p.partition_ref = w.partition_ref AND p.status_code = ? -- PartitionAvailable ) ``` ### TTL Expiration Wants with expired TTLs are excluded from build evaluation: ```sql -- Get active (non-expired) wants SELECT * FROM partition_want_events w WHERE (w.ttl_seconds IS NULL OR w.created_at + (w.ttl_seconds * 1000000000) > ?) -- now AND NOT EXISTS ( SELECT 1 FROM partition_events p WHERE p.partition_ref = w.partition_ref AND p.status_code = ? -- PartitionAvailable ) ``` ## Example Scenarios ### Scenario 1: Daily Analytics Pipeline ``` 1. 6:00 AM: Daily trigger creates want for analytics/daily/2024-01-01 - SLA: 9:00 AM (3 hours after data_timestamp of midnight) - TTL: 1 year (keep trying for historical data) - External deps: ["data/users/2024-01-01"] 2. 6:01 AM: Build evaluation runs, data/users/2024-01-01 missing - No build request generated 3. 8:30 AM: Upstream publishes data/users/2024-01-01 - Cross-graph sync detects availability - Build evaluation triggered - BuildRequest[analytics/daily/2024-01-01] succeeds 4. Result: Analytics available at 8:45 AM, within SLA ``` ### Scenario 2: Late Data with SLA Miss ``` 1. 6:00 AM: Want created for analytics/daily/2024-01-01 (SLA: 9:00 AM) 2. 9:30 AM: SLA monitoring detects violation, sends alert 3. 11:00 AM: Upstream data finally arrives 4. 11:01 AM: Build evaluation triggers, analytics built 5. Result: Late delivery logged, but data still processed ``` ### Scenario 3: Manual CLI Build ``` 1. User: databuild build data/transform/urgent 2. Want created with short TTL (30 min) and SLA (5 min) 3. Build evaluation: dependencies available, immediate build 4. Result: Fast feedback for interactive use ``` ## Benefits ### Unified Build Model - All builds (manual, scheduled, triggered) use same want mechanism - Complete audit trail in build event log - Consistent SLA tracking across all build types ### Event-Driven Efficiency - Builds only triggered when dependencies change - Cross-graph coordination via efficient event streaming - No polling for task readiness within builds ### Atomic Build Semantics - Individual build requests remain all-or-nothing - Fast failure provides immediate feedback - Partial progress via multiple build requests over time ### Flexible SLA Management - Separate business expectations (SLA) from operational limits (TTL) - External monitoring with clear blame assignment - Automatic cleanup of stale wants ### Cross-Graph Scalability - Reliable HTTP-based coordination (no message loss) - Efficient filtering via partition patterns - Decentralized architecture with clear boundaries ## Implementation Notes ### Build Event Log Integration - Wants are stored as events in the BEL for consistency - Same query interfaces used for wants and build coordination - Event-driven architecture throughout ### Service Integration - GraphService API exposed via HTTP for cross-graph coordination - Dashboard integration for manual want creation - External SLA monitoring via BEL queries ### CLI Integration - CLI commands create manual wants with appropriate TTLs - Immediate build evaluation for interactive feedback - Standard build request execution path