databuild/plans/08-integration-test-v2.md

# Integration Test Plan for DataBuild Delegation System

## Overview
Create integration tests for the `basic_graph` example that trigger delegation scenarios and verify the resulting Build Event Log (BEL) entries, confirming that delegation works correctly and that every delegation can be traced to its source build request.

## Current Test Infrastructure Analysis

**Existing Pattern**: The current test suite in `/tests/end_to_end/` follows a mature pattern:
- **Common utilities**: `lib/test_utils.sh`, `lib/db_utils.sh`, `lib/service_utils.sh`
- **Test isolation**: Separate SQLite databases per test to prevent interference
- **CLI vs Service validation**: Tests ensure both paths produce identical events
- **Event analysis**: Detailed breakdown of job/partition/request event counts
- **Robust service management**: Start/stop with proper cleanup and health checks

**Target System**: basic_graph example with two jobs:
- `generate_number_job`: Produces partitions like `generated_number/pippin`
- `sum_job`: Depends on multiple generated numbers, produces `sum/pippin_salem_sadie`

## New Test Implementation Plan

### 1. Create Delegation-Specific Test: `basic_graph_delegation_test.sh`

**Test Scenarios**:
- **Historical Delegation**: Run same partition twice, verify second run delegates to first
- **Multi-partition Jobs**: Test delegation behavior when jobs produce multiple partitions
- **Mixed Availability**: Test jobs where some target partitions exist, others don't
- **BEL Verification**: Validate specific delegation events and job status transitions

**Core Test Cases**:

1. **Single Partition Historical Delegation**
   - Build `generated_number/pippin` (first run - normal execution)
   - Build `generated_number/pippin` again (second run - should delegate)
   - Verify BEL contains: `DelegationEvent` + `JOB_SKIPPED` for second run

2. **Multi-Partition Delegation Scenarios**
   - Build `generated_number/pippin`, `generated_number/salem`, `generated_number/sadie`
   - Build `sum/pippin_salem_sadie` (its three `generated_number` inputs should delegate to the existing partitions)
   - Verify delegation events point to correct source build requests

3. **Partial Delegation Scenario**
   - Build `generated_number/pippin`, `generated_number/salem`
   - Request `generated_number/pippin`, `generated_number/salem`, `generated_number/sadie`
   - Verify: delegations for pippin/salem, normal execution for sadie

4. **Cross-Run Delegation Chain**
   - Run 1: Build `generated_number/pippin`
   - Run 2: Build `generated_number/salem`
   - Run 3: Build `sum/pippin_salem_sadie` (sadie must be built; pippin and salem should delegate to Runs 1 and 2)
   - Verify delegation traceability to correct source builds
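
The partial delegation case (3) comes down to computing, for each requested partition, whether the test should expect a delegation or a normal execution. The sketch below shows one way to derive that expectation. `classify_partitions` is a hypothetical helper, not part of the existing test utilities, and it assumes the test already knows which partitions earlier runs produced.

```bash
# Hypothetical helper (not in lib/test_utils.sh today).
# Usage: classify_partitions "<existing refs, space-separated>" <requested ref>...
# Prints "<ref> delegate" if the partition already exists, else "<ref> execute".
classify_partitions() {
  local existing=" $1 " ref   # pad with spaces so only whole refs match
  shift
  for ref in "$@"; do
    case "$existing" in
      *" $ref "*) echo "$ref delegate" ;;
      *)          echo "$ref execute" ;;
    esac
  done
}
```

Calling it with `generated_number/pippin generated_number/salem` as the existing set and all three partitions as the request marks pippin and salem as `delegate` and sadie as `execute`, which is the outcome case 3 verifies against the BEL.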

### 2. BEL Validation Utilities

**New functions in `lib/db_utils.sh`**:
- `get_delegation_events()`: Extract delegation events for specific partition
- `verify_job_skipped()`: Check job was properly skipped with delegation
- `get_delegation_source_build()`: Validate delegation points to correct build request
- `compare_delegation_behavior()`: Compare CLI vs Service delegation consistency
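
As a rough illustration of `get_delegation_events()`, the sketch below queries the BEL through the `sqlite3` command-line tool. The table name `delegation_events` is an assumption made for this example; the column names mirror the `DelegationEvent` fields, but the real schema may differ and should be checked before use.

```bash
# ASSUMPTION: the BEL is a SQLite file containing a table named
# `delegation_events` whose columns match the DelegationEvent fields.

# Build the query for one partition. Single quotes in the reference are
# doubled so it is a valid SQL string literal.
delegation_events_sql() {
  local escaped
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "SELECT partition_ref, delegated_to_build_request_id, message FROM delegation_events WHERE partition_ref = '%s';" "$escaped"
}

# Run the query against a test database. sqlite3 prints one
# pipe-separated row per matching event.
get_delegation_events() {
  local db_path=$1 partition_ref=$2
  sqlite3 "$db_path" "$(delegation_events_sql "$partition_ref")"
}
```

Keeping the query builder separate from the `sqlite3` call lets the SQL be checked without a database, and `get_delegation_source_build()` could reuse the same rows by reading the second column.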

**Event Validation Logic**:
```bash
# For historical delegation, verify event sequence:
# 1. DelegationEvent(partition_ref, delegated_to_build_request_id, message)
# 2. JobEvent(status=JOB_SKIPPED, message="Job skipped - all target partitions already available")
# 3. No JobEvent(JOB_SCHEDULED/RUNNING/COMPLETED) for delegated job

# For successful delegation:
# - Success rate should be 100% (JOB_SKIPPED counts as success)
# - Partition should show as available without re-execution
# - Build request should complete successfully
```
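
The three sequence rules above can be checked mechanically. The following is a minimal sketch of `verify_job_skipped()`, assuming the events for one job in one build request have already been extracted as `<event_type>|<status>` lines; that line format is an assumption of this example. It checks presence and absence of events, not their relative order.

```bash
# Reads "<event_type>|<status>" lines on stdin (assumed format).
# Returns 0 only if the job was delegated and skipped without executing.
verify_job_skipped() {
  local events
  events=$(cat)

  # Rule 1: a delegation event must be present.
  if ! printf '%s\n' "$events" | grep -q '^DelegationEvent[|]'; then
    echo "missing DelegationEvent" >&2
    return 1
  fi

  # Rule 2: the job must be recorded as skipped.
  if ! printf '%s\n' "$events" | grep -q '^JobEvent[|]JOB_SKIPPED$'; then
    echo "missing JobEvent with status JOB_SKIPPED" >&2
    return 1
  fi

  # Rule 3: the delegated job must never have been scheduled or run.
  if printf '%s\n' "$events" | grep -Eq '^JobEvent[|]JOB_(SCHEDULED|RUNNING|COMPLETED)$'; then
    echo "delegated job shows execution events" >&2
    return 1
  fi
  return 0
}
```

A test would pipe the second run's events into the function, for example `printf '%s\n' 'DelegationEvent|' 'JobEvent|JOB_SKIPPED' | verify_job_skipped`, and fail on a non-zero exit status.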

### 3. Performance and Reliability Validation

**Delegation Efficiency Tests**:
- Time comparison: first run vs delegated run (target: delegated run finishes in under 5 seconds, against 30+ seconds for full execution)
- Resource usage: ensure delegated runs don't spawn job processes
- Concurrency: multiple builds requesting same partition simultaneously
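
The time comparison could be sketched as follows. Both helper names are hypothetical, and whole-second resolution from `date +%s` is assumed to be enough given how far apart the expected durations are.

```bash
# Hypothetical helpers for the time comparison.

# Run a command and print its wall-clock duration in whole seconds.
# The command's output and exit status are discarded here; the caller
# is expected to validate the build result separately.
time_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Succeed only if the delegated run was faster than the first run and
# stayed under an absolute ceiling (default: 5 seconds).
assert_delegated_faster() {
  local first_secs=$1 delegated_secs=$2 max_secs=${3:-5}
  [ "$delegated_secs" -lt "$first_secs" ] && [ "$delegated_secs" -lt "$max_secs" ]
}
```

In a test, each build invocation would be wrapped in `time_seconds`, and the two results passed to `assert_delegated_faster`. Checking both a relative and an absolute bound guards against a slow first run hiding a slow delegated run.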

**Error Scenarios**:
- Source build request failure handling
- Corrupted delegation data
- Stale partition detection

### 4. Integration with Existing Test Suite

**File Structure**:
```
tests/end_to_end/
├── basic_graph_delegation_test.sh    # New delegation-specific tests
├── basic_graph_test.sh               # Existing functionality tests (enhanced)
├── lib/
│   ├── delegation_utils.sh           # New delegation validation utilities
│   ├── db_utils.sh                   # Enhanced with delegation functions
│   └── test_utils.sh                 # Existing utilities
└── BUILD                             # Updated to include new test
```

**Bazel Integration**:
- Add `basic_graph_delegation_test` as new `sh_test` target
- Include in `run_e2e_tests.sh` execution
- Tag with `["delegation", "e2e"]` for selective running

### 5. CLI vs Service Delegation Consistency

**Validation Approach**:
- Run identical delegation scenarios through both CLI and Service
- Compare BEL entries for identical delegation behavior
- Ensure both paths produce same success rates and event counts
- Validate API responses include delegation information
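
One way to compare the two paths is to strip the fields that legitimately differ between runs and diff what remains. The sketch below assumes each BEL dump line has the form `<timestamp>|<build_request_id>|<event_type>|<partition_ref>|<status>`; that layout is an illustration, not the actual dump format, and any other run-specific field would need to be dropped in the same way.

```bash
# ASSUMPTION: dump lines look like
#   <timestamp>|<build_request_id>|<event_type>|<partition_ref>|<status>

# Drop the run-specific leading fields and sort, so event order and
# identifiers do not cause false mismatches.
normalize_events() {
  cut -d'|' -f3- "$1" | sort
}

# Print a diff and return non-zero if the CLI and Service dumps differ
# after normalization.
compare_delegation_behavior() {
  local cli_dump=$1 service_dump=$2 tmp_dir status=0
  tmp_dir=$(mktemp -d)
  normalize_events "$cli_dump" > "$tmp_dir/cli"
  normalize_events "$service_dump" > "$tmp_dir/service"
  diff "$tmp_dir/cli" "$tmp_dir/service" || status=1
  rm -rf "$tmp_dir"
  return "$status"
}
```

Because the diff output goes to stdout, a failing comparison also shows which events appear on only one path.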

### 6. Documentation and Debugging Support

**Test Output Enhancement**:
- Clear delegation event logging during test execution
- Detailed failure diagnostics showing expected vs actual delegation behavior
- BEL dump utilities for debugging delegation issues
- Performance metrics (execution time, event counts)
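
For the event-count part of the test output, a small summary function is probably sufficient. This sketch assumes the same `<event_type>|<status>` line format used for validation, which is an assumption of the example; `summarize_events` is a hypothetical name.

```bash
# Reads "<event_type>|<status>" lines on stdin (assumed format) and prints
# "<key>=<count>" per line, sorted by key. The key is the status when one
# is present, otherwise the event type.
summarize_events() {
  awk -F'|' '
    NF { key = ($2 != "") ? $2 : $1; count[key]++ }
    END { for (k in count) print k "=" count[k] }
  ' | sort
}
```

Printing this summary for both the expected and the actual event list on failure gives a compact expected-vs-actual view before falling back to a full BEL dump.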

## Expected Outcomes

**Success Criteria**:
1. **100% Success Rate**: Delegated builds show a 100% success rate in the dashboard
2. **Event Consistency**: CLI and Service produce identical delegation events
3. **Traceability**: All delegations link to correct source build requests
4. **Performance**: Delegated runs complete in <5 seconds vs 30+ seconds for full execution
5. **Multi-partition Correctness**: Complex jobs with mixed partition availability handled properly
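
The first criterion depends on how the success rate is computed. The sketch below treats `JOB_SKIPPED` as a success, as the event validation logic requires. It assumes one terminal job status per input line; `JOB_COMPLETED` is inferred from the status names used in this plan, and any other value (such as a hypothetical `JOB_FAILED`) counts as a failure. How the dashboard actually computes its figure is not covered here.

```bash
# Reads one terminal job status per line on stdin and prints the success
# rate as an integer percentage (truncated, not rounded).
# JOB_SKIPPED is counted as success alongside JOB_COMPLETED.
success_rate() {
  awk '
    $1 == "JOB_COMPLETED" || $1 == "JOB_SKIPPED" { ok++ }
    NF { total++ }
    END {
      if (total == 0) print 0
      else printf "%d\n", (ok * 100) / total
    }
  '
}
```

A test can feed it the terminal statuses extracted from the BEL for a delegated build request and assert that the result is `100`.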

**Regression Prevention**:
- Automated validation prevents delegation system regressions
- Comprehensive BEL verification ensures audit trail integrity
- Performance benchmarks detect delegation efficiency degradation

## Implementation Priority

1. **High**: Core delegation test cases (historical, multi-partition)
2. **High**: BEL validation utilities and event verification
3. **Medium**: Performance benchmarking and efficiency validation
4. **Medium**: Error scenario testing and edge cases
5. **Low**: Advanced concurrency and stress testing

This plan provides a comprehensive testing strategy that validates both the functional correctness and performance benefits of the delegation system while ensuring long-term reliability and debuggability.

## Implementation Notes

This plan was created following the user's request to improve system reliability and testability for the DataBuild delegation system. The focus is on the basic_graph example because it provides a simpler, more predictable test environment compared to the podcast_reviews example, while still covering all the essential delegation scenarios.

The delegation system currently reports a 67% success rate where 100% is expected. These tests should help identify the cause and, once it is fixed, prevent regressions. The BEL validation will confirm that delegation events provide the audit trail and traceability the system design intends.