
# DataBuild End-to-End Tests

This directory contains end-to-end tests for DataBuild that check that CLI and Service builds behave consistently across the example graphs.

## Quick Start

To run all end-to-end tests:

```bash
# From the root of the databuild repository
./run_e2e_tests.sh
```

To run just the Bazel-integrated validation test:

```bash
bazel test //tests/end_to_end:e2e_runner_test
```

To run all tests (including core DataBuild tests):

```bash
bazel test //...
```

## Test Coverage

### Basic Graph Tests
- **Single Partition Build**: CLI vs Service for `generated_number/pippin`
- **Multiple Partition Build**: CLI vs Service for multiple partitions
- **Sum Partition Build**: Tests dependency resolution with `sum/pippin_salem_sadie`
- **Event Validation**: Compares build events between CLI and Service

### Podcast Reviews Tests
- **Simple Pipeline**: CLI build for `reviews/date=2020-01-01`
- **Complex Pipeline**: Multi-stage data pipeline validation
- **Directory Dependencies**: Tests jobs that require specific working directories
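A job with a directory dependency can run from its package directory inside a subshell, so the `cd` never leaks into the rest of the test script. Below is a minimal sketch. `run_in_dir` is a hypothetical helper name and is not necessarily what `lib/test_utils.sh` provides.

```bash
# Hypothetical helper: run a command from a given directory without changing
# the caller's working directory. The cd happens inside a ( ... ) subshell,
# and the command only runs if the cd succeeded.
run_in_dir() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}
```

For example, from the repository root: `run_in_dir examples/podcast_reviews bazel-bin/podcast_reviews_graph.build "reviews/date=2020-01-01"`.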

### Validation Tests
- **Build Event Logging**: Verifies SQLite database creation and event storage
- **Service API**: Tests HTTP API endpoints and responses
- **Consistency**: Ensures CLI and Service produce similar results
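For the build-event-logging check, a first step is to confirm that the configured database file exists and starts with the SQLite header. Below is a minimal sketch. It assumes the test already knows the file path behind `DATABUILD_BUILD_EVENT_LOG`. `is_sqlite_db` is a hypothetical name. Every SQLite 3 database file begins with the 16-byte string `SQLite format 3\0`, so this check does not require the `sqlite3` binary.

```bash
# Hypothetical helper: succeed if $1 is a non-empty file that begins with the
# SQLite 3 magic header ("SQLite format 3" followed by a NUL byte).
is_sqlite_db() {
  local db="$1"
  [ -s "$db" ] || return 1
  # Read only the 15 printable header bytes, so no NUL byte reaches the
  # command substitution.
  [ "$(head -c 15 "$db")" = "SQLite format 3" ]
}
```

Checking that events were actually stored still requires querying the database, which depends on the schema DataBuild uses.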

## Test Architecture

```
tests/end_to_end/
├── README.md                    # This file
├── BUILD                        # Bazel test targets
├── validate_runner.sh           # Simple validation test
├── simple_test.sh               # Minimal CLI vs. Service test
├── basic_graph_test.sh          # Comprehensive basic graph tests
├── podcast_reviews_test.sh      # Comprehensive podcast reviews tests
└── lib/
    ├── test_utils.sh            # Common test utilities
    ├── db_utils.sh              # Database comparison utilities
    └── service_utils.sh         # Service management utilities
```
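Service tests must wait for the HTTP server to come up before they send requests. Below is a minimal sketch of a polling helper of the kind `lib/service_utils.sh` might contain. `wait_for` and its argument order are hypothetical.

```bash
# Hypothetical helper: retry a command until it succeeds.
# Usage: wait_for <attempts> <sleep_seconds> <command> [args...]
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for 30 1 curl -sf "$SERVICE_URL"` polls once per second for up to 30 attempts. Here `$SERVICE_URL` is a placeholder for whatever endpoint the service actually exposes.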

## Key Findings

1. **Partition Format**: Basic graph partitions are referenced as `generated_number/pippin`, not bare `pippin`
2. **Service Configuration**: Services use hardcoded database paths in their wrapper scripts
3. **API Response Format**: Service returns `build_request_id` and lowercase status values
4. **Working Directory**: Podcast reviews jobs must run from their package directory
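Because the service names the request ID field `build_request_id` and reports status in lowercase, a harness should read that exact field and compare statuses against lowercase literals. Below is a minimal sketch that avoids a `jq` dependency. It assumes the response is a flat JSON object on one line. `json_field` is hypothetical, and the status value `running` in the test data is illustrative, not a confirmed DataBuild status.

```bash
# Hypothetical helper: print the value of a string field from a flat,
# single-line JSON object read on stdin. This is not a general JSON parser:
# it does not handle escaped quotes or nested objects, and prints nothing
# if the field is absent.
json_field() {
  local field="$1"
  sed -n "s/.*\"${field}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}
```

For example, `printf '%s\n' "$response" | json_field build_request_id`, where `$response` holds the body returned by the service.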

## Test Results

The tests demonstrate successful end-to-end functionality:

- ✅ **CLI Build**: Generates the expected build events (10 events for the basic graph)
- ✅ **Service Build**: Responds correctly to HTTP API requests (14 events for the basic graph)
- ✅ **Event Consistency**: Both approaches generate expected events
- ✅ **Complex Pipelines**: Podcast reviews pipeline executes successfully
- ✅ **Database Isolation**: Separate databases prevent test interference
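Database isolation comes from pointing each run at its own event log. Below is a minimal sketch. It assumes the `sqlite://` + absolute-path URL form shown under Manual Testing. `fresh_event_log` is a hypothetical helper that creates a private temporary directory and prints a URL for a database file inside it.

```bash
# Hypothetical helper: create a private temp directory and print a
# DATABUILD_BUILD_EVENT_LOG URL for a database file inside it.
# mktemp -d normally returns an absolute path, which gives the
# "sqlite:///..." form used in the Manual Testing example.
fresh_event_log() {
  local dir
  dir="$(mktemp -d)" || return 1
  printf 'sqlite://%s/events.db\n' "$dir"
}
```

Usage: `export DATABUILD_BUILD_EVENT_LOG="$(fresh_event_log)"` before each build. The caller is responsible for removing the directory afterwards.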

## Manual Testing

You can also run individual tests manually:

```bash
# Test basic graph (start from the repository root)
cd examples/basic_graph
bazel build //:basic_graph.build //:basic_graph.service
../../tests/end_to_end/simple_test.sh \
  bazel-bin/basic_graph.build \
  bazel-bin/basic_graph.service

# Test podcast reviews CLI (continuing from examples/basic_graph)
cd ../podcast_reviews
bazel build //:podcast_reviews_graph.build
export DATABUILD_BUILD_EVENT_LOG="sqlite:///tmp/test.db"
bazel-bin/podcast_reviews_graph.build "reviews/date=2020-01-01"
```

## Integration with CI/CD

The tests are designed to integrate with CI/CD systems:

- **Bazel Integration**: `bazel test //...` runs validation tests
- **Shell Script**: `./run_e2e_tests.sh` provides standalone execution
- **Exit Codes**: Non-zero exit code on failure, so automated pipelines can detect failing tests
- **Cleanup**: Automatic cleanup of test processes and files
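Automatic cleanup of this kind is commonly built on an `EXIT` trap, so a failing test still stops the service it started. Below is a minimal sketch of that pattern. `start_service` and `stop_service` are hypothetical names and are not necessarily what `lib/service_utils.sh` defines.

```bash
# Hypothetical helpers: start a service in the background, and stop it on
# exit so a failing test never leaves a stray process behind.
SERVICE_PID=""

start_service() {
  "$@" &
  SERVICE_PID=$!
}

stop_service() {
  if [ -n "$SERVICE_PID" ] && kill -0 "$SERVICE_PID" 2>/dev/null; then
    kill "$SERVICE_PID" 2>/dev/null || true
    # Reap the process so it does not linger as a zombie.
    wait "$SERVICE_PID" 2>/dev/null || true
  fi
  SERVICE_PID=""
}

# Run stop_service however the script exits. Bash keeps the script's exit
# status after an EXIT trap handler runs, unless the handler calls exit.
trap stop_service EXIT
```

For example, `start_service bazel-bin/basic_graph.service`, with whatever arguments the service needs, followed by the test steps. `stop_service` then runs on exit whether the steps pass or fail.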