databuild/tests/end_to_end/README.md
2025-07-07 19:20:45 -07:00

# DataBuild End-to-End Tests
This directory contains comprehensive end-to-end tests for DataBuild that validate CLI and Service build consistency across different graph examples.
## Quick Start
To run all end-to-end tests:
```bash
# From the root of the databuild repository
./run_e2e_tests.sh
```
To run just the Bazel-integrated validation test:
```bash
bazel test //tests/end_to_end:e2e_runner_test
```
To run all tests (including core DataBuild tests):
```bash
bazel test //...
```
## Test Coverage
### Basic Graph Tests
- **Single Partition Build**: CLI vs Service for `generated_number/pippin`
- **Multiple Partition Build**: CLI vs Service for multiple partitions
- **Sum Partition Build**: Tests dependency resolution with `sum/pippin_salem_sadie`
- **Event Validation**: Compares build events between CLI and Service
### Podcast Reviews Tests
- **Simple Pipeline**: CLI build for `reviews/date=2020-01-01`
- **Complex Pipeline**: Multi-stage data pipeline validation
- **Directory Dependencies**: Tests jobs that require specific working directories
### Validation Tests
- **Build Event Logging**: Verifies SQLite database creation and event storage
- **Service API**: Tests HTTP API endpoints and responses
- **Consistency**: Ensures CLI and Service produce similar results
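One way to read "similar results" is that both paths emit the same kinds of events, even if the counts differ. The sketch below illustrates that idea only; it is not taken from the test scripts. `same_event_types` is a hypothetical helper, the event names are made-up placeholders, and it assumes the event types from each run have already been written to a text file, one per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two files list the same set of event
# types (one per line), ignoring order and how often each type occurs.
same_event_types() {
  diff <(sort -u "$1") <(sort -u "$2") >/dev/null
}

# Made-up sample data, for illustration only.
tmp="$(mktemp -d)"
printf '%s\n' requested scheduled completed > "$tmp/cli.txt"
printf '%s\n' requested scheduled scheduled completed completed > "$tmp/service.txt"

if same_event_types "$tmp/cli.txt" "$tmp/service.txt"; then
  echo "event types match"
else
  echo "event types differ"
fi
rm -rf "$tmp"
```

Comparing distinct types rather than raw counts tolerates one path logging extra events of a kind the other path also logs.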
## Test Architecture
```
tests/end_to_end/
├── README.md                # This file
├── BUILD                    # Bazel test targets
├── validate_runner.sh       # Simple validation test
├── simple_test.sh           # Working basic test
├── basic_graph_test.sh      # Comprehensive basic graph tests
├── podcast_reviews_test.sh  # Comprehensive podcast reviews tests
└── lib/
    ├── test_utils.sh        # Common test utilities
    ├── db_utils.sh          # Database comparison utilities
    └── service_utils.sh     # Service management utilities
```
## Key Findings
1. **Partition Format**: Basic graph uses `generated_number/pippin` format, not just `pippin`
2. **Service Configuration**: Services use hardcoded database paths in their wrapper scripts
3. **API Response Format**: Service returns `build_request_id` and lowercase status values
4. **Working Directory**: Podcast reviews jobs must run from their package directory
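A test script could guard against the partition-format pitfall (finding 1) with a small check like the one below. This is a minimal sketch: `is_qualified_partition` is a hypothetical helper, not part of DataBuild or the test libraries, and it assumes a qualified reference is just a non-empty prefix, a slash, and a non-empty remainder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a partition reference has a non-empty
# prefix and a non-empty name separated by a slash, e.g.
# "generated_number/pippin" rather than a bare "pippin".
is_qualified_partition() {
  local ref="$1"
  [[ "$ref" == ?*/?* ]]
}

for ref in "generated_number/pippin" "reviews/date=2020-01-01" "pippin"; do
  if is_qualified_partition "$ref"; then
    echo "qualified:   $ref"
  else
    echo "unqualified: $ref"
  fi
done
```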
## Test Results
The tests demonstrate successful end-to-end functionality:
- **CLI Build**: Generates proper build events (10 events for basic graph)
- **Service Build**: Responds correctly to HTTP API requests (14 events for basic graph)
- **Event Consistency**: Both approaches generate expected events
- **Complex Pipelines**: Podcast reviews pipeline executes successfully
- **Database Isolation**: Separate databases prevent test interference
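Database isolation for CLI runs can be sketched as follows. `make_isolated_db_uri` and the `build_events.db` file name are hypothetical. The sketch assumes the CLI reads `DATABUILD_BUILD_EVENT_LOG` as shown under Manual Testing. It does not cover the Service, whose wrapper scripts hardcode their database paths (see Key Findings).

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a fresh scratch directory and print a SQLite
# URI inside it, so each test run writes to its own event log.
make_isolated_db_uri() {
  local dir
  dir="$(mktemp -d)" || return 1
  echo "sqlite://${dir}/build_events.db"
}

cli_db="$(make_isolated_db_uri)"

# Point the CLI at the per-run database before invoking a build.
export DATABUILD_BUILD_EVENT_LOG="$cli_db"
echo "CLI event log: $DATABUILD_BUILD_EVENT_LOG"
```

Because `mktemp -d` returns a new directory on every call, two runs never share a database file.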
## Manual Testing
You can also run individual tests manually:
```bash
# Test basic graph (run from the repository root)
cd examples/basic_graph
bazel build //:basic_graph.build //:basic_graph.service
../../tests/end_to_end/simple_test.sh \
  bazel-bin/basic_graph.build \
  bazel-bin/basic_graph.service
cd ../..  # back to the repository root

# Test podcast reviews CLI (run from the repository root)
cd examples/podcast_reviews
bazel build //:podcast_reviews_graph.build
export DATABUILD_BUILD_EVENT_LOG="sqlite:///tmp/test.db"
bazel-bin/podcast_reviews_graph.build "reviews/date=2020-01-01"
```
## Integration with CI/CD
The tests are designed to integrate with CI/CD systems:
- **Bazel Integration**: `bazel test //...` runs validation tests
- **Shell Script**: `./run_e2e_tests.sh` provides standalone execution
- **Exit Codes**: Proper exit codes for automation
- **Cleanup**: Automatic cleanup of test processes and files
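The exit-code and cleanup behaviour can be illustrated with a common shell pattern. This is a sketch and not the contents of `run_e2e_tests.sh`: `cleanup`, `run_check`, and `summary_status` are hypothetical names, and a background `sleep` stands in for a service process.

```bash
#!/usr/bin/env bash
WORK_DIR="$(mktemp -d)"
SERVICE_PID=""
failures=0

cleanup() {
  # Stop the background process if one was started, then remove scratch files.
  if [[ -n "$SERVICE_PID" ]]; then
    kill "$SERVICE_PID" 2>/dev/null || true
    wait "$SERVICE_PID" 2>/dev/null || true
  fi
  rm -rf "$WORK_DIR"
}
trap cleanup EXIT  # runs on normal exit and on failure alike

# Stand-in for a service started in the background.
sleep 30 >/dev/null 2>&1 &
SERVICE_PID=$!

# Run a command as a named check and count failures instead of aborting.
run_check() {
  local name="$1"; shift
  if "$@"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    failures=$((failures + 1))
  fi
}

# Status for CI: 0 when every check passed, 1 otherwise.
summary_status() { (( failures == 0 )); }

run_check "scratch directory exists" test -d "$WORK_DIR"
run_check "background process is running" kill -0 "$SERVICE_PID"
summary_status
```

A standalone script would typically end by exiting with the status of `summary_status`, so a CI job fails whenever any check fails while the `EXIT` trap still cleans up.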