# DataBuild End-to-End Tests

This directory contains comprehensive end-to-end tests for DataBuild that validate CLI and Service build consistency across different graph examples.

## Quick Start

To run all end-to-end tests:

```bash
# From the root of the databuild repository
./run_e2e_tests.sh
```

To run just the Bazel-integrated validation test:

```bash
bazel test //tests/end_to_end:e2e_runner_test
```

To run all tests (including core DataBuild tests):

```bash
bazel test //...
```

## Test Coverage

### Basic Graph Tests

- **Single Partition Build**: CLI vs Service for `generated_number/pippin`
- **Multiple Partition Build**: CLI vs Service for multiple partitions
- **Sum Partition Build**: Tests dependency resolution with `sum/pippin_salem_sadie`
- **Event Validation**: Compares build events between CLI and Service

### Podcast Reviews Tests

- **Simple Pipeline**: CLI build for `reviews/date=2020-01-01`
- **Complex Pipeline**: Multi-stage data pipeline validation
- **Directory Dependencies**: Tests jobs that require specific working directories

### Validation Tests

- **Build Event Logging**: Verifies SQLite database creation and event storage
- **Service API**: Tests HTTP API endpoints and responses
- **Consistency**: Ensures CLI and Service produce similar results

## Test Architecture

```
tests/end_to_end/
├── README.md                # This file
├── BUILD                    # Bazel test targets
├── validate_runner.sh       # Simple validation test
├── simple_test.sh           # Working basic test
├── basic_graph_test.sh      # Comprehensive basic graph tests
├── podcast_reviews_test.sh  # Comprehensive podcast reviews tests
└── lib/
    ├── test_utils.sh        # Common test utilities
    ├── db_utils.sh          # Database comparison utilities
    └── service_utils.sh     # Service management utilities
```

## Key Findings

1. **Partition Format**: Basic graph uses `generated_number/pippin` format, not just `pippin`
2. **Service Configuration**: Services use hardcoded database paths in their wrapper scripts
3. **API Response Format**: Service returns `build_request_id` and lowercase status values
4. **Working Directory**: Podcast reviews jobs must run from their package directory

## Test Results

The tests demonstrate successful end-to-end functionality:

- ✅ **CLI Build**: Generates proper build events (10 events for basic graph)
- ✅ **Service Build**: Responds correctly to HTTP API requests (14 events for basic graph)
- ✅ **Event Consistency**: Both approaches generate expected events
- ✅ **Complex Pipelines**: Podcast reviews pipeline executes successfully
- ✅ **Database Isolation**: Separate databases prevent test interference

## Manual Testing

You can also run individual tests manually:

```bash
# Test basic graph
cd examples/basic_graph
bazel build //:basic_graph.build //:basic_graph.service
../../tests/end_to_end/simple_test.sh \
    bazel-bin/basic_graph.build \
    bazel-bin/basic_graph.service

# Test podcast reviews CLI
cd examples/podcast_reviews
bazel build //:podcast_reviews_graph.build
export DATABUILD_BUILD_EVENT_LOG="sqlite:///tmp/test.db"
bazel-bin/podcast_reviews_graph.build "reviews/date=2020-01-01"
```

## Integration with CI/CD

The tests are designed to integrate with CI/CD systems:

- **Bazel Integration**: `bazel test //...` runs validation tests
- **Shell Script**: `./run_e2e_tests.sh` provides standalone execution
- **Exit Codes**: Proper exit codes for automation
- **Cleanup**: Automatic cleanup of test processes and files
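
The exit-code and cleanup behaviour listed above could be wired into a CI job with a small wrapper along these lines. This is only a sketch, not part of the test suite: `run_step`, `failures`, and `scratch_dir` are hypothetical names, and the only commands taken from this README are the two entry points shown in the trailing comments.

```bash
#!/usr/bin/env bash
# Hypothetical CI wrapper sketch; all function and variable names are illustrative.
set -u

failures=0

# Scratch directory for per-run artifacts, removed on exit even if a step fails.
scratch_dir="$(mktemp -d)"
cleanup() { rm -rf "$scratch_dir"; }
trap cleanup EXIT

# Run one named step, report its result, and count failures without aborting,
# so that every step gets a chance to run before the job reports its status.
run_step() {
  local name="$1"
  shift
  if "$@"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name (exit $?)"
    failures=$((failures + 1))
  fi
}

# In a CI job, the two documented entry points would be invoked like this:
#   run_step "shell e2e"   ./run_e2e_tests.sh
#   run_step "bazel tests" bazel test //...
#   exit "$((failures > 0 ? 1 : 0))"
```

Counting failures instead of using `set -e` lets the job run both the standalone script and the Bazel targets before exiting non-zero, which tends to give more useful CI logs than stopping at the first failure.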