# Simple Python DSL Example

This example demonstrates how to use DataBuild's Python DSL to define a simple data processing pipeline.

## Overview

The example defines a basic three-stage data processing pipeline:

1. **IngestRawData**: Ingests raw data for a specific date
2. **ProcessData**: Transforms the raw data into its processed form
3. **CreateSummary**: Creates summary statistics from the processed data

A hypothetical sketch of what this definition might look like appears under "DSL Sketch" below.

## Files

- `simple_graph.py`: Python DSL definition of the data pipeline
- `BUILD.bazel`: Bazel build configuration
- `MODULE.bazel`: Bazel module configuration for dependencies

## Usage

### Generate DSL Targets

The DSL generator creates Bazel targets from the Python DSL definition:

```bash
bazel run //:simple_graph.generate
```

This generates Bazel targets in the `generated/` directory.

### Build Individual Jobs

```bash
# Build a specific job
bazel build //:ingest_raw_data

# Build all jobs
bazel build //:simple_graph
```

### Analyze the Graph

```bash
# Analyze which jobs would run for specific partitions
bazel run //:simple_graph.analyze -- "summary/date=2024-01-01"
```

### Run the Graph

```bash
# Build specific partitions
bazel run //:simple_graph.build -- "summary/date=2024-01-01"
```

## Cross-Workspace Usage

This example can be consumed from external workspaces by adding DataBuild as a dependency in your `MODULE.bazel`:

```starlark
bazel_dep(name = "databuild", version = "0.0")
local_path_override(
    module_name = "databuild",
    path = "path/to/databuild",
)
```

Then you can reference and extend this example (see the sketch under "Extending the Graph" below):

```python
from databuild.dsl.python.dsl import DataBuildGraph

# Import and extend the simple graph
```

## Testing

To test that the DSL generator works correctly:

```bash
# Test the DSL generation
bazel run //:simple_graph.generate

# Verify that generated files exist
ls generated/

# Test job lookup
bazel run //:job_lookup -- "raw_data/date=2024-01-01"
```
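
## DSL Sketch

For orientation, here is a minimal sketch of what the three-stage definition in `simple_graph.py` might look like. Only the `DataBuildGraph` import path appears elsewhere in this README; the constructor, the `@graph.job` decorator, and its `inputs`/`outputs` parameters are assumptions for illustration, not DataBuild's confirmed API. Refer to the actual `simple_graph.py` for the real definition.

```python
# Hypothetical sketch: everything beyond the import path is assumed.
from databuild.dsl.python.dsl import DataBuildGraph

graph = DataBuildGraph("simple_graph")  # assumed constructor


# Partition strings mirror those used by the CLI examples above,
# e.g. "raw_data/date=2024-01-01" and "summary/date=2024-01-01".
@graph.job(outputs=["raw_data/date={date}"])
def ingest_raw_data(date: str) -> None:
    """Ingest raw data for a specific date."""
    ...


@graph.job(
    inputs=["raw_data/date={date}"],
    outputs=["processed_data/date={date}"],
)
def process_data(date: str) -> None:
    """Transform the raw data into its processed form."""
    ...


@graph.job(
    inputs=["processed_data/date={date}"],
    outputs=["summary/date={date}"],
)
def create_summary(date: str) -> None:
    """Create summary statistics from the processed data."""
    ...
```

Each job declares the partitions it reads and writes, which is what lets the analyze and build commands above resolve which jobs must run for a requested partition such as `summary/date=2024-01-01`.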
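
## Extending the Graph

As a companion to the cross-workspace snippet above, the sketch below shows one way an external workspace might build on this example. The module path used to import the graph, the `@graph.job` decorator, and the partition format are assumptions carried over from the sketch above; only the `DataBuildGraph` import path is taken from this README.

```python
# Hypothetical sketch of an external workspace extending the simple graph.
# The module path below is assumed; check the databuild repo for the
# actual location of the example graph.
from databuild.examples.simple.simple_graph import graph


# Add a downstream job that consumes the example's summary partitions.
@graph.job(
    inputs=["summary/date={date}"],
    outputs=["report/date={date}"],
)
def create_report(date: str) -> None:
    """Render a report from the summary statistics."""
    ...
```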