databuild/examples/simple_python_dsl/README.md
Stuart Axelbrooke ba18734190
Make dsl generation work for submodules
2025-08-06 22:16:01 -07:00


# Simple Python DSL Example
This example demonstrates how to use DataBuild's Python DSL to define a simple data processing pipeline.
## Overview
The example defines a basic 3-stage data processing pipeline:
1. **IngestRawData**: Ingests raw data for a specific date
2. **ProcessData**: Processes the raw data into a processed format
3. **CreateSummary**: Creates summary statistics from processed data
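
The dependency chain above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the DataBuild DSL or the contents of `simple_graph.py`: the `PIPELINE` table and the `plan` function are hypothetical, and the `processed_data` dataset name is an assumption (`raw_data` and `summary` come from the commands later in this README).

```python
# Illustrative model only: plain Python, not the DataBuild DSL.
# "processed_data" is an assumed dataset name; "raw_data" and "summary"
# appear in the partition references used elsewhere in this README.
PIPELINE = {
    # output dataset -> (job that produces it, upstream dataset or None)
    "raw_data": ("IngestRawData", None),
    "processed_data": ("ProcessData", "raw_data"),
    "summary": ("CreateSummary", "processed_data"),
}


def plan(partition_ref: str) -> list[str]:
    """Return the jobs needed to build a partition, upstream jobs first."""
    # The dataset name is everything before the first "/".
    dataset = partition_ref.split("/", 1)[0]
    jobs = []
    # Walk upstream until we reach a job with no input dataset.
    while dataset is not None:
        job, upstream = PIPELINE[dataset]
        jobs.append(job)
        dataset = upstream
    return list(reversed(jobs))


print(plan("summary/date=2024-01-01"))
```

Requesting a `summary` partition pulls in all three stages, while requesting a `raw_data` partition needs only `IngestRawData`.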
## Files
- `simple_graph.py`: Python DSL definition of the data pipeline
- `BUILD.bazel`: Bazel build configuration
- `MODULE.bazel`: Bazel module configuration for dependencies
## Usage
### Generate DSL Targets
The DSL generator creates Bazel targets from the Python DSL definition:
```bash
bazel run //:simple_graph.generate
```
This will generate Bazel targets in the `generated/` directory.
### Build Jobs
```bash
# Build a specific job
bazel build //:ingest_raw_data
# Build all jobs
bazel build //:simple_graph
```
### Analyze the Graph
```bash
# Analyze what jobs would run for specific partitions
bazel run //:simple_graph.analyze -- "summary/date=2024-01-01"
```
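The partition references passed to these commands look like a dataset name followed by `key=value` segments. A minimal sketch of splitting such a reference, assuming that format holds in general (it is inferred from the examples in this README, not from a DataBuild specification, and `parse_partition_ref` is a hypothetical helper):

```python
def parse_partition_ref(ref: str) -> tuple[str, dict[str, str]]:
    """Split 'dataset/key=value/...' into the dataset name and its keys.

    The format is an assumption based on the example references in this
    README; DataBuild itself may accept other shapes.
    """
    dataset, *segments = ref.split("/")
    keys: dict[str, str] = {}
    for segment in segments:
        name, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {segment!r}")
        keys[name] = value
    return dataset, keys


print(parse_partition_ref("summary/date=2024-01-01"))
```

A helper like this can be handy when scripting around the `analyze` and `build` targets, for example to loop over a range of dates.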
### Run the Graph
```bash
# Build specific partitions
bazel run //:simple_graph.build -- "summary/date=2024-01-01"
```
## Cross-Workspace Usage
This example can be consumed from external workspaces by adding DataBuild as a dependency in your `MODULE.bazel`:
```starlark
bazel_dep(name = "databuild", version = "0.0")
local_path_override(
    module_name = "databuild",
    path = "path/to/databuild",
)
```
You can then import the DSL and extend this example's graph:
```python
from databuild.dsl.python.dsl import DataBuildGraph
# Import and extend the simple graph
```
## Testing
To test that the DSL generator works correctly:
```bash
# Test the DSL generation
bazel run //:simple_graph.generate
# Verify generated files exist
ls generated/
# Test job lookup
bazel run //:job_lookup -- "raw_data/date=2024-01-01"
```
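The `ls generated/` step can also be scripted. The following is a rough sketch, assuming the generator writes into a `generated/` directory relative to where the script is run; `list_generated` is a hypothetical helper and not part of DataBuild.

```python
from pathlib import Path


def list_generated(root: str = "generated") -> list[str]:
    """Return relative paths of all files under root, or [] if root is missing."""
    base = Path(root)
    if not base.is_dir():
        return []
    # as_posix() keeps the output stable across operating systems.
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


files = list_generated()
if files:
    print(f"{len(files)} generated file(s):")
    for name in files:
        print(f"  {name}")
else:
    print("generated/ is missing or empty; run the generate target first")
```

Run it from the example's workspace root after `bazel run //:simple_graph.generate`.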