Simple Python DSL Example
This example demonstrates how to use DataBuild's Python DSL to define a simple data processing pipeline.
Overview
The example defines a three-stage data processing pipeline:
- IngestRawData: Ingests raw data for a specific date
- ProcessData: Processes the raw data into a processed format
- CreateSummary: Creates summary statistics from processed data
Files
- simple_graph.py: Python DSL definition of the data pipeline
- BUILD.bazel: Bazel build configuration
- MODULE.bazel: Bazel module configuration for dependencies
Usage
Generate DSL Targets
The DSL generator can create Bazel targets from the Python DSL definition:
bazel run //:simple_graph.generate
This will generate Bazel targets in the generated/ directory.
Build Individual Jobs
# Build a specific job
bazel build //:ingest_raw_data
# Build all jobs
bazel build //:simple_graph
Analyze the Graph
# Analyze what jobs would run for specific partitions
bazel run //:simple_graph.analyze -- "summary/date=2024-01-01"
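To make the analysis step concrete, here is a minimal plain-Python sketch of the idea, not DataBuild's actual implementation: given a requested partition, walk upstream and list the jobs that would have to run. The `processed_data` partition name, the `PRODUCERS` and `UPSTREAM` tables, and the `plan` helper are assumptions made for illustration; only the job names and the `raw_data` and `summary` partitions come from this example.

```python
# Illustrative only: a hand-written model of the three-stage pipeline.
# "processed_data" is an assumed partition name; DataBuild derives this
# information from the DSL definition rather than from tables like these.
PRODUCERS = {
    "raw_data": "IngestRawData",
    "processed_data": "ProcessData",
    "summary": "CreateSummary",
}
UPSTREAM = {
    "raw_data": [],
    "processed_data": ["raw_data"],
    "summary": ["processed_data"],
}


def plan(partition_ref):
    """Return (job, partition) pairs in the order they would have to run."""
    dataset, _, key = partition_ref.partition("/")
    steps = []
    for parent in UPSTREAM[dataset]:
        # Assume every stage is partitioned by the same date key.
        steps.extend(plan(f"{parent}/{key}"))
    steps.append((PRODUCERS[dataset], partition_ref))
    return steps


for job, ref in plan("summary/date=2024-01-01"):
    print(f"{job} -> {ref}")
```

Requesting the summary partition pulls in the ingest and process stages for the same date, which is the kind of answer the analyze target is meant to report.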
Run the Graph
# Build specific partitions
bazel run //:simple_graph.build -- "summary/date=2024-01-01"
Cross-Workspace Usage
This example can be consumed from external workspaces by adding DataBuild as a dependency in your MODULE.bazel:
bazel_dep(name = "databuild", version = "0.0")
local_path_override(
    module_name = "databuild",
    path = "path/to/databuild",
)
Then you can reference and extend this example:
from databuild.dsl.python.dsl import DataBuildGraph
# Import and extend the simple graph
Testing
To test that the DSL generator works correctly:
# Test the DSL generation
bazel run //:simple_graph.generate
# Verify generated files exist
ls generated/
# Test job lookup
bazel run //:job_lookup -- "raw_data/date=2024-01-01"
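Conceptually, job lookup maps a partition reference to the job that produces it. The sketch below shows one way such a mapping could work using regular expressions; it is an assumption for illustration, not the code behind the `job_lookup` target. The `JOB_PATTERNS` table, the `lookup_job` helper, and the `processed_data` prefix are hypothetical.

```python
import re

# Illustrative only: a hypothetical pattern table, not DataBuild's lookup code.
# The "processed_data" prefix is an assumed partition name.
DATE = r"\d{4}-\d{2}-\d{2}"
JOB_PATTERNS = [
    (re.compile(rf"raw_data/date={DATE}"), "IngestRawData"),
    (re.compile(rf"processed_data/date={DATE}"), "ProcessData"),
    (re.compile(rf"summary/date={DATE}"), "CreateSummary"),
]


def lookup_job(partition_ref):
    """Return the name of the job whose pattern matches the whole reference."""
    for pattern, job in JOB_PATTERNS:
        if pattern.fullmatch(partition_ref):
            return job
    raise LookupError(f"no job produces {partition_ref!r}")


print(lookup_job("raw_data/date=2024-01-01"))
```

Using `fullmatch` rather than `search` means a malformed reference such as `raw_data/date=2024-01` is rejected instead of being matched by prefix.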