Simple Python DSL Example

This example demonstrates how to use DataBuild's Python DSL to define a simple data processing pipeline.

Overview

The example defines a three-stage data processing pipeline:

IngestRawData: Ingests raw data for a specific date
ProcessData: Transforms the raw data into processed data
CreateSummary: Creates summary statistics from processed data
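
To make the data flow concrete, the three stages can be modeled as plain Python functions chained on a single date. This is only a minimal sketch and does not use DataBuild's API: the function names, record shapes, and sample values are hypothetical stand-ins for whatever simple_graph.py actually defines.

```python
from statistics import mean


def ingest_raw_data(date: str) -> list[dict]:
    """Stand-in for IngestRawData: return raw records for one date."""
    # Hypothetical fixed records; a real job would read from a source system.
    return [{"date": date, "value": v} for v in ("10", "20", "30")]


def process_data(raw: list[dict]) -> list[dict]:
    """Stand-in for ProcessData: convert string values to integers."""
    return [{"date": r["date"], "value": int(r["value"])} for r in raw]


def create_summary(processed: list[dict]) -> dict:
    """Stand-in for CreateSummary: compute summary statistics."""
    values = [r["value"] for r in processed]
    return {"count": len(values), "total": sum(values), "mean": mean(values)}


# Each stage consumes the output of the previous one.
summary = create_summary(process_data(ingest_raw_data("2024-01-01")))
print(summary)
```

In the real example each stage is a separate job keyed by partition, so the stages can be built and rebuilt independently rather than called in one expression.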

Files

simple_graph.py: Python DSL definition of the data pipeline
BUILD.bazel: Bazel build configuration
MODULE.bazel: Bazel module configuration for dependencies

Usage

Generate DSL Targets

The DSL generator can create Bazel targets from the Python DSL definition:

bazel run //:simple_graph.generate

This generates Bazel targets in the generated/ directory.

Build Individual Jobs

# Build a specific job
bazel build //:ingest_raw_data

# Build all jobs
bazel build //:simple_graph

Analyze the Graph

# Analyze what jobs would run for specific partitions
bazel run //:simple_graph.analyze -- "summary/date=2024-01-01"
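
Conceptually, analysis takes a partition reference such as "summary/date=2024-01-01" and works backwards to the jobs that must run to produce it. The sketch below illustrates that idea in plain Python. It is not DataBuild's implementation: the PRODUCERS table, the processed_data dataset name, and both helper functions are assumptions made for illustration.

```python
def parse_partition_ref(ref: str) -> tuple[str, dict[str, str]]:
    """Split 'summary/date=2024-01-01' into ('summary', {'date': '2024-01-01'})."""
    dataset, _, rest = ref.partition("/")
    keys = dict(part.split("=", 1) for part in rest.split("/") if part)
    return dataset, keys


# Hypothetical mapping: dataset -> (producing job, upstream dataset or None).
PRODUCERS = {
    "raw_data": ("IngestRawData", None),
    "processed_data": ("ProcessData", "raw_data"),
    "summary": ("CreateSummary", "processed_data"),
}


def plan(ref: str) -> list[str]:
    """Return the jobs needed to build `ref`, upstream jobs first."""
    dataset, keys = parse_partition_ref(ref)
    suffix = "/".join(f"{k}={v}" for k, v in keys.items())
    steps = []
    # Walk from the requested dataset back to the source.
    while dataset is not None:
        job, upstream = PRODUCERS[dataset]
        steps.append(f"{job} -> {dataset}/{suffix}")
        dataset = upstream
    return steps[::-1]


for step in plan("summary/date=2024-01-01"):
    print(step)
```

This toy version assumes a linear chain in which every stage shares the same date key; a real graph can fan in and out and map partitions between stages in other ways.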

Run the Graph

# Build specific partitions
bazel run //:simple_graph.build -- "summary/date=2024-01-01"

Cross-Workspace Usage

This example can be consumed from external workspaces by adding DataBuild as a dependency in your MODULE.bazel:

bazel_dep(name = "databuild", version = "0.0")
local_path_override(
    module_name = "databuild", 
    path = "path/to/databuild",
)

Then you can reference and extend this example:

from databuild.dsl.python.dsl import DataBuildGraph
# Import and extend the simple graph

Testing

To test that the DSL generator works correctly:

# Test the DSL generation
bazel run //:simple_graph.generate

# Verify generated files exist
ls generated/

# Test job lookup
bazel run //:job_lookup -- "raw_data/date=2024-01-01"
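
The directory check can also be scripted, for example as part of an automated test. The following is a small sketch using only the Python standard library; the list_generated helper is hypothetical, and it assumes nothing about which files the generator writes beyond their living under generated/.

```python
from pathlib import Path


def list_generated(directory: str = "generated") -> list[str]:
    """Return relative paths of all files under `directory`, sorted."""
    root = Path(directory)
    if not root.is_dir():
        # Missing directory is treated the same as an empty one.
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


files = list_generated()
if files:
    print(f"{len(files)} generated file(s):")
    for name in files:
        print(f"  {name}")
else:
    print("generated/ is missing or empty; run the generate target first")
```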
