# Simple Python DSL Example

This example demonstrates how to use DataBuild's Python DSL to define a simple data processing pipeline.

## Overview

The example defines a basic three-stage data processing pipeline:

1. **IngestRawData**: Ingests raw data for a specific date
2. **ProcessData**: Transforms the raw data into its processed form
3. **CreateSummary**: Creates summary statistics from the processed data

A hypothetical sketch of what this definition might look like appears under "DSL Sketch" below.

## Files

- `simple_graph.py`: Python DSL definition of the data pipeline
- `BUILD.bazel`: Bazel build configuration
- `MODULE.bazel`: Bazel module configuration for dependencies

## Usage

### Generate DSL Targets

The DSL generator creates Bazel targets from the Python DSL definition:

```bash
bazel run //:simple_graph.generate
```

This generates Bazel targets in the `generated/` directory.

### Build Individual Jobs

```bash
# Build a specific job
bazel build //:ingest_raw_data

# Build all jobs
bazel build //:simple_graph
```

### Analyze the Graph

```bash
# Analyze which jobs would run for specific partitions
bazel run //:simple_graph.analyze -- "summary/date=2024-01-01"
```

### Run the Graph

```bash
# Build specific partitions
bazel run //:simple_graph.build -- "summary/date=2024-01-01"
```

## Cross-Workspace Usage

This example can be consumed from external workspaces by adding DataBuild as a dependency in your `MODULE.bazel`:

```starlark
bazel_dep(name = "databuild", version = "0.0")
local_path_override(
    module_name = "databuild",
    path = "path/to/databuild",
)
```

Then you can reference and extend this example (see the sketch under "Extending the Graph" below):

```python
from databuild.dsl.python.dsl import DataBuildGraph

# Import and extend the simple graph
```

## Testing

To test that the DSL generator works correctly:

```bash
# Test the DSL generation
bazel run //:simple_graph.generate

# Verify that generated files exist
ls generated/

# Test job lookup
bazel run //:job_lookup -- "raw_data/date=2024-01-01"
```
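
## DSL Sketch

For orientation, here is a minimal sketch of what the three-stage definition in `simple_graph.py` might look like. Only the `DataBuildGraph` import path appears elsewhere in this README; the constructor, the `@graph.job` decorator, and its `inputs`/`outputs` parameters are assumptions for illustration, not DataBuild's confirmed API. Refer to the actual `simple_graph.py` for the real definition.

```python
# Hypothetical sketch: everything beyond the import path is assumed.
from databuild.dsl.python.dsl import DataBuildGraph

graph = DataBuildGraph("simple_graph")  # assumed constructor


# Partition strings mirror those used by the CLI examples above,
# e.g. "raw_data/date=2024-01-01" and "summary/date=2024-01-01".
@graph.job(outputs=["raw_data/date={date}"])
def ingest_raw_data(date: str) -> None:
    """Ingest raw data for a specific date."""
    ...


@graph.job(
    inputs=["raw_data/date={date}"],
    outputs=["processed_data/date={date}"],
)
def process_data(date: str) -> None:
    """Transform the raw data into its processed form."""
    ...


@graph.job(
    inputs=["processed_data/date={date}"],
    outputs=["summary/date={date}"],
)
def create_summary(date: str) -> None:
    """Create summary statistics from the processed data."""
    ...
```

Each job declares the partitions it reads and writes, which is what lets the analyze and build commands above resolve which jobs must run for a requested partition such as `summary/date=2024-01-01`.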
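
## Extending the Graph

As a companion to the cross-workspace snippet above, the sketch below shows one way an external workspace might build on this example. The module path used to import the graph, the `@graph.job` decorator, and the partition format are assumptions carried over from the sketch above; only the `DataBuildGraph` import path is taken from this README.

```python
# Hypothetical sketch of an external workspace extending the simple graph.
# The module path below is assumed; check the databuild repo for the
# actual location of the example graph.
from databuild.examples.simple.simple_graph import graph


# Add a downstream job that consumes the example's summary partitions.
@graph.job(
    inputs=["summary/date={date}"],
    outputs=["report/date={date}"],
)
def create_report(date: str) -> None:
    """Render a report from the summary statistics."""
    ...
```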