Update plan (commit 492c30c0bc, parent cdc47bddfe; 1 changed file, 175 additions, 1 deletion)
---
### Phase 3: Two-Phase Code Generation
**Goal**: Implement proper two-phase code generation that works within Bazel's constraints

#### Key Learning
Previous attempts failed due to fundamental Bazel constraints:
- **Loading vs. execution phases**: `load()` statements are evaluated in the loading phase, before any genrule executes, so a BUILD file cannot `load()` a genrule's output
- **Dynamic target generation**: Bazel requires the complete build graph before execution begins
- **Package location**: Bazel never reads BUILD files from `bazel-bin`, so generated BUILD files must be written into the source tree

The solution: **Two-phase generation**, following established patterns from protobuf, thrift, and other code generators.

#### Two-Phase Workflow

**Phase 1: Code Generation** (run by developer)
```bash
bazel run //databuild/test/app/dsl:graph.generate
# Generates BUILD.bazel and Python binaries into source tree
```

**Phase 2: Building** (normal Bazel workflow)
```bash
bazel build //databuild/test/app/dsl:graph.analyze
bazel run //databuild/test/app/dsl:graph.service -- --port 8080
```

#### Implementation Tasks

1. **Create `databuild_dsl_generator` rule**:
```python
databuild_dsl_generator(
    name = "graph.generate",
    graph_file = "graph.py",
    output_package = "//databuild/test/app/dsl",
    deps = [":dsl_src"],
)
```

2. **Implement generator that writes to source tree**:
```python
def _databuild_dsl_generator_impl(ctx):
    script = ctx.actions.declare_file(ctx.label.name + "_generator.py")

    # Create a script that:
    # 1. Loads the DSL graph
    # 2. Generates BUILD.bazel and binaries
    # 3. Writes them to the source tree
    # The shebang must be the very first line, so the string starts right after """.
    script_content = """#!/usr/bin/env python3
import os
import sys

# `bazel run` sets BUILD_WORKSPACE_DIRECTORY to the workspace root; put it on
# sys.path so the DSL module is imported straight from the source tree
workspace_root = os.environ["BUILD_WORKSPACE_DIRECTORY"]
sys.path.insert(0, workspace_root)
output_dir = os.path.join(workspace_root, '{package_path}')

# Load and generate
from {module_path} import {graph_attr}
{graph_attr}.generate_bazel_package('{name}', output_dir)
print(f'Generated BUILD.bazel and binaries in {{output_dir}}')
""".format(
        # "//databuild/test/app/dsl" -> "databuild/test/app/dsl"
        package_path = ctx.attr.output_package.strip("/").replace(":", "/"),
        # Strip only the trailing ".py", then turn the path into a module name
        module_path = ctx.file.graph_file.path[:-len(".py")].replace("/", "."),
        graph_attr = ctx.attr.graph_attr,
        name = ctx.attr.name.replace(".generate", ""),
    )

    ctx.actions.write(
        output = script,
        content = script_content,
        is_executable = True,
    )

    return [DefaultInfo(executable = script)]
```

3. **Update `DataBuildGraph.generate_bazel_package()` to target source tree**:
```python
def generate_bazel_package(self, name: str, output_dir: str) -> None:
    """Generate BUILD.bazel and binaries into source directory"""
    # Generate BUILD.bazel with real databuild targets
    self._generate_build_bazel(output_dir, name)

    # Generate job binaries
    self._generate_job_binaries(output_dir)

    # Generate job lookup
    self._generate_job_lookup(output_dir)

    print(f"Generated package in {output_dir}")
    # The graph target is named f"{name}_graph" by _generate_build_bazel
    print(f"Run 'bazel build :{name}_graph.analyze' to use")
```

4. **Create standard BUILD.bazel template**:
```python
import os

def _generate_build_bazel(self, output_dir: str, name: str):
    # Generate proper databuild_job and databuild_graph targets
    # that will work exactly like hand-written ones
    build_content = self._build_template.format(
        jobs = self._format_jobs(),
        graph_name = f"{name}_graph",
        job_targets = self._format_job_targets(),
    )

    with open(os.path.join(output_dir, "BUILD.bazel"), "w") as f:
        f.write(build_content)
```
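The string handling in step 2 (package path, module path, target name) is the part most likely to go subtly wrong, and it can be checked outside Bazel. Below is a minimal pure-Python mirror of that logic, offered as a sketch: `render_generator_script` is a hypothetical helper, and `graph_attr = "graph"` is an assumed value, since the step 1 example does not set that attribute.

```python
def render_generator_script(output_package: str, graph_file_path: str,
                            graph_attr: str, rule_name: str) -> str:
    """Hypothetical pure-Python mirror of the Starlark string handling in step 2."""
    if not graph_file_path.endswith(".py"):
        raise ValueError(f"graph_file must be a .py file: {graph_file_path}")
    # "//databuild/test/app/dsl" -> "databuild/test/app/dsl"
    package_path = output_package.strip("/").replace(":", "/")
    # "databuild/test/app/dsl/graph.py" -> "databuild.test.app.dsl.graph"
    module_path = graph_file_path[:-len(".py")].replace("/", ".")
    # "graph.generate" -> "graph"
    name = rule_name.replace(".generate", "")
    return (
        "#!/usr/bin/env python3\n"
        "import os\n"
        "import sys\n"
        "\n"
        'workspace_root = os.environ["BUILD_WORKSPACE_DIRECTORY"]\n'
        "sys.path.insert(0, workspace_root)\n"
        f"output_dir = os.path.join(workspace_root, {package_path!r})\n"
        f"from {module_path} import {graph_attr}\n"
        f"{graph_attr}.generate_bazel_package({name!r}, output_dir)\n"
    )


script = render_generator_script(
    output_package="//databuild/test/app/dsl",
    graph_file_path="databuild/test/app/dsl/graph.py",
    graph_attr="graph",  # assumed; step 1 leaves graph_attr at its default
    rule_name="graph.generate",
)
print(script)
```

Slicing off the suffix, rather than calling `.replace(".py", "")`, avoids mangling a path that happens to contain `.py` elsewhere (for example a `.pylib` directory).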

#### Interface Design

**For DSL Authors**:
```python
# In graph.py
graph = DataBuildGraph("my_graph")

@graph.job
class MyJob(DataBuildJob):
    ...  # job definition
```
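The decorator interface implies that the graph keeps a registry of job classes. Here is a minimal sketch of such a registry. It assumes jobs are keyed by a snake_case form of the class name, matching the `my_job` target that appears in the BUILD.bazel comments, and `DataBuildJob` here is a hypothetical stand-in rather than the real base class.

```python
import re
from typing import Dict, Type


class DataBuildJob:
    """Hypothetical stand-in for the real DataBuildJob base class."""


class DataBuildGraph:
    """Sketch of a graph that collects jobs via a decorator (assumed design)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.jobs: Dict[str, Type[DataBuildJob]] = {}

    def job(self, cls: Type[DataBuildJob]) -> Type[DataBuildJob]:
        # Derive a target name by inserting "_" before each inner capital: MyJob -> my_job
        target_name = re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()
        if target_name in self.jobs:
            raise ValueError(f"duplicate job target: {target_name}")
        self.jobs[target_name] = cls
        return cls  # return the class unchanged so it stays usable directly


graph = DataBuildGraph("my_graph")


@graph.job
class MyJob(DataBuildJob):
    ...  # job definition


print(sorted(graph.jobs))
```

Returning the class from the decorator keeps `MyJob` importable and testable on its own; the registry is simply what the generator would iterate over when writing job targets.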

**For Users**:
```bash
# Generate code (phase 1)
bazel run //my/app:graph.generate

# Use generated code (phase 2)
bazel build //my/app:graph.analyze
bazel run //my/app:graph.service
```

**In BUILD.bazel**:
```python
databuild_dsl_generator(
    name = "graph.generate",
    graph_file = "graph.py",
    output_package = "//my/app",
    deps = [":my_deps"],
)

# After generation, this file will contain:
# databuild_graph(name = "graph_graph", ...)
# databuild_job(name = "my_job", ...)
# py_binary(name = "my_job_binary", ...)
```
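The naming scheme in those comments (`{name}_graph`, `{job}`, `{job}_binary`) can be rendered from a plain list of job names. This is a sketch only: `render_build_file` and both templates are hypothetical, and every attribute other than `name` is left elided because the plan does not specify the rules' signatures.

```python
from typing import List

# Hypothetical templates; only the target names follow the plan, other attributes are elided.
JOB_TEMPLATE = '''py_binary(
    name = "{job}_binary",
    # ... (remaining attributes elided in the plan)
)

databuild_job(
    name = "{job}",
    # ...
)
'''

GRAPH_TEMPLATE = '''databuild_graph(
    name = "{graph_name}",
    # ...
)
'''


def render_build_file(name: str, jobs: List[str]) -> str:
    """Render generated targets in a stable order so regenerated files diff cleanly."""
    parts = [JOB_TEMPLATE.format(job=job) for job in sorted(jobs)]
    parts.append(GRAPH_TEMPLATE.format(graph_name=f"{name}_graph"))
    return "\n".join(parts)


print(render_build_file("graph", ["my_job"]))
```

Sorting the job names makes the output independent of registration order, which matters if the generated BUILD.bazel is checked in.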

#### Benefits of This Approach

✅ **Works within Bazel constraints** - No dynamic target generation
✅ **Follows established patterns** - Same as protobuf, thrift, OpenAPI generators
✅ **Inspectable output** - Users can see generated BUILD.bazel
✅ **Version controllable** - Generated files can be checked in if desired
✅ **Incremental builds** - Standard Bazel caching applies to the generated targets
✅ **Clean separation** - Generation vs building are separate phases

#### Tests & Verification
```bash
# Test: Code generation
bazel run //databuild/test/app/dsl:graph.generate
# Should create BUILD.bazel and Python files in source tree

# Test: Generated targets work
bazel build //databuild/test/app/dsl:graph_graph.analyze
# Should build successfully using generated BUILD.bazel

# Test: End-to-end functionality
bazel run //databuild/test/app/dsl:graph_graph.analyze -- "color_vote_report/2024-01-01/red"
# Should work exactly like hand-written graph
```
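The first two checks can also be scripted against the generated file before invoking Bazel. A minimal sketch, assuming the generated BUILD.bazel only uses Python-compatible Starlark (plain rule calls and `load()` statements), so that `ast.parse` can stand in for a real Starlark parser; `declared_targets` and `check_generated` are hypothetical helpers.

```python
import ast
from typing import Set


def declared_targets(build_content: str) -> Set[str]:
    """Collect `name = "..."` values from top-level rule calls in a BUILD file."""
    tree = ast.parse(build_content)
    names = set()
    for node in tree.body:
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            for kw in node.value.keywords:
                # Only literal string names are recorded; computed names are skipped
                if kw.arg == "name" and isinstance(kw.value, ast.Constant):
                    names.add(kw.value.value)
    return names


def check_generated(build_content: str, expected: Set[str]) -> None:
    """Fail if any expected target is missing from the generated BUILD content."""
    missing = expected - declared_targets(build_content)
    if missing:
        raise AssertionError(f"generated BUILD.bazel is missing targets: {sorted(missing)}")


sample = '''
databuild_graph(name = "graph_graph")
databuild_job(name = "my_job")
py_binary(name = "my_job_binary")
'''
check_generated(sample, {"graph_graph", "my_job", "my_job_binary"})
print(sorted(declared_targets(sample)))
```

A syntax error in the generated file surfaces as a `SyntaxError` from `ast.parse`, which covers the "valid BUILD.bazel" criterion at a coarse level; `bazel build` remains the real check.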

#### Success Criteria
- Generator creates valid BUILD.bazel in source tree
- Generated targets are indistinguishable from hand-written ones
- Full DataBuild functionality works through generated code
- Clean developer workflow with clear phase separation

---

### Phase 4: Graph Integration
**Goal**: Generate complete databuild graph targets with all operational variants

#### Deliverables