databuild/docs/narrative/why-not-dags.md

  • Airflow and Luigi are OG data orchestrators that inspired databuild

  • Airflow uses explicit declaration of DAG structure: you write the nodes and edges by hand

  • Luigi uses implicit, discovered DAG structure: the graph is inferred by walking requires() back from a target task (both sketched below)
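
A minimal sketch of the contrast, assuming a recent Airflow 2.x and current Luigi; task names and dates are illustrative only:

```python
# Airflow: the graph is declared up front; every node and edge is written by hand.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="daily_metrics", start_date=datetime(2024, 1, 1), schedule="@daily"):
    extract = EmptyOperator(task_id="extract")
    transform = EmptyOperator(task_id="transform")
    extract >> transform  # the edge is an explicit statement in the DAG file


# Luigi: the graph is discovered by walking requires() back from the target task.
import luigi

class Extract(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"extract_{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("raw\n")

class Transform(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return Extract(date=self.date)  # the edge is implied, never declared centrally

    def output(self):
        return luigi.LocalTarget(f"transform_{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("clean\n")
```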

  • Both treat the DAG run as the top-level unit of execution

  • This is nice because once a DAG run has launched, you can see exactly what is going to happen

  • This is not nice because deployments collide with mid-execution DAG runs - what do you do with them?

    • Do you terminate existing DAG runs and retrigger? (What if the workload is stateful? Don't do that!)
    • Do you let existing DAG runs finish under the old definition?
    • How do you maintain DAG run identity when the DAG definition changes underneath it?
  • These questions are all red herrings. We don't care about the DAG definition - we care about the data we want to produce.

  • We should instead declare which partitions we want and iteratively propagate that request back through dependencies until they exist (sketched below)
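
A hypothetical sketch of that shape, purely to make the idea concrete; none of these names are databuild's actual API:

```python
# Hypothetical, not databuild's API: request partitions, then iteratively
# propagate backward through dependencies until everything requested exists.
# Assumes the dependency graph is acyclic.
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    table: str
    date: str  # e.g. "2024-01-01"

def build(wanted, upstream_of, build_one):
    """Resolve missing upstream partitions until every wanted partition is built."""
    built = set()
    frontier = list(wanted)
    while frontier:
        p = frontier.pop()
        if p in built:
            continue
        missing = [u for u in upstream_of(p) if u not in built]
        if missing:
            frontier.append(p)      # come back to p once its upstreams exist
            frontier.extend(missing)
        else:
            build_one(p)            # run whatever job produces this one partition
            built.add(p)
    return built
```

Under this framing there is no DAG run to terminate or preserve across a deployment: a new request simply propagates against whatever the current definitions are and only builds what is still missing.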

  • Inter-job invariants suck (simplify)

    • What about sense-plan-act? Is the rebuttal that "sense" just produces data? How would launchpad work under this model in a way that didn't suck?
    • Is there a hot take to make about config? "Customer X is targeting Y" is a reality of modern apps; Bazel-esque config is
      • Should this be under #partition-identity or something?