databuild/docs/narrative/why-not-dags.md
Stuart Axelbrooke ea83610d35
2025-09-27 15:29:22 -07:00

  • Airflow and Luigi are OG data orchestrators that inspired databuild
  • Airflow uses explicit declaration of DAG structure
  • Luigi uses implicit, discovered DAG structure
  • Both use DAG runs as a top-level unit of execution
  • This is nice because you can see what's going to happen after the DAG run has launched
  • This is not nice because you have to deal with mid-execution DAG runs during deployments - what do you do?
    • Do you terminate existing DAG runs and retrigger? (What if the workload is stateful? Don't do that!)
    • Do you let existing DAG runs finish?
    • How do you deal with DAG run identity under a changing DAG definition?
  • These questions are all red herrings. We don't care about the DAG definition - we care about the data we want to produce.
  • We should instead declare what partitions we want, and iteratively propagate
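
As a rough illustration of the last point, here is a minimal sketch of "declare the partitions you want, then iterate". It is not databuild's actual API: the `DEPS` table, the partition ID format, and the `build_wanted` function are all hypothetical names made up for this example. Each pass either marks a missing input as wanted or builds a partition whose inputs already exist, so there is no long-lived DAG run object to manage.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical dependency table: partition ID -> partitions it reads from.
DEPS: Dict[str, List[str]] = {
    "report/2025-09-27": ["clean/2025-09-27"],
    "clean/2025-09-27": ["raw/2025-09-27"],
    "raw/2025-09-27": [],
}


def build_wanted(
    wanted: Iterable[str],
    deps: Dict[str, List[str]],
    existing: Set[str],
    build: Callable[[str], None],
) -> List[str]:
    """Build partitions pass by pass until every wanted one exists.

    Returns the order in which partitions were built.
    """
    existing = set(existing)           # copy so the caller's set is untouched
    pending = set(wanted) - existing   # partitions we still need to produce
    order: List[str] = []

    while pending:
        progressed = False
        for partition in sorted(pending):  # sorted copy: safe to mutate pending
            missing = [d for d in deps[partition] if d not in existing]
            if missing:
                # Propagate the want upstream: missing inputs become wanted too.
                new_wants = set(missing) - pending
                if new_wants:
                    pending |= new_wants
                    progressed = True
            else:
                # All inputs exist, so this partition can be built now.
                build(partition)
                existing.add(partition)
                pending.discard(partition)
                order.append(partition)
                progressed = True
        if not progressed:
            # Nothing new was wanted and nothing could be built (e.g. a cycle).
            raise RuntimeError(f"cannot make progress on: {sorted(pending)}")

    return order


if __name__ == "__main__":
    print(build_wanted(["report/2025-09-27"], DEPS, set(), lambda p: None))
```

In this sketch the only state carried between passes is the set of wanted partitions and the set of existing ones. If `deps` were swapped out between passes, as a deployment might do, the next pass would simply consult the new definition; the questions about terminating or preserving an in-flight DAG run would not come up.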