- Airflow and Luigi are the early data orchestrators that inspired databuild
- Airflow declares DAG structure explicitly: you wire up the task dependencies yourself
- Luigi discovers DAG structure implicitly: each task states what it requires, and the graph is derived from that
- Both use DAG runs as a top-level unit of execution
  - This is nice because, once a DAG run has launched, you can see what is going to happen
  - This is not nice because you have to deal with DAG runs that are mid-execution when you deploy. What do you do?
    - Do you terminate existing DAG runs and retrigger them? (If the workload is stateful, don't do that!)
    - Do you let existing DAG runs finish?
    - How do you handle DAG run identity when the DAG definition changes underneath the run?
- These questions are all red herrings. We don't care about the DAG definition - we care about the data we want to produce.
- We should instead declare which partitions we want, and propagate those wants iteratively
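
To make the "declare partitions, then propagate" idea concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: `Job`, `propagate`, and the partition names are assumptions, not databuild's actual API. The only state is the set of built partitions, so there is no DAG run object whose identity could go stale when job definitions change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Job:
    """Hypothetical job: builds `output` once every partition in `inputs` exists."""
    output: str
    inputs: List[str] = field(default_factory=list)


def propagate(wants: Set[str], jobs: Dict[str, Job], built: Set[str]) -> List[str]:
    """Build all wanted partitions, turning missing inputs into new wants.

    `built` is updated in place. Returns the order in which partitions were built.
    """
    order: List[str] = []
    pending = set(wants) - built
    while pending:
        progressed = False
        # sorted() copies the set, so it is safe to change `pending` in the loop.
        for partition in sorted(pending):
            if partition not in jobs:
                raise KeyError(f"no job produces {partition!r}")
            missing = [p for p in jobs[partition].inputs if p not in built]
            if missing:
                # A missing input becomes a want of its own; retry on a later pass.
                new_wants = set(missing) - pending
                if new_wants:
                    pending |= new_wants
                    progressed = True
                continue
            # All inputs exist: "run" the job by recording its output as built.
            built.add(partition)
            order.append(partition)
            pending.discard(partition)
            progressed = True
        if not progressed:
            raise RuntimeError(f"cannot make progress on {sorted(pending)}")
    return order


# Hypothetical partitions: a report that depends on cleaned events,
# which in turn depend on raw events.
jobs = {
    "events_raw/2024-01-01": Job("events_raw/2024-01-01"),
    "events_clean/2024-01-01": Job("events_clean/2024-01-01", ["events_raw/2024-01-01"]),
    "reports/2024-01-01": Job("reports/2024-01-01", ["events_clean/2024-01-01"]),
}
built: Set[str] = set()
order = propagate({"reports/2024-01-01"}, jobs, built)
print(order)
```

Asking for the same partition again with the same `built` set does no work, which is the property the DAG-run model struggles to offer across deployments: what has to happen next is derived from the data that exists, not from a run launched under an older definition.
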
- Inter-job invariants suck (simplify)
- What about sense-plan-act? Is the rebuttal that "sense produces data"? How would launchpad work under this model in a way that didn't suck?
- Is there a hot take to make about config? "Customer X is targeting Y" is a reality of modern apps, Bazel-esque config is
- Should this be under `#partition-identity` or something?