add thoughts on wants for data retention
parent 17d5987517
commit 9c6cb11713
1 changed files with 12 additions and 0 deletions
12
docs/ideas/2025-11-30_wants-for-data-retention.md
Normal file
@@ -0,0 +1,12 @@
- You can't keep building partitions forever
- Storage runs out for the data itself, and the cost of keeping old data keeps growing while its value declines
- Memory runs out for indexes over partitions, e.g. in databuild itself
- We could introduce vacuuming?
- We could introduce partition want expiry callbacks?
- Jobs and job runs as partition edge transitions?
- Do we even want to delete partition entries? We can just wait until this is a problem.
- Partition want expiry events also enable reactions above the level of a single event, e.g. vacuuming for all expiry events between times T1 and T2.
- RISK! If partition-to-partition data deps (via jobs that change, etc.) are not canonical/stable:
  - We cannot assert that upstream partitions are needed for any longer than it takes the initial job to succeed (because the deps may have changed since then)
  - To make this valuable, we need to be able to assume that the reads/data deps of a single parameterized job run are durable, because then we can propagate want times upstream and get a durable answer to "why does this partition need to exist"
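
A rough sketch of what propagating want times could look like if the data deps recorded for each job run really are durable. Everything here is hypothetical and for illustration only: `Partition`, `upstream`, `propagate_want_expiry`, and the partition refs are made-up names, not databuild's actual data model or API. Each upstream partition inherits the latest expiry among the wants downstream of it, together with the want that justifies keeping it.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Partition:
    """Hypothetical partition record; not databuild's actual data model."""
    ref: str
    # Upstream partition refs read by the job run that built this partition.
    # Assumed durable: recorded once at build time and never rewritten.
    upstream: list[str] = field(default_factory=list)


def propagate_want_expiry(partitions, wants):
    """Return {ref: (expiry, reason)} for every partition some want still needs.

    `wants` maps a directly wanted partition ref to its want expiry time.
    An upstream partition inherits the latest expiry among the wants
    downstream of it, plus the ref of the want that justifies keeping it.
    """
    needed = {}
    # Worklist of (ref, expiry, originating want).
    stack = [(ref, expiry, ref) for ref, expiry in wants.items()]
    while stack:
        ref, expiry, reason = stack.pop()
        current = needed.get(ref)
        if current is not None and current[0] >= expiry:
            continue  # already kept at least this long
        needed[ref] = (expiry, reason)
        for up in partitions[ref].upstream:
            stack.append((up, expiry, reason))
    return needed


# Made-up example graph: raw -> clean -> report, plus an unwanted orphan.
partitions = {
    p.ref: p
    for p in [
        Partition("raw/2025-11-01"),
        Partition("clean/2025-11-01", upstream=["raw/2025-11-01"]),
        Partition("report/2025-11", upstream=["clean/2025-11-01"]),
        Partition("orphan/2025-10"),
    ]
}
wants = {
    "report/2025-11": datetime(2026, 1, 1),
    "clean/2025-11-01": datetime(2025, 12, 15),
}
needed = propagate_want_expiry(partitions, wants)
```

In this example `raw/2025-11-01` ends up kept until the report's want expires, with `report/2025-11` recorded as the reason, while `orphan/2025-10` does not appear in `needed` at all. If the `upstream` lists could change after the build, neither answer would be trustworthy, which is the risk described above.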
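
The idea of vacuuming for all expiry events between T1 and T2 could be sketched like this. The `WantExpired` event shape, the `vacuum_candidates` helper, and the notion of a `live_wants` set are assumptions made for illustration; nothing like them is confirmed to exist in databuild.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WantExpired:
    """Hypothetical event: the want for `partition_ref` expired at `at`."""
    partition_ref: str
    at: datetime


def vacuum_candidates(events, live_wants, t1, t2):
    """Partitions whose want expired in [t1, t2) and that no live want covers."""
    expired = {e.partition_ref for e in events if t1 <= e.at < t2}
    # A partition that is still wanted for another reason must survive.
    return sorted(expired - set(live_wants))


events = [
    WantExpired("a", datetime(2025, 11, 5)),
    WantExpired("b", datetime(2025, 11, 20)),
    WantExpired("c", datetime(2025, 12, 2)),
]
live_wants = {"b"}  # "b" is still wanted by something else
t1, t2 = datetime(2025, 11, 1), datetime(2025, 12, 1)
candidates = vacuum_candidates(events, live_wants, t1, t2)
```

Here only `a` is a candidate: `b` expired inside the window but is still covered by a live want, and `c` expired after T2. Working from a window of events rather than reacting to each one means the vacuum can run as a periodic batch.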