diff --git a/docs/ideas/2025-11-30_wants-for-data-retention.md b/docs/ideas/2025-11-30_wants-for-data-retention.md
new file mode 100644
index 0000000..1ad5058
--- /dev/null
+++ b/docs/ideas/2025-11-30_wants-for-data-retention.md
@@ -0,0 +1,12 @@
+
+- You can't keep building partitions forever
+  - Space runs out for literal data, costs increase as data value decreases for old data
+  - Memory runs out for indexes over partitions, e.g. in databuild itself
+- We could introduce vacuuming?
+- We could introduce partition want expiry callbacks?
+- Jobs and job runs as partition edge transitions?
+- Do we even want to delete partition entries? Can just wait till this is a problem.
+- Partition want expiry events also enable non-event level reaction, e.g. vacuum for all events between time T1 and T2.
+- RISK! If partition to partition data deps (via jobs that change, etc) are not canonical/stable:
+  - We cannot assert the necessity of upstream partitions for anything longer than the initial job time to success (because it may have changed)
+  - To make this valuable, we need to be able to assume that the reads/data deps from a singular parameterized job run are durable, because then we can propagate want times and have a durable "why does this partition need to exist"
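
The last bullet in the note (propagating want times over durable data deps) can be sketched roughly as follows. This is a minimal illustration, not databuild's API: `WantGraph`, `reads`, `direct_expiry`, `effective_expiry`, and `vacuumable` are all hypothetical names, and the sketch assumes the reads recorded for a job run never change after the run succeeds, which is exactly the durability assumption the note flags as a risk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WantGraph:
    """Hypothetical model of partitions, their recorded reads, and want expiry times."""

    # partition -> upstream partitions read by the job run that built it
    # (assumed durable: never rewritten after the run succeeds)
    reads: Dict[str, List[str]] = field(default_factory=dict)
    # partition -> time until which something wants it directly
    direct_expiry: Dict[str, float] = field(default_factory=dict)

    def effective_expiry(self) -> Dict[str, float]:
        """Propagate expiry upstream: a partition stays wanted as long as any reader is."""
        expiry = dict(self.direct_expiry)
        stack = list(expiry)
        while stack:
            part = stack.pop()
            for upstream in self.reads.get(part, []):
                # Only raise an upstream expiry, and revisit it when it changes.
                if expiry.get(upstream, float("-inf")) < expiry[part]:
                    expiry[upstream] = expiry[part]
                    stack.append(upstream)
        return expiry

    def vacuumable(self, now: float) -> Set[str]:
        """Partitions whose propagated want has expired (or that nothing ever wanted)."""
        expiry = self.effective_expiry()
        known = set(self.reads) | set(expiry)
        for upstreams in self.reads.values():
            known.update(upstreams)
        return {p for p in known if expiry.get(p, float("-inf")) <= now}
```

With a graph like this, the stored `reads` edges double as the durable answer to "why does this partition need to exist": a partition survives vacuuming only while some chain of recorded reads leads from an unexpired want down to it. If the deps were allowed to drift between runs, the propagated expiries would stop being trustworthy.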