From 9c6cb11713742d1e316fabcf440d705d9ca1972e Mon Sep 17 00:00:00 2001
From: Stuart Axelbrooke <stuart@axelbrooke.com>
Date: Sun, 30 Nov 2025 16:05:18 +0800
Subject: [PATCH] add thoughts on wants for data retention

---
 docs/ideas/2025-11-30_wants-for-data-retention.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 docs/ideas/2025-11-30_wants-for-data-retention.md

diff --git a/docs/ideas/2025-11-30_wants-for-data-retention.md b/docs/ideas/2025-11-30_wants-for-data-retention.md
new file mode 100644
index 0000000..1ad5058
--- /dev/null
+++ b/docs/ideas/2025-11-30_wants-for-data-retention.md
@@ -0,0 +1,12 @@
+
+- You can't keep building partitions forever
+  - Storage runs out for the data itself, and costs keep rising while the value of old data falls
+  - Memory runs out for indexes over partitions, e.g. in databuild itself
+- We could introduce vacuuming?
+- We could introduce partition want expiry callbacks?
+- Jobs and job runs as partition edge transitions?
+- Do we even want to delete partition entries? We could just wait until this becomes a problem.
+- Partition want expiry events also enable reactions above the level of a single event, e.g. one vacuum covering all events between times T1 and T2.
+- RISK! If partition-to-partition data deps (via jobs that change, etc.) are not canonical/stable:
+  - We cannot assert that upstream partitions are necessary for any longer than the time the initial job takes to succeed (because the deps may have changed since then)
+  - To make this valuable, we need to be able to assume that the reads/data deps from a single parameterized job run are durable, because then we can propagate want times and have a durable answer to "why does this partition need to exist"
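A rough sketch of the want-time propagation idea from the last bullet, under the assumption that the reads recorded for each job run are durable. Every name here (`propagate_needed_until`, `vacuum_candidates`, the `wants` and `reads` dicts, the partition names) is hypothetical and is not databuild's API; times are plain integers for illustration.

```python
def propagate_needed_until(wants, reads):
    """Push want expiry times upstream along recorded reads.

    wants: {partition: expiry_time} for partitions that are wanted directly.
    reads: {partition: set of upstream partitions its job run read}
           (assumed durable, per the note above).
    Returns {partition: (needed_until, reason)}.
    """
    needed = {p: (t, f"wanted directly until {t}") for p, t in wants.items()}
    stack = list(wants)
    while stack:
        p = stack.pop()
        until, _ = needed[p]
        for up in reads.get(p, ()):
            # Only revisit an upstream partition when its deadline gets later,
            # so the loop terminates even if the read graph has cycles.
            if up not in needed or needed[up][0] < until:
                needed[up] = (until, f"read by {p}, needed until {until}")
                stack.append(up)
    return needed


def vacuum_candidates(partitions, needed, now):
    """Partitions that nothing still needs at time `now`."""
    return sorted(p for p in partitions if p not in needed or needed[p][0] < now)


# Hypothetical example: two report partitions with different want expiries.
wants = {"report/2025-11-30": 100, "report/2025-11-01": 10}
reads = {
    "report/2025-11-30": {"clean/2025-11-30"},
    "clean/2025-11-30": {"raw/2025-11-30"},
    "report/2025-11-01": {"clean/2025-11-01"},
    "clean/2025-11-01": {"raw/2025-11-01"},
}
needed = propagate_needed_until(wants, reads)
all_partitions = set(reads) | {u for ups in reads.values() for u in ups}
expired = vacuum_candidates(all_partitions, needed, now=50)
```

The `reason` string stored next to each deadline is one way to keep an answer to "why does this partition need to exist". If the recorded reads are not stable, both the deadline and the reason can go stale, which is the risk described above.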