# Triggers

Purpose: to enable simple but powerful declarative specification of what data should be built.

## Correctness Strategy

- Wants + TTLs
- ...?

## Wants

Wants cause graphs to try to build the wanted partitions until a) the partitions are live or b) the TTL runs out. Wants can trigger a callback on TTL expiry, enabling SLA-like behavior. Wants are recorded in the [BEL](./build-event-log.md), so they can be queried and viewed in the web app alongside links to the build requests each want triggered. This makes it possible to answer the question "why doesn't this partition exist yet?"

### Unwants

You can also unwant partitions, which overrides all wants of those partitions created before the unwant timestamp. This primarily exists to support the "data source is now disabled" feature that is a practical necessity in many data platforms.

### Virtual Partitions & External Data

Essentially all data teams consume some external data source, and late-arriving data is the rule rather than the exception. Virtual partitions model external data that is not produced by a graph. For all intents and purposes, they are standard partitions; the only difference is that the job that "produces" them doesn't actually do any ETL. It just assesses whether the external data is sufficient and emits a "partition live" event when it's ready to be consumed.

## Triggers

## Taints

- Mechanism for invalidating existing partitions (e.g. we know bad data went into this, and need to stop consumers from using it)

---

- Purpose
    - Every useful data application has triggering to ensure data is built on schedule
- Philosophy
    - Opinionated strategy plus escape hatches
- Taints
- Two strategies
    - Basic: cron-triggered scripts that return partitions
        - Bazel: a target with `cron` and `executable` fields, plus an optional `partition_patterns` field to constrain them
    - Declarative: want-based; wants cause build requests to be retried continually until the wanted partitions are live, or run a `want_failed` script if they time out (e.g. SLA breach)
        - `+want` and `-want`
            - `+want` declares a want for 1+ partitions with a timeout, recorded to the [build event log](./build-event-log.md)
            - `-want` invalidates all past wants of the specified partitions (but not future ones, and it doesn't affect unspecified partitions)
                - Its primary purpose is to prevent an SLA breach alarm when a data source is disabled, etc.
- Need graph preconditions? And a concept of external/virtual partitions or readiness probes?
    - Virtual partitions: allow graphs to say "precondition failed"; can be created in the BEL, or via a want or cron trigger? (e.g. the want strategy continually tries to resolve the external data, creating a virtual partition once it finds it; cron just runs the script when it's triggered)
    - Readiness probes don't fit the paradigm; they feel too imperative.
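The want/unwant rules above (a want stays active until its partitions are live or its TTL runs out; an unwant cancels only earlier wants, and only for the partitions it names) can be sketched as a small resolution function. This is a minimal illustration assuming an in-memory list of events; `Want`, `Unwant`, `is_unwanted`, `evaluate`, and the `on_expired` callback are hypothetical names, not the actual BEL schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Want:
    # Hypothetical record: build `partitions` before created_at + ttl (seconds).
    want_id: str
    partitions: frozenset[str]
    created_at: float
    ttl: float


@dataclass(frozen=True)
class Unwant:
    # Hypothetical record: cancels wants of `partitions` created before `created_at`.
    partitions: frozenset[str]
    created_at: float


def is_unwanted(partition: str, want: Want, unwants: Iterable[Unwant]) -> bool:
    # An unwant overrides only earlier wants, and only for the partitions it names;
    # later wants and unnamed partitions are unaffected.
    return any(
        partition in u.partitions and want.created_at < u.created_at
        for u in unwants
    )


def evaluate(
    wants: Iterable[Want],
    unwants: list[Unwant],
    live: set[str],
    now: float,
    on_expired: Callable[[Want, set[str]], None],
) -> set[str]:
    """Return the partitions that still need a build request.

    Calls `on_expired` (standing in for a `want_failed` script) for each want
    whose TTL has run out while some of its partitions are neither live nor
    unwanted. This sketch fires on every evaluation after expiry; a real system
    would record the expiry (e.g. in the BEL) so the callback fires once.
    """
    to_build: set[str] = set()
    for want in wants:
        remaining = {
            p for p in want.partitions
            if p not in live and not is_unwanted(p, want, unwants)
        }
        if not remaining:
            continue  # satisfied: every partition is live or unwanted
        if now >= want.created_at + want.ttl:
            on_expired(want, remaining)  # TTL ran out: SLA-style callback
        else:
            to_build |= remaining  # keep retrying until live or expired
    return to_build


if __name__ == "__main__":
    wants = [
        Want("w1", frozenset({"orders/2024-01-01", "orders/2024-01-02"}), created_at=0, ttl=3600),
        Want("w2", frozenset({"clicks/2024-01-02"}), created_at=0, ttl=600),
    ]
    # The clicks source was disabled at t=100, so w2 should never alarm.
    unwants = [Unwant(frozenset({"clicks/2024-01-02"}), created_at=100)]
    live = {"orders/2024-01-01"}
    expired: list[str] = []
    pending = evaluate(wants, unwants, live, now=1000,
                       on_expired=lambda w, rest: expired.append(w.want_id))
    print(sorted(pending), expired)  # ['orders/2024-01-02'] []
```

Resolving per partition rather than per want is the design choice that lets a `-want` on one partition silence the alarm for that partition without cancelling the rest of a multi-partition want.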