# Triggers
Purpose: to enable simple but powerful declarative specification of what data should be built.

## Correctness Strategy
- Wants + TTLs
- ...?

## Wants
Wants cause graphs to try to build the wanted partitions until either (a) the partitions are live or (b) the TTL runs out.
Wants can trigger a callback on TTL expiry, enabling SLA-like behavior. Wants are recorded in the
[build event log (BEL)](./build-event-log.md), so they can be queried and viewed in the web app. Each want links to the
build requests it triggered, which helps answer the "why doesn't this partition exist yet?" question.
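
The want lifecycle described above can be sketched as a single evaluation step that a scheduler would call repeatedly. This is only an illustration under stated assumptions: the `Want` record, the `step_want` function, the `request_build` callback, and the partition names are all hypothetical and are not taken from the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Want:
    """Hypothetical want record: wanted partitions plus an absolute TTL deadline."""
    partitions: Set[str]
    expires_at: float  # deadline on whatever clock the caller uses
    on_expiry: Optional[Callable[["Want"], None]] = None  # e.g. raise an SLA alert


def step_want(want, live, now, request_build):
    """Evaluate a want once and return 'satisfied', 'expired', or 'pending'."""
    missing = want.partitions - live
    if not missing:
        return "satisfied"  # (a) all wanted partitions are live
    if now >= want.expires_at:
        if want.on_expiry is not None:
            want.on_expiry(want)  # (b) TTL ran out: fire the callback
        return "expired"
    request_build(missing)  # otherwise retry, asking only for what is missing
    return "pending"


# Usage with made-up partition names and an in-memory list standing in for the graph.
requested = []
want = Want({"sales/2024-01-01", "sales/2024-01-02"}, expires_at=100.0)
status = step_want(want, live={"sales/2024-01-01"}, now=10.0, request_build=requested.append)
```

Calling `step_want` on a timer until it stops returning `"pending"` gives the "retry until live or TTL expiry" behavior. Whether the real system polls or reacts to events is not specified here.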

### Unwants
You can also unwant partitions, which overrides all wants of those partitions made prior to the unwant timestamp. This is
primarily to enable the "data source is now disabled" style of feature that is practically necessary in many data platforms.
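
One way to picture the override rule is as a filter over recorded events. The sketch below is an assumption-laden illustration: `WantEvent`, `UnwantEvent`, and `active_wants` are hypothetical names, and treating a want recorded at exactly the unwant timestamp as overridden is an arbitrary tie-breaking choice, not something the design specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WantEvent:
    partition: str
    timestamp: float   # when the want was recorded
    expires_at: float  # TTL deadline


@dataclass(frozen=True)
class UnwantEvent:
    partition: str
    timestamp: float   # when the unwant was recorded


def active_wants(wants, unwants, now):
    """Return wants that are unexpired and not overridden by a later unwant."""
    latest_unwant = {}
    for u in unwants:
        previous = latest_unwant.get(u.partition, float("-inf"))
        latest_unwant[u.partition] = max(previous, u.timestamp)
    return [
        w for w in wants
        if now < w.expires_at
        # Assumption: only wants recorded strictly after the unwant survive.
        and w.timestamp > latest_unwant.get(w.partition, float("-inf"))
    ]
```

Because the comparison is per partition and per timestamp, a want recorded after the unwant is unaffected, and wants for other partitions are never touched.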

### Virtual Partitions & External Data
Essentially all data teams consume some external data source, and late-arriving data is the rule more than the
exception. Virtual partitions are a way to model external data that is not produced by a graph. For all intents and
purposes, these are standard partitions; the only difference is that the job that "produces" them doesn't actually
do any ETL. It just assesses external data sufficiency and emits a "partition live" event when the data is ready to be
consumed.
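
A job for a virtual partition might look roughly like the following. This is a minimal sketch, assuming a caller-supplied sufficiency check and an event sink; the function name, the `is_sufficient` and `emit` parameters, and the event dictionary shape are hypothetical and do not reflect the real BEL event schema.

```python
def virtual_partition_job(partition, is_sufficient, emit):
    """Perform no ETL: check the external data and report liveness if it is ready."""
    if is_sufficient(partition):
        # Hypothetical event shape; the real "partition live" event may differ.
        emit({"type": "partition_live", "partition": partition})
        return True
    return False  # not ready yet; a want can cause this job to be retried later


# Usage: an in-memory list stands in for the build event log.
events = []
ready = virtual_partition_job("vendor_feed/2024-01-01", lambda p: True, events.append)
```

Combined with a want, late-arriving data is handled by retrying this job until the check passes or the want's TTL expires.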

## Triggers

## Taints
- Mechanism for invalidating existing partitions (e.g. we know bad data went into this, need to stop consumers from
  using it)

---

- Purpose
  - Every useful data application has triggering to ensure data is built on schedule
- Philosophy
  - Opinionated strategy plus escape hatches
- Taints

- Two strategies
  - Basic: cron-triggered scripts that return partitions
    - Bazel: target with `cron`, `executable` fields, optional `partition_patterns` field to constrain
  - Declarative: want-based; wants cause build requests to be continually retried until the wanted
    partitions are live, or a `want_failed` script is run if the want times out (e.g. SLA breach)
    - +want and -want
      - +want declares want for 1+ partitions with a timeout, recorded to the [build event log](./build-event-log.md)
      - -want invalidates all past wants of specified partitions (but not future; doesn't impact non-specified
        partitions)
        - Their primary purpose is to prevent an SLA breach alarm when a datasource is disabled, etc.
    - Need graph preconditions? And concept of external/virtual partitions or readiness probes?
      - Virtual partitions: allow graphs to say "precondition failed"; can be created in BEL, created via want or
        cron trigger? (e.g. want strategy continually tries to resolve the external data, creating a virtual
        partition once it can find it; cron just runs the script when it's triggered)
      - Readiness probes don't fit the paradigm, feel too imperative.
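
For the basic cron strategy, the notes above do not say exactly what `partition_patterns` constrains. The sketch below assumes it filters the partitions returned by the cron-triggered script, and that the patterns are shell-style globs; both are assumptions made for illustration, and `constrain_partitions` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def constrain_partitions(returned, partition_patterns=None):
    """Keep partitions matching at least one pattern; keep all if no patterns are given."""
    if not partition_patterns:
        return list(returned)
    return [
        p for p in returned
        # Assumption: patterns are case-sensitive shell-style globs.
        if any(fnmatchcase(p, pattern) for pattern in partition_patterns)
    ]


# Usage: partitions a cron-triggered script might return (made-up names).
script_output = ["sales/2024-01-01", "users/2024-01-01"]
to_build = constrain_partitions(script_output, ["sales/*"])
```

An alternative reading is that a non-matching partition should fail the trigger instead of being silently dropped. That choice is left open here.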