Update designs
This commit is contained in:
parent 04c5924746
commit 033ba12f43
10 changed files with 546 additions and 2 deletions

DESIGN.md (18 lines changed)

@@ -1,7 +1,7 @@

# DataBuild Design

DataBuild is a trivially-deployable, partition-oriented, declarative build system. Where data orchestration flows are normally imperative and implicit (do this, then do that, etc.), DataBuild uses stated data dependencies to make this process declarative and explicit. DataBuild scales the declarative nature of tools like DBT to meet the needs of modern, broadly integrated data and ML organizations, who consume data from many sources and whose data arrives on a highly varying basis. DataBuild enables confident, bounded completeness in a world where input data is effectively never complete at any given time.

## Philosophy

@@ -17,7 +17,8 @@ Graphs and jobs are defined in [bazel](https://bazel.build), allowing graphs (an

- **Jobs** - Their `exec` entrypoint builds partitions from partitions, and their `config` entrypoint specifies what partitions are required to produce the requested partition(s), along with the specific config to run `exec` with to build said partitions.
- **Graphs** - Compose jobs together to achieve multi-job orchestration, using a `lookup` mechanism to resolve a requested partition to the job that can build it. Together with their constituent jobs, graphs can fully plan the build of any set of partitions. Most interactions with a DataBuild app happen through a graph.
- **Build Event Log** - Encodes the state of the system, recording build requests, job activity, partition production, etc. to enable running databuild as a deployed application.
- **Bazel Targets** - Bazel is a fast, extensible, and hermetic build system. DataBuild uses bazel targets to describe graphs and jobs, making graphs themselves deployable applications. Implementing a DataBuild app is the process of integrating your data build jobs in `databuild_job` bazel targets, and connecting them with a `databuild_graph` target.
- [**Graph Specification Strategies**](design/graph-specification.md) (coming soon) - Application libraries in Python/Rust/Scala that use language features to enable ergonomic and succinct specification of jobs and graphs.

### Partition / Job Assumptions and Best Practices

@@ -54,4 +55,17 @@ The BEL encodes all relevant build actions that occur, enabling concurrent build

The BEL is similar to [event-sourced](https://martinfowler.com/eaaDev/EventSourcing.html) systems, as all application state is rendered from aggregations over the BEL. This enables the BEL to stay simple while also powering concurrent builds, the data catalog, and the DataBuild service.

### Triggers and Wants (Coming Soon)
["Wants"](./design/triggers.md) are the main mechanism for continually building partitions over time. In real-world scenarios, it is standard for data to arrive late, or not at all. Wants cause the databuild graph to continually attempt to build the wanted partitions until a) the partitions are live or b) the want expires, at which point another script can be run. Wants are the mechanism that implements SLA checking.

You can also use cron-based triggers, which return partition refs that they want built.

# Key Insights

- Orchestration logic changes all the time - better not to write it at all.
- Orchestration decisions and application logic are innately coupled.

## Assumptions

- Job -> partition relationships are canonical; job runs are idempotent.

design/build-event-log.md (new file, 33 lines)

# Build Event Log (BEL)
Purpose: Store build events and define views summarizing databuild application state, like the partition catalog, build status summary, job run statistics, etc.

## Architecture
- Uses the [event sourcing](https://martinfowler.com/eaaDev/EventSourcing.html) / [CQRS](https://www.wikipedia.org/wiki/cqrs) philosophy.
- The BEL uses only two types of tables:
  - The root event table, with event ID, timestamp, message, event type, and ID fields for related event types.
  - Type-specific event tables (e.g. task event, partition event, build request event, etc.).
- This makes it easy to support multiple backends (SQLite, Postgres, and Delta tables are supported initially).
- Exposes an access layer that mediates writes, and which exposes entity-specific repositories for reads.

## Correctness Strategy
- The access layer will evaluate events requested to be written, returning an error if the event is not a correct next state based on the involved component's governing state diagram.
- Events are versioned, with each version's schemas stored in [`databuild.proto`](../databuild/databuild.proto).

## Write Interface
See [trait definition](../databuild/event_log/mod.rs).
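
A rough sketch of the shape such a write interface might take - the type and trait names here are illustrative placeholders, not the actual definitions in `event_log/mod.rs`:

```rust
// Illustrative sketch only; the real trait lives in databuild/event_log/mod.rs.
pub struct BuildEvent {
    pub timestamp_ms: u64,
    pub event_type: String, // e.g. "task", "partition", "build_request"
    pub message: String,
}

#[derive(Debug)]
pub enum BelError {
    /// Rejected because the event is not a valid next state per the
    /// involved component's governing state diagram.
    InvalidTransition(String),
    Storage(String),
}

pub trait EventLogWriter {
    /// Validate the event, then append it to the root event table and the
    /// matching type-specific table, returning the new event's ID.
    fn append(&mut self, event: BuildEvent) -> Result<u64, BelError>;
}
```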

## Read Repositories
There are repositories for the following entities:
- Builds
- Jobs
- Partitions
- Tasks

Generally, the following verbs are available for each:
- Show
- List
- Cancel
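
A sketch of how a shared interface over those entities and verbs might look (trait and method shapes are assumptions, not the actual API):

```rust
// Illustrative sketch; Build/Job/Partition/Task repositories would each
// implement this with their own entity and ID types.
pub trait Repository {
    type Entity;
    type Id;

    /// Show a single entity by ID.
    fn show(&self, id: &Self::Id) -> Option<Self::Entity>;
    /// List entities, newest first.
    fn list(&self, limit: usize, offset: usize) -> Vec<Self::Entity>;
    /// Cancel the entity, where cancellation is meaningful (e.g. builds, tasks).
    fn cancel(&mut self, id: &Self::Id) -> bool;
}
```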

design/core-build.md (new file, 137 lines)

# Core Build
Purpose: Centralize the build logic and semantics in a performant, correct core.

## Architecture
- Jobs depend on input partitions and produce output partitions.
- Graphs compose jobs to fully plan and execute builds of requested partitions.
- Both jobs and graphs emit events via the [build event log](./build-event-log.md) to update build state.
- A common interface is implemented to execute job and graph build actions, which different clients rely on (e.g. CLI, service, etc.)
- Jobs and graphs use wrappers to implement configuration and [observability](./observability.md)
- Graph-based composition is the basis for databuild application [deployment](./deploy-strategies.md)

## Jobs
Jobs are the atomic unit of work in databuild.
- The job wrapper fulfills configuration, observability, and record keeping

### `job.config`
Purpose: Enable planning of the execution graph. Executed in-process when possible for speed. For interface details, see [`PartitionRef`](./glossary.md#partitionref) and [`JobConfig`](./glossary.md#jobconfig) in [`databuild.proto`](../databuild/databuild.proto).

```rust
trait DataBuildJob {
    fn config(outputs: Vec<PartitionRef>) -> JobConfig;
}
```

#### `job.config` State Diagram

```mermaid
flowchart TD
    begin((begin)) --> validate_args
    emit_job_config_fail --> fail((fail))
    validate_args -- fail --> emit_arg_validate_fail --> emit_job_config_fail
    validate_args -- success --> emit_arg_validate_success --> run_config
    run_config -- fail --> emit_config_fail --> emit_job_config_fail
    run_config -- success --> emit_config_success ---> success((success))
```

### `job.exec`
Purpose: Execute the job in the exec wrapper.

```rust
trait DataBuildJob {
    fn exec(config: JobConfig) -> PartitionManifest;
}
```

#### `job.exec` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> validate_config
    emit_job_exec_fail --> fail((fail))
    validate_config -- fail --> emit_config_validate_fail --> emit_job_exec_fail
    validate_config -- success --> emit_config_validate_success --> launch_task
    launch_task -- fail --> emit_task_launch_fail --> emit_job_exec_fail
    launch_task -- success --> emit_task_launch_success --> await_task
    await_task -- waited N seconds --> emit_heartbeat --> await_task
    await_task -- non-zero exit code --> emit_task_failed --> emit_job_exec_fail
    await_task -- zero exit code --> emit_task_success --> calculate_metadata
    calculate_metadata -- fail --> emit_metadata_calculation_fail --> emit_job_exec_fail
    calculate_metadata -- success --> emit_metadata ---> success((success))
```

## Graphs
Graphs are the unit of composition. To `analyze` (plan) task graphs (see [`JobGraph`](./glossary.md#jobgraph)), they iteratively walk back from the requested output partitions, invoking `job.config` until no unresolved partitions remain. To `build` partitions, the graph runs `analyze` then iteratively executes the resulting task graph.
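
A minimal sketch of that walk-back loop, with placeholder types standing in for the graph's `lookup` mechanism and the proto-defined structs:

```rust
use std::collections::HashSet;

// Illustrative stubs; the real types come from databuild.proto.
struct JobConfig {
    inputs: Vec<String>,  // PartitionRefs this job run depends on
    outputs: Vec<String>, // PartitionRefs this job run produces
}
struct Job;
impl Job {
    fn config(&self, outputs: Vec<String>) -> JobConfig {
        JobConfig { inputs: vec![], outputs }
    }
}

/// Placeholder for the graph's `lookup` mechanism (partition -> producing job).
fn lookup(_partition: &str) -> Job {
    Job
}

fn analyze(mut missing: Vec<String>) -> Vec<JobConfig> {
    let mut planned: HashSet<String> = HashSet::new();
    let mut graph = Vec::new();
    while let Some(p) = missing.pop() {
        if !planned.insert(p.clone()) {
            continue; // already planned; the real loop also performs cycle detection
        }
        let cfg = lookup(&p).config(vec![p]);
        missing.extend(cfg.inputs.iter().cloned()); // walk back through inputs
        graph.push(cfg);
    }
    graph // dependency edges are implied by matching input/output PartitionRefs
}
```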

### `graph.analyze`
Purpose: produce a complete task graph to materialize a requested set of partitions.

```rust
trait DataBuildGraph {
    fn analyze(outputs: Vec<PartitionRef>) -> JobGraph;
}
```

#### `graph.analyze` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> initialize_missing_partitions --> dispatch_missing_partitions
    emit_graph_analyze_fail --> fail((fail))
    dispatch_missing_partitions -- fail --> emit_partition_dispatch_fail --> emit_graph_analyze_fail
    dispatch_missing_partitions -- success --> cycle_detected?
    cycle_detected? -- yes --> emit_cycle_detected --> emit_graph_analyze_fail
    cycle_detected? -- no --> remaining_missing_partitions?
    remaining_missing_partitions? -- yes --> dispatch_missing_partitions
    remaining_missing_partitions? -- no --> emit_job_graph --> success((success))
```

### `graph.build`
Purpose: analyze, then execute the resulting task graph.

```rust
trait DataBuildGraph {
    fn build(outputs: Vec<PartitionRef>);
}
```

#### `graph.build` State Diagram
```mermaid
flowchart TD
    begin((begin)) --> graph_analyze
    emit_graph_build_fail --> fail((fail))
    graph_analyze -- fail --> emit_graph_build_fail
    graph_analyze -- success --> initialize_ready_jobs --> remaining_ready_jobs?
    remaining_ready_jobs? -- yes --> emit_remaining_jobs --> schedule_jobs
    remaining_ready_jobs? -- none schedulable --> emit_jobs_unschedulable --> emit_graph_build_fail
    schedule_jobs -- fail --> emit_job_schedule_fail --> emit_graph_build_fail
    schedule_jobs -- success --> emit_job_schedule_success --> await_jobs
    await_jobs -- job_failure --> emit_job_failure --> emit_job_cancels --> cancel_running_jobs
    cancel_running_jobs --> emit_graph_build_fail
    await_jobs -- N seconds since heartbeat --> emit_heartbeat --> await_jobs
    await_jobs -- job_success --> remaining_ready_jobs?
    remaining_ready_jobs? -- no ---------> emit_graph_build_success --> success((success))
```

## Correctness Strategy
- Core component interfaces are described in [`databuild.proto`](../databuild/databuild.proto), a protobuf interface shared by all core components and all [GSLs](./graph-specification.md).
- [GSLs](./graph-specification.md) implement ergonomic graph, job, and partition helpers that make coupling explicit
- Graphs automatically detect and raise on non-unique job -> partition mappings
- Graph and job processes are fully described by state diagrams, whose state transitions are logged to the [build event log](./build-event-log.md).

## Partition Delegation
- Sometimes a partition already exists, or another build request is already planning to produce it.
- A later build request will delegate to an already existing build request for said partition.
- The later build request will write an event to the [build event log](./build-event-log.md) referencing the ID of the delegate, allowing traceability and visualization (see the sketch below).
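
A sketch of that delegation decision (the status enum and event wording are illustrative only):

```rust
// Illustrative sketch; real event types live in databuild.proto and the BEL.
enum PartitionStatus {
    Live,
    BuildingBy { build_request_id: u64 },
    Missing,
}

/// Decide whether a build request should build a partition itself or
/// delegate to an existing build request already producing it.
fn plan_partition(status: PartitionStatus, my_build_id: u64) -> String {
    match status {
        PartitionStatus::Live => format!("build {my_build_id}: partition already live, nothing to do"),
        PartitionStatus::BuildingBy { build_request_id } => {
            // Emit an event referencing the delegate's ID so the relationship
            // is traceable in the BEL and visualizable in the web app.
            format!("build {my_build_id}: delegating to build {build_request_id}")
        }
        PartitionStatus::Missing => format!("build {my_build_id}: scheduling job to build partition"),
    }
}
```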

## Heartbeats / Health Checks
- Which strategy do we use?
- If we are launching tasks to a place we can't health check, how could they heartbeat?

design/deploy-strategies.md (new file, 11 lines)

# Deploy Strategies

- Purpose
  - Trivial deployment and updates for databuild applications are key, allowing for shipping quickly via continuous delivery
  - Build continuity across deploys
- Strategies
  - Binary deployment
  - Docker deployment
  - K8s deployment
  - Workloads on Cloud Run, k8s job submission, etc.?

design/glossary.md (new file, 34 lines)

# `Job`
Atomic unit of work, producing and consuming specific partitions. See [jobs](./core-build.md#jobs).

# `Graph`
Composes [jobs](#job) to build partitions. See [graphs](./core-build.md#graphs).

# `Partition`
Partitions are atomic units of data, produced and depended on by jobs. A job can produce multiple partitions, but multiple jobs cannot produce the same partition - i.e. job -> partition relationships must be unique/canonical.

# `PartitionRef`
PartitionRefs are strings that uniquely identify partitions. They can contain anything, but generally they are S3 URIs, like `s3://companybkt/datasets/foo/date=2025-01-01`, or custom formats like `dal://prod/clicks/region=4/date=2025-01-01/`. PartitionRefs are used as dependency signals during [task graph analysis](./core-build.md#graphanalyze). To enable explicit coupling and ergonomics, there are generally helper classes for creating, parsing, and accessing fields of PartitionRefs in [GSLs](#graph-specification-language-gsl).
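
For illustration, a hypothetical Rust helper using the `regex` crate might extract the `date` field from such a ref like this:

```rust
use regex::Regex;

/// Hypothetical helper; real GSLs generate these from PartitionPatterns.
fn parse_date_partition(partition_ref: &str) -> Option<String> {
    // Matches refs like s3://companybkt/datasets/foo/date=2025-01-01
    let re = Regex::new(r"date=(\d{4}-\d{2}-\d{2})").ok()?;
    re.captures(partition_ref).map(|caps| caps[1].to_string())
}

fn main() {
    let date = parse_date_partition("s3://companybkt/datasets/foo/date=2025-01-01");
    assert_eq!(date.as_deref(), Some("2025-01-01"));
}
```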

# `PartitionPattern`
Patterns that group partitions (e.g. a dataset) and allow for validation (e.g. does this job actually produce the expected output partition?).

# `JobConfig`
The complete configuration of a job needed to produce the desired partitions, as calculated by [`job.config`](./core-build.md#jobconfig).

# `JobGraph`
A complete graph of job configs, with [`PartitionRef`](#partitionref) dependency edges, which when executed will produce the requested partitions.

# Graph Specification Language (GSL)
Language-specific libraries that make implementing databuild graphs and jobs more succinct and ergonomic. See [graph specification](./graph-specification.md).

design/graph-specification.md (new file, 116 lines)

# App Specification

AKA the different ways databuild applications can be described.

## Correctness Strategy
- Examples implemented that use each graph specification strategy, and are tested in CI/CD.
- Graph specification strategies provide

## Bazel

- Purpose: compilation/build target that fulfills the promise of the project (like bytecode for JVM langs)
- Job binaries (config and exec)
- Graph lookup binary (lookup)
- Job target (config and exec)
- Graph target (build and analyze)
- See [core build](./core-build.md) for details

## Python

- Wrapper functions enable the graph registry
- Partition object increases ergonomics and enables explicit data coupling

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

from databuild import (
    DataBuildGraph, DataBuildJob, Partition, JobConfig, PyJobConfig, BazelJobConfig, PartitionManifest, Want
)
from helpers import ingest_reviews, categorize_reviews, sla_failure_notify

graph = DataBuildGraph("//:podcast_reviews_graph")

ALL_CATEGORIES = {"comedy", ...}

# Partition definitions, used by the graph to resolve jobs by introspecting their config signatures
ExtractedReviews = Partition[r"reviews/date=(?P<date>\d{4}-\d{2}-\d{2})"]
CategorizedReviews = Partition[r"categorized_reviews/category=(?P<category>[^/]+)/date=(?P<date>\d{4}-\d{2}-\d{2})"]
PhraseModel = Partition[r"phrase_models/category=(?P<category>[^/]+)/date=(?P<date>\d{4}-\d{2}-\d{2})"]
PhraseStats = Partition[r"phrase_stats/category=(?P<category>[^/]+)/date=(?P<date>\d{4}-\d{2}-\d{2})"]


@graph.job
class ExtractReviews(DataBuildJob):
    def config(self, outputs: list[ExtractedReviews]) -> list[JobConfig]:
        # One job run can output multiple partitions
        args = [p.date for p in outputs]
        return [JobConfig(outputs=outputs, inputs=[], args=args)]

    def exec(self, config: JobConfig) -> PartitionManifest:
        for (date, output) in zip(config.args, config.outputs):
            ingest_reviews(date).write(output)
        # Start and end time inferred by wrapper (but could be overridden)
        return config.partitionManifest(job=self)


@dataclass
class CategorizeReviewsArgs:
    date: str
    category: str


@graph.job
class CategorizeReviews(DataBuildJob):
    def config(self, outputs: list[CategorizedReviews]) -> list[JobConfig]:
        # This job only outputs one partition per run
        return [
            # The PyJobConfig allows you to pass objects in config, rather than just `args` and `env`
            PyJobConfig[CategorizeReviewsArgs](
                outputs=[p],
                inputs=[ExtractedReviews.dep.materialize(date=p.date)],
                params=CategorizeReviewsArgs(date=p.date, category=p.category),
            )
            for p in outputs
        ]

    def exec(self, config: PyJobConfig[CategorizeReviewsArgs]) -> None:
        categorize_reviews(config.params.date, config.params.category)
        # Partition manifest automatically constructed from config


@graph.job
class PhraseModeling(DataBuildJob):
    def config(self, outputs: list[PhraseModel]) -> list[JobConfig]:
        # This job relies on a bazel executable target to run the actual job
        return [
            BazelJobConfig(
                outputs=[p],
                inputs=[CategorizedReviews.dep.materialize(date=p.date, category=p.category)],
                exec_target="//jobs:phrase_modeling",
                env={"CATEGORY": p.category, "DATA_DATE": p.date},
            )
            for p in outputs
        ]


# This job is fully defined in bazel
graph.bazel_job(target="//jobs:phrase_stats_job", outputs=list[PhraseStats])


@graph.want(cron="0 0 * * *")
def phrase_stats_want() -> list[Want[PhraseStats]]:
    # Creates a new want every midnight that times out in 3 days
    wanted = [PhraseStats(date=datetime.now().date().isoformat(), category=cat) for cat in ALL_CATEGORIES]
    on_fail = lambda p: f"Failed to calculate partition `{p}`"
    return [graph.want(partitions=wanted, ttl=timedelta(days=3), on_fail=on_fail)]
```

- TODO - do we need an escape hatch for "after 2025 use this job, before use that job" functionality?

## Rust?

## Scala?

design/observability.md (new file, 19 lines)

# Observability

- Purpose
  - To enable simple, comprehensive metrics and logging observability for databuild applications
- Wrappers as observability implementation
  - Liveness guarantees are:
    - Task process is still running
    - Logs are being shipped
    - Metrics are being gathered (graph scrapes worker metrics, re-exposes)
  - Heartbeating
  - Log shipping
  - Metrics exposed
- Metrics
  - Service
  - Jobs
- Logging
  - Service
  - Jobs

design/service.md (new file, 123 lines)

# Service
Purpose: Enable a centrally hostable and human-consumable interface for databuild applications.

## Correctness Strategy
- Rely on databuild.proto, call the same shared code in core
- Fully asserted type safety from core to service to web app
  - Core -- databuild.proto --> service -- openapi --> web app
- No magic strings (how? protobuf doesn't have consts. enum values? code gen over yaml?)

## API
The purpose of the API is to enable remote, programmatic interaction with databuild applications, and to host endpoints needed by the [web app](#web-app).

See the [OpenAPI spec](../bazel-bin/databuild/client/openapi.json) (you may need to run `bazel build //databuild/client:extract_openapi_spec` if it's not found).

## Web App
The web app visualizes databuild application state via features like listing past builds, job statistics, partition liveness, build request status, etc. This section specifies the hierarchy of functions of the web app. Pages are described in visual order (generally top to bottom).

General requirements:
- Nav at top of page
  - DataBuild logo in top left
  - Navigation links at the top allowing navigation to each list page:
    - Wants list page
    - Jobs list page
    - Build requests list page
    - Triggers list page
    - Build event log page
  - Graph label at top right
- Search box for finding builds, jobs, and partitions (needs a new service API?)

### Home Page
Jumping-off point to navigate and build.
- A text box, an "Analyze" button, and a "Build" button for doing exactly that (autocomplete would be great; PartitionRef patterns would also help ergonomics - less typing, more safety)
- List of recent builds with their requested partitions and current status, with link to build request page
- List of recently attempted partitions, with status, link to partition page, and link to build request page
- List of jobs, with (colored) last-week success ratio, and link to job page

### Build Request Page
- Show build request ID, overall status of build (colored), and "Cancel" button at top
- Progress bar indicating the number of needs-build partitions, building partitions, non-live delegated partitions, and live partitions
- Summary information table
  - Requested at
  - Analyze time (with datetime range)
  - Build time (with datetime range)
  - Number of tasks in each state (don't include states with 0 count)
  - Number of partitions in each state (don't include states with 0 count)
- Show graph diagram of job graph (collapsible)
  - With each job and partition status color coded & linked to the related run / partition
- [Paginated](#build-event-log-pagination) list of related build events at bottom

### Job Status Page
- Job label
- "Recent Runs" select, controlling page size
- "Recent Runs Page" select - the `< 1 2 3 ... N >` style paginator
- Job success rate (for all selected; colored)
- Bar graph showing job execution run times for last N (selectable between 31, 100, 365)
- Recent task runs
  - With links to build request, task run, partition
  - With task result
  - With run time
  - With expandable partition metadata
- [Paginated](#build-event-log-pagination) list of related build events at bottom

### Task Run Page
- With job label, task status, and "Cancel" button at top
- Summary information table
  - Task run ID
  - Output/input partitions
  - Task start and end time
  - Task duration
- Graph similar to the [build request page](#build-request-page), with all partitions and jobs not involved in this task made translucent (expandable)
- With [paginated](#build-event-log-pagination) table of build events at bottom

### Partition Status Page
- With PartitionRef, link to matching [PartitionPattern](#partitionpattern-page), color-coded status, and "Build" button at top
- List of tasks that produced this partition
- [Paginated](#build-event-log-pagination) list of related build events at bottom

### PartitionPattern Page
- Paginated table of partitions that match this partition pattern, sortable by columns, including:
  - Partition ref (with link)
  - Partition pattern values
  - Partition status
  - Build request link
  - Task link (with run time next to it)

### Triggers List Page
- Paginated list of registered triggers
  - With link to trigger detail page
  - With expandable list of produced build requests or wants

### Trigger Detail Page
- Trigger name, last run at, and "Trigger" button at top
- Trigger history table, including:
  - Trigger time
  - Trigger result (successful/failed)
  - Partitions or wants requested

### Wants List Page

### Want Detail Page

### Build Event Log Page
I dunno, some people want to look at the raw thing.
- A [paginated](#build-event-log-pagination) list of build event log entries

### Build Event Log Pagination
This element is present on most pages, and should be reusable/pluggable for a given set of events/filters.
- Table with headers of significant fields, sorted by timestamp by default
  - With timestamp, event ID, and message field
  - With color-coded event type
  - With links to build requests, jobs, and partitions where IDs are present
  - With expandable details that show the preformatted JSON event contents
- With the `< 1 2 3 ... N >` style paginator
  - Page size of 100

design/triggers.md (new file, 56 lines)

# Triggers
Purpose: to enable simple but powerful declarative specification of what data should be built.

## Correctness Strategy
- Wants + TTLs
- ...?

## Wants
Wants cause graphs to try to build the wanted partitions until a) the partitions are live or b) the TTL runs out. Wants can trigger a callback on TTL expiry, enabling SLA-like behavior. Wants are recorded in the [BEL](./build-event-log.md), so they can be queried and viewed in the web app, linking to build requests triggered by a given want, enabling answering of the "why doesn't this partition exist yet?" question.
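
A sketch of the state a want might carry (field names are assumptions, not the actual proto schema):

```rust
use std::time::{Duration, SystemTime};

// Illustrative sketch only; the real schema would live in databuild.proto.
struct Want {
    /// Partitions this want asks the graph to keep trying to build.
    partitions: Vec<String>,
    /// When the want was registered (recorded in the BEL).
    created_at: SystemTime,
    /// How long to keep retrying before the want expires.
    ttl: Duration,
    /// Script/callback to run on expiry, e.g. an SLA-breach notification.
    on_expiry: Option<String>,
}

impl Want {
    /// A want is retried until its partitions are live or this returns true.
    fn expired(&self, now: SystemTime) -> bool {
        now.duration_since(self.created_at)
            .map(|age| age > self.ttl)
            .unwrap_or(false)
    }
}
```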

### Unwants
You can also unwant partitions, which overrides all wants of those partitions prior to the unwant timestamp. This is primarily to enable the "data source is now disabled" style feature practically necessary in many data platforms.

### Virtual Partitions & External Data
Essentially all data teams consume some external data source, and late-arriving data is the rule more than the exception. Virtual partitions are a way to model external data that is not produced by a graph. For all intents and purposes, these are standard partitions; the only difference is that the job that "produces" them doesn't actually do any ETL - it just assesses external data sufficiency and emits a "partition live" event when the data is ready to be consumed.
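
In the shape of the core job interface, such a "producer" might look like the following sketch, where the sufficiency check is a hypothetical stand-in:

```rust
// Illustrative sketch; a real implementation would run inside the job wrapper
// and report status through the build event log.
struct VirtualPartitionProbe;

impl VirtualPartitionProbe {
    /// "Build" an external partition: no ETL, just check that the upstream
    /// data is sufficient and report the partition live if so.
    fn exec(&self, partition_ref: &str) -> Result<String, String> {
        if self.external_data_sufficient(partition_ref) {
            Ok(format!("partition live: {partition_ref}")) // would be a BEL event
        } else {
            Err(format!("external data not yet sufficient for {partition_ref}"))
        }
    }

    /// Hypothetical sufficiency check, e.g. row counts or a _SUCCESS marker.
    fn external_data_sufficient(&self, _partition_ref: &str) -> bool {
        false // placeholder
    }
}
```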

## Triggers

## Taints
- Mechanism for invalidating existing partitions (e.g. we know bad data went into this partition and need to stop consumers from using it)

---

- Purpose
  - Every useful data application has triggering to ensure data is built on schedule
- Philosophy
  - Opinionated strategy plus escape hatches
  - Taints
- Two strategies
  - Basic: cron-triggered scripts that return partitions
    - Bazel: target with `cron` and `executable` fields, plus an optional `partition_patterns` field to constrain outputs
  - Declarative: want-based; wants cause build requests to be continually retried until the wanted partitions are live, or run a `want_failed` script if the want times out (e.g. SLA breach)
    - +want and -want
      - +want declares a want for 1+ partitions with a timeout, recorded to the [build event log](./build-event-log.md)
      - -want invalidates all past wants of the specified partitions (but not future wants; doesn't impact non-specified partitions)
      - Their primary purpose is to prevent an SLA breach alarm when a datasource is disabled, etc.
- Need graph preconditions? And a concept of external/virtual partitions or readiness probes?
  - Virtual partitions: allow graphs to say "precondition failed"; can be created in the BEL via a want or cron trigger? (e.g. the want strategy continually tries to resolve the external data, creating a virtual partition once it can find it; cron just runs the script when it's triggered)
  - Readiness probes don't fit the paradigm, feel too imperative.

(file name not captured)

@@ -4,6 +4,7 @@
- Status indicator for page selection
- On build request detail page, show aggregated job results
- Use path-based navigation instead of hashbang?
- Add build request notes
- How do we encode job labels in the path? (Build event job links are not encoding job labels properly)
- Resolve double type system with protobuf and openapi
- Prometheus metrics export