# Test DataBuild App

This directory contains common job components for testing DataBuild apps described via different methods, such as the core Bazel targets and the Python DSL.

## Structure
The fictitious use case is "daily color votes". The underlying input data is votes per color per day, which we combine and aggregate in ways that help us test different aspects of DataBuild. Job exec contents should be trivial, since the purpose is to test composition. The types of partition relationships covered are:

- Time-range: 1 day depending on N prior days
- Multi-partition-output jobs
  - Always output multiple partitions, e.g. one per type
  - Consume different inputs based on the desired output
  - Produce multiple partitions of the same type depending on input
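As an illustration of the time-range relationship, the sketch below maps a trailing-window partition to the daily partitions it reads. It assumes partition refs follow the `dataset/$date/$color` layout with ISO dates, and that `color_votes_1w` and `color_votes_1m` cover the 7 and 30 days ending on `$date`. Those window lengths and the helper `trailing_inputs` are hypothetical, not taken from the app's job code.

```python
from datetime import date, timedelta

# Hypothetical window lengths; the real jobs may define "1w" and "1m" differently.
WINDOW_DAYS = {"color_votes_1w": 7, "color_votes_1m": 30}


def trailing_inputs(output_partition: str) -> list[str]:
    """Map a trailing output partition to the daily partitions it depends on.

    Expects refs shaped like "color_votes_1w/2024-01-07/red" and returns the
    daily_color_votes refs for the N days ending on (and including) that date.
    """
    dataset, date_str, color = output_partition.split("/")
    n_days = WINDOW_DAYS[dataset]
    end = date.fromisoformat(date_str)
    return [
        f"daily_color_votes/{(end - timedelta(days=offset)).isoformat()}/{color}"
        for offset in range(n_days - 1, -1, -1)  # oldest day first
    ]


print(trailing_inputs("color_votes_1w/2024-01-07/red")[0])
# daily_color_votes/2024-01-01/red
```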
```mermaid
flowchart TD
    daily_color_votes[(daily_color_votes/$date/$color)]
    color_votes_1w[(color_votes_1w/$date/$color)]
    color_votes_1m[(color_votes_1m/$date/$color)]
    daily_votes[(daily_votes/$date)]
    votes_1w[(votes_1w/$date)]
    votes_1m[(votes_1m/$date)]
    color_vote_report[(color_vote_report/$date/$color)]

    ingest_color_votes --> daily_color_votes
    daily_color_votes --> trailing_color_votes --> color_votes_1w & color_votes_1m
    daily_color_votes --> aggregate_color_votes --> daily_votes
    color_votes_1w --> aggregate_color_votes --> votes_1w
    color_votes_1m --> aggregate_color_votes --> votes_1m
    daily_votes & votes_1w & votes_1m & color_votes_1w & color_votes_1m --> color_vote_report_calc --> color_vote_report
```
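The graph reuses one `aggregate_color_votes` job for three outputs, each built from a different input dataset, and fans several datasets into `color_vote_report_calc`. The sketch below writes those edges as dependency functions. The helper names `aggregate_inputs` and `report_inputs`, and the assumption that every input shares the output's `$date` (and `$color`, where present), are illustrative rather than taken from the app's code. The set of colors is passed in explicitly because it is not listed here.

```python
# Hypothetical dependency helpers reading the edges off the job graph above.
# Assumption: every input shares the output's $date (and $color where present).

AGGREGATE_SOURCES = {
    "daily_votes": "daily_color_votes",
    "votes_1w": "color_votes_1w",
    "votes_1m": "color_votes_1m",
}


def aggregate_inputs(output_partition: str, colors: list[str]) -> list[str]:
    """aggregate_color_votes: choose the input dataset from the requested output."""
    dataset, date_str = output_partition.split("/")
    source = AGGREGATE_SOURCES[dataset]
    return [f"{source}/{date_str}/{color}" for color in colors]


def report_inputs(output_partition: str) -> list[str]:
    """color_vote_report_calc: fan in the all-color totals plus this color's windows."""
    dataset, date_str, color = output_partition.split("/")
    if dataset != "color_vote_report":
        raise ValueError(f"unexpected dataset: {dataset}")
    totals = [f"{name}/{date_str}" for name in ("daily_votes", "votes_1w", "votes_1m")]
    per_color = [
        f"{name}/{date_str}/{color}" for name in ("color_votes_1w", "color_votes_1m")
    ]
    return totals + per_color


print(aggregate_inputs("votes_1w/2024-01-07", ["red", "blue"]))
# ['color_votes_1w/2024-01-07/red', 'color_votes_1w/2024-01-07/blue']
```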
## Data Access

Data access is implemented in [`dal.py`](./dal.py), with data written as lists of dicts in JSON. Partition fields are stored as values in those dicts.
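For orientation, here is a hypothetical sketch of the storage format described above; it is not the contents of `dal.py`. It assumes one JSON file per partition at `<root>/<partition ref>.json`, and the names `write_partition`, `read_partition`, and the `votes` field are illustrative only.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: one JSON file per partition at <root>/<partition ref>.json.


def write_partition(root: Path, ref: str, rows: list[dict]) -> Path:
    """Write a partition's rows (a list of dicts) as a JSON array."""
    path = root / f"{ref}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows, indent=2))
    return path


def read_partition(root: Path, ref: str) -> list[dict]:
    """Load a partition's rows back into a list of dicts."""
    return json.loads((root / f"{ref}.json").read_text())


# Partition fields (date, color) are stored inside each row alongside the data.
rows = [{"date": "2024-01-07", "color": "red", "votes": 42}]
with tempfile.TemporaryDirectory() as tmp:
    write_partition(Path(tmp), "daily_color_votes/2024-01-07/red", rows)
    loaded = read_partition(Path(tmp), "daily_color_votes/2024-01-07/red")

print(loaded == rows)  # True
```

Keeping the partition fields inside each row means a consumer can tell which partition a row belongs to without parsing the file path.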