Test DataBuild App
This directory contains common job components for testing databuild apps defined through different methods, such as the core Bazel targets and the Python DSL.
Structure
The fictitious use case is "daily color votes". The underlying input data is votes per color per day, which we combine and aggregate in ways that exercise different aspects of databuild. Job exec contents should be trivial, since the purpose is to test composition.
Types of partition relationships covered:
- Time-range: 1 day depending on N prior days
- Multi-partition-output jobs
  - Always output multiple, e.g. producing per type
  - Consume different inputs based on desired output
  - Produce multiple of the same type depending on input
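The time-range relationship can be sketched as a function from one output partition to the daily partitions it needs. This is an illustrative sketch only: the function name `trailing_inputs`, the `dataset/date/color` string form of a partition reference, and the choice to count the output date itself as part of the window are all assumptions, not taken from the databuild code.

```python
from datetime import date, timedelta


def trailing_inputs(output_ref, window_days):
    """Return the daily_color_votes partitions behind one trailing-window output.

    Hypothetical helper: output_ref is assumed to look like
    "color_votes_1w/2024-01-07/red".
    """
    _dataset, day, color = output_ref.split("/")
    end = date.fromisoformat(day)
    # Assumption: the window ends on the output date and reaches back
    # window_days - 1 further days.
    return [
        f"daily_color_votes/{(end - timedelta(days=offset)).isoformat()}/{color}"
        for offset in range(window_days)
    ]


week = trailing_inputs("color_votes_1w/2024-01-07/red", 7)
```

Here `week` holds seven references for the color `red`, from 2024-01-07 back to 2024-01-01.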
```mermaid
flowchart TD
    daily_color_votes[(daily_color_votes/$date/$color)]
    color_votes_1w[(color_votes_1w/$date/$color)]
    color_votes_1m[(color_votes_1m/$date/$color)]
    daily_votes[(daily_votes/$date)]
    votes_1w[(votes_1w/$date)]
    votes_1m[(votes_1m/$date)]
    color_vote_report[(color_vote_report/$date/$color)]
    ingest_color_votes --> daily_color_votes
    daily_color_votes --> trailing_color_votes --> color_votes_1w & color_votes_1m
    daily_color_votes --> aggregate_color_votes --> daily_votes
    color_votes_1w --> aggregate_color_votes --> votes_1w
    color_votes_1m --> aggregate_color_votes --> votes_1m
    daily_votes & votes_1w & votes_1m & color_votes_1w & color_votes_1m --> color_vote_report_calc --> color_vote_report
```
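In the graph, `aggregate_color_votes` is the job that consumes different inputs depending on the requested output. A minimal sketch of that selection follows. The mapping is read off the flowchart edges, while the function name `aggregate_inputs`, the fixed color list, and the assumption that each aggregate reads every color for the same date are hypothetical.

```python
# Output dataset -> input dataset, taken from the flowchart edges.
AGGREGATE_SOURCES = {
    "daily_votes": "daily_color_votes",
    "votes_1w": "color_votes_1w",
    "votes_1m": "color_votes_1m",
}

# Assumption: a small fixed set of colors, for illustration only.
COLORS = ("red", "blue", "green")


def aggregate_inputs(output_ref, colors=COLORS):
    """Return the per-color partitions that one aggregate partition reads.

    Hypothetical helper: output_ref is assumed to look like "votes_1w/2024-01-07".
    """
    dataset, day = output_ref.split("/")
    source = AGGREGATE_SOURCES[dataset]  # KeyError for unknown outputs
    return [f"{source}/{day}/{color}" for color in colors]


weekly = aggregate_inputs("votes_1w/2024-01-07")
```

Keeping the mapping in a single table means one job definition can serve all three aggregate outputs.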
Data Access
Data access is implemented in dal.py, with data written as lists of dicts in JSON. Partition fields are stored as values in those dicts.
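As a rough illustration of that storage shape, the sketch below writes rows as a JSON list of dicts and selects a partition by matching field values. It is not the API of dal.py: the function names `write_rows` and `read_partition`, the one-file-per-dataset layout, and the `votes` field are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def write_rows(root, dataset, rows):
    """Write rows (a list of dicts) to <root>/<dataset>.json (hypothetical layout)."""
    path = Path(root) / f"{dataset}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return path


def read_partition(root, dataset, **partition):
    """Return the rows whose partition fields equal the given values."""
    rows = json.loads((Path(root) / f"{dataset}.json").read_text())
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in partition.items())
    ]


with tempfile.TemporaryDirectory() as root:
    write_rows(root, "daily_color_votes", [
        {"date": "2024-01-01", "color": "red", "votes": 3},
        {"date": "2024-01-01", "color": "blue", "votes": 5},
    ])
    # Partition fields are ordinary dict values, so selection is a filter.
    red = read_partition(root, "daily_color_votes", date="2024-01-01", color="red")
```

Because the partition fields live inside each row, reading a partition is a filter over the rows and does not depend on a directory structure.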