databuild/databuild/test/app/README.md
Stuart Axelbrooke 30f1d9addb
2025-07-30 21:53:25 -07:00

Test DataBuild App

This directory contains common job components for testing databuild apps that are described via different methods, e.g. the core Bazel targets or the Python DSL.

Structure

The fictitious use case is "daily color votes". The underlying input data is votes per color per day, which we combine and aggregate in ways that help us test different aspects of databuild. Job exec contents should be trivial, as the purpose is to test composition.

Types of partition relationships:

  • Time-range: one day's partition depending on the N prior days
  • Multi-partition-output jobs
    • Always output multiple partitions, e.g. one per type
    • Consume different inputs based on the desired output
    • Produce multiple partitions of the same type depending on the input

```mermaid
flowchart TD
    daily_color_votes[(daily_color_votes/$date/$color)]
    color_votes_1w[(color_votes_1w/$date/$color)]
    color_votes_1m[(color_votes_1m/$date/$color)]
    daily_votes[(daily_votes/$date)]
    votes_1w[(votes_1w/$date)]
    votes_1m[(votes_1m/$date)]
    color_vote_report[(color_vote_report/$date/$color)]
    ingest_color_votes --> daily_color_votes
    daily_color_votes --> trailing_color_votes --> color_votes_1w & color_votes_1m
    daily_color_votes --> aggregate_color_votes --> daily_votes
    color_votes_1w --> aggregate_color_votes --> votes_1w
    color_votes_1m --> aggregate_color_votes --> votes_1m
    daily_votes & votes_1w & votes_1m & color_votes_1w & color_votes_1m --> color_vote_report_calc --> color_vote_report
```
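
To make the partition relationships concrete, here is a minimal Python sketch of how an output partition reference could be mapped to the input partitions it needs. It is not the repository's actual code: the function names `trailing_inputs` and `aggregate_inputs`, the window lengths (7 and 28 days), the choice to include the target date in the window, and the explicit `colors` list are all assumptions made for illustration. Only the partition path patterns come from the flowchart.

```python
from datetime import date, timedelta

# Assumed window lengths in days; the README names only the 1w/1m suffixes.
WINDOW_DAYS = {"color_votes_1w": 7, "color_votes_1m": 28}

# Which per-color dataset each aggregate output reads (taken from the flowchart).
AGGREGATE_SOURCE = {
    "daily_votes": "daily_color_votes",
    "votes_1w": "color_votes_1w",
    "votes_1m": "color_votes_1m",
}


def trailing_inputs(output_ref: str) -> list[str]:
    """Time-range dependency: one trailing partition reads N daily partitions.

    Assumes the window ends on, and includes, the target date.
    """
    dataset, day, color = output_ref.split("/")
    end = date.fromisoformat(day)
    days = WINDOW_DAYS[dataset]
    return [
        f"daily_color_votes/{(end - timedelta(days=offset)).isoformat()}/{color}"
        for offset in range(days - 1, -1, -1)  # oldest day first
    ]


def aggregate_inputs(output_ref: str, colors: list[str]) -> list[str]:
    """Inputs depend on the desired output: each aggregate reads a different dataset."""
    dataset, day = output_ref.split("/")
    source = AGGREGATE_SOURCE[dataset]
    return [f"{source}/{day}/{color}" for color in colors]


week = trailing_inputs("color_votes_1w/2025-07-30/red")
agg = aggregate_inputs("votes_1w/2025-07-30", ["red", "blue"])
```

Under these assumptions, `week` holds seven `daily_color_votes` references from 2025-07-24 through 2025-07-30, and `agg` holds one `color_votes_1w` reference per color for the requested date.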

Data Access

Data access is implemented in dal.py, with data written as lists of dicts in JSON. Partition fields are stored as values in those dicts.
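
As a rough illustration of that storage format, the sketch below writes and reads a JSON list of dicts whose partition fields (`date`, `color`) are ordinary values in each record. It does not reproduce dal.py: the helper names `write_rows` and `read_rows`, the file name, the `votes` field, and the sample values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_rows(path: Path, rows: list[dict]) -> None:
    """Serialize records as a single JSON list of dicts."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))


def read_rows(path: Path, **partition: str) -> list[dict]:
    """Load records, keeping those whose partition fields match the filters."""
    rows = json.loads(path.read_text())
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in partition.items())
    ]


# Made-up sample records; partition fields live alongside the data.
rows = [
    {"date": "2025-07-30", "color": "red", "votes": 12},
    {"date": "2025-07-30", "color": "blue", "votes": 7},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "daily_color_votes.json"  # hypothetical file name
    write_rows(path, rows)
    red = read_rows(path, color="red")
    everything = read_rows(path)
```

Keeping partition fields inside each record means a reader can filter on them without parsing file paths, at the cost of repeating the values in every row.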