Test DataBuild App

This directory contains common job components for testing databuild apps that are described in different ways, such as the core Bazel targets or the Python DSL.

Structure

The fictitious use case is "daily color votes". The underlying input data is votes per color per day, which we combine and aggregate in ways that exercise different aspects of databuild. The contents of each job's exec step should be trivial, since the purpose is to test composition rather than computation. The partition relationships covered are:

  • Time-range: 1 day depending on N prior days
  • Multi-partition-output jobs
    • Always output multiple, e.g. producing per type
    • Consume different inputs based on desired output
    • Produce multiple of the same type depending on input

```mermaid
flowchart TD
    daily_color_votes[(daily_color_votes/$date/$color)]
    color_votes_1w[(color_votes_1w/$date/$color)]
    color_votes_1m[(color_votes_1m/$date/$color)]
    daily_votes[(daily_votes/$date)]
    votes_1w[(votes_1w/$date)]
    votes_1m[(votes_1m/$date)]
    color_vote_report[(color_vote_report/$date/$color)]
    ingest_color_votes --> daily_color_votes
    daily_color_votes --> trailing_color_votes --> color_votes_1w & color_votes_1m
    daily_color_votes --> aggregate_color_votes --> daily_votes
    color_votes_1w --> aggregate_color_votes --> votes_1w
    color_votes_1m --> aggregate_color_votes --> votes_1m
    daily_votes & votes_1w & votes_1m & color_votes_1w & color_votes_1m --> color_vote_report_calc --> color_vote_report
```
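
To make two of these relationships concrete, here is a minimal sketch of how input partitions could be derived from a requested output partition. It is not the app's actual configuration logic. The function names, the `COLORS` list, the window lengths (7 and 28 days), and the choice to include the output day in the window are all assumptions made for illustration; only the dataset names come from the flowchart.

```python
from datetime import date, timedelta

# Assumed values for illustration only; the real app defines its own colors
# and window lengths.
COLORS = ["red", "blue"]
WINDOW_DAYS = {"color_votes_1w": 7, "color_votes_1m": 28}

# Which per-color dataset each aggregate output reads, following the flowchart.
AGGREGATE_SOURCES = {
    "daily_votes": "daily_color_votes",
    "votes_1w": "color_votes_1w",
    "votes_1m": "color_votes_1m",
}


def trailing_inputs(output_ref):
    """Time-range: one output day depends on a window of daily partitions.

    Assumes the window ends on (and includes) the output day.
    """
    dataset, day, color = output_ref.split("/")
    end = date.fromisoformat(day)
    return [
        f"daily_color_votes/{(end - timedelta(days=i)).isoformat()}/{color}"
        for i in range(WINDOW_DAYS[dataset])
    ]


def aggregate_inputs(output_ref):
    """Consume different inputs based on the desired output dataset."""
    dataset, day = output_ref.split("/")
    source = AGGREGATE_SOURCES[dataset]
    return [f"{source}/{day}/{color}" for color in COLORS]
```

Under these assumptions, `trailing_inputs("color_votes_1w/2025-01-07/red")` yields seven `daily_color_votes` partitions for `red`, and `aggregate_inputs("votes_1w/2025-01-07")` yields one `color_votes_1w` partition per color for that date.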

Data Access

Data access is implemented in dal.py. Data is written as JSON lists of dicts, and partition fields are stored as ordinary values in those dicts.
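
As a rough illustration of that storage shape, the sketch below writes rows as a JSON list of dicts and filters on partition fields when reading. It does not reproduce the functions in dal.py. The names `write_rows` and `read_rows`, and the one-file-per-dataset layout, are hypothetical.

```python
import json
from pathlib import Path


def write_rows(root, dataset, rows):
    """Write rows (a list of dicts) to <root>/<dataset>.json.

    The one-file-per-dataset layout is an assumption for this sketch.
    """
    path = Path(root) / f"{dataset}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return path


def read_rows(root, dataset, **partition):
    """Read rows back, keeping those whose partition fields match.

    Partition fields such as date and color are plain values in each dict,
    so selecting a partition is an ordinary filter over the rows.
    """
    rows = json.loads((Path(root) / f"{dataset}.json").read_text())
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in partition.items())
    ]
```

For example, after writing rows such as `{"date": "2025-01-01", "color": "red", "votes": 3}`, calling `read_rows(root, "daily_color_votes", color="red")` would return only the rows for `red`.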