databuild/examples/podcast_reviews
2025-07-03 22:36:12 -07:00
.bazelrc colors everyhere 2025-05-03 20:52:39 -07:00
BUILD.bazel Allow docker packaging of podcast review example 2025-07-02 21:49:15 -07:00
categorize_reviews_job.py Update podcast reviews example 2025-07-02 21:37:55 -07:00
daily_summary_job.py Update podcast reviews example 2025-07-02 21:37:55 -07:00
duckdb_utils.py cmt 2025-06-30 22:15:48 -07:00
extract_podcasts_job.py fix podcast job 2025-06-30 23:27:24 -07:00
extract_reviews_job.py Update podcast reviews example 2025-07-02 21:37:55 -07:00
job_lookup.py cmt 2025-06-30 22:15:48 -07:00
MODULE.bazel Allow docker packaging of podcast review example 2025-07-02 21:49:15 -07:00
MODULE.bazel.lock Implement prost struct generation 2025-07-03 22:36:12 -07:00
phrase_modeling_job.py Update podcast reviews example 2025-07-02 21:37:55 -07:00
phrase_stats_job.py Update podcast reviews example 2025-07-02 21:37:55 -07:00
py_repl.bzl add py_repl to podcast_reviews example 2025-04-18 23:57:08 -07:00
README.md cmt 2025-06-30 22:15:48 -07:00
requirements.in cmt 2025-06-30 22:15:48 -07:00
requirements_lock.txt cmt 2025-06-30 22:15:48 -07:00
test_jobs.py cmt 2025-06-30 22:15:48 -07:00
unified_job.py Match proto definitions 2025-06-29 20:47:07 -07:00

Podcast Reviews Example

This is an example data application that produces text insights from podcast review data. It is made up of six datasets:

  • Raw reviews (date, podcast, text, rating)
  • Podcasts (podcast, title, category)
  • Categorized review text (date, category, podcast, text)
  • Phrase models (date, category, hash, ngram, score)
  • Podcast phrase stats (date, category, podcast, ngram, count, rating)
  • Podcast daily summary (date, category, podcast, phrase_stats, recent_reviews)
flowchart LR
    raw_reviews[(Raw Reviews)] & podcasts[(Podcasts)] --> categorize_text --> categorized_texts[(Categorized Texts)]
    categorized_texts --> phrase[Phrase Modeling] --> phrase_models[(Phrase Models)]
    phrase_models & raw_reviews --> phrase_stats --> podcast_phrase_stats[(Podcast Phrase Stats)]
    podcast_phrase_stats & raw_reviews --> calc_summary --> podcast_daily_summary[(Podcast Daily Summary)]
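
The flowchart above can also be read as a small dependency graph. The following is a minimal sketch, not the example's actual job wiring: the `JOBS` table and the `job_order` helper are hypothetical names, the dataset and job identifiers are transcribed from the flowchart, and `phrase_modeling` is an assumed name for the node labelled "Phrase Modeling". It derives one valid execution order using the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table: job name -> (input datasets, output dataset),
# transcribed from the flowchart rather than from the repository's code.
JOBS = {
    "categorize_text": (["raw_reviews", "podcasts"], "categorized_texts"),
    "phrase_modeling": (["categorized_texts"], "phrase_models"),
    "phrase_stats": (["phrase_models", "raw_reviews"], "podcast_phrase_stats"),
    "calc_summary": (["podcast_phrase_stats", "raw_reviews"], "podcast_daily_summary"),
}


def job_order(jobs):
    """Return job names in an order that respects dataset dependencies."""
    # Map each produced dataset back to the job that writes it.
    producer = {output: name for name, (_, output) in jobs.items()}
    # A job depends on the producers of its inputs; source datasets such as
    # raw_reviews have no producer and therefore add no dependency.
    graph = {
        name: {producer[d] for d in inputs if d in producer}
        for name, (inputs, _) in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Calling `job_order(JOBS)` lists the categorization job first and the daily summary job last, since each job consumes the output of the one before it.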

Input Data

Get it from here, and put it at examples/podcast_reviews/data/ingest/database.sqlite.
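
To confirm the download landed in the right place, a quick check along these lines may help. It is a sketch only: `list_tables` is a hypothetical helper that is not part of this example, and it assumes nothing about the database's schema beyond it being a SQLite file, so it simply reports whatever tables it finds.

```python
import sqlite3
from pathlib import Path

# Path taken from the instructions above, relative to the repository root.
DB_PATH = Path("examples/podcast_reviews/data/ingest/database.sqlite")


def list_tables(db_path):
    """Return the table names in a SQLite file, sorted alphabetically."""
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"expected ingest database at {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Running `print(list_tables(DB_PATH))` from the repository root should print the table names if the file is in place, and raise `FileNotFoundError` otherwise.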

phrase Dependency

This example relies on soaxelbrooke/phrase for phrase extraction. Check its releases for a binary that matches your platform.
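
Before running the phrase jobs, it can be useful to verify that the binary is reachable. The sketch below assumes the executable is named `phrase` and is expected on `PATH`; both are assumptions, and the jobs in this example may locate it differently. `find_binary` is a hypothetical helper.

```python
import shutil


def find_binary(name="phrase"):
    """Return the full path of an executable on PATH, or raise if it is absent."""
    # Assumption: the binary is installed under this name and found via PATH.
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"could not find '{name}' on PATH; download a release binary first"
        )
    return path
```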