Stuart Axelbrooke 196622fe17	Match proto definitions	2025-06-29 20:47:07 -07:00
.bazelrc	colors everyhere	2025-05-03 20:52:39 -07:00
BUILD.bazel	Clean up old paths	2025-06-29 20:14:33 -07:00
MODULE.bazel	Add scripts and fix tests	2025-05-03 20:31:46 -07:00
MODULE.bazel.lock	commit	2025-06-29 19:28:46 -07:00
py_repl.bzl	add py_repl to podcast_reviews example	2025-04-18 23:57:08 -07:00
README.md	commit	2025-06-29 19:28:46 -07:00
requirements.in	add py_repl to podcast_reviews example	2025-04-18 23:57:08 -07:00
requirements_lock.txt	add py_repl to podcast_reviews example	2025-04-18 23:57:08 -07:00
unified_job.py	Match proto definitions	2025-06-29 20:47:07 -07:00

README.md

Podcast Reviews Example

This is an example data application that produces text insights from podcast review data. It is made up of six datasets:

Raw reviews (date, podcast, text, rating)
Podcasts (podcast, title, category)
Categorized review text (date, category, podcast, text)
Phrase models (date, category, hash, ngram, score)
Podcast phrase stats (date, category, podcast, ngram, count, rating)
Podcast daily summary (date, category, podcast, phrase_stats, recent_reviews)
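
As a rough illustration of how these datasets relate, the column lists above can be written down as plain Python data. The snake_case dataset names are borrowed from the flowchart node ids. The `DATASETS` mapping and the `shared_columns` helper are hypothetical names used only for this sketch and are not part of the example's code.

```python
# Column names copied from the dataset list above.
# DATASETS and shared_columns are illustrative names, not part of the repo.
DATASETS = {
    "raw_reviews": ("date", "podcast", "text", "rating"),
    "podcasts": ("podcast", "title", "category"),
    "categorized_texts": ("date", "category", "podcast", "text"),
    "phrase_models": ("date", "category", "hash", "ngram", "score"),
    "podcast_phrase_stats": ("date", "category", "podcast", "ngram", "count", "rating"),
    "podcast_daily_summary": ("date", "category", "podcast", "phrase_stats", "recent_reviews"),
}


def shared_columns(left, right):
    """Return the columns present in both datasets, in the left dataset's order."""
    right_cols = set(DATASETS[right])
    return [col for col in DATASETS[left] if col in right_cols]


print(shared_columns("raw_reviews", "podcasts"))               # ['podcast']
print(shared_columns("phrase_models", "podcast_phrase_stats"))  # ['date', 'category', 'ngram']
```

The shared columns hint at plausible join keys between steps, for example `podcast` when attaching a category to each raw review.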

flowchart LR
    raw_reviews[(Raw Reviews)] & podcasts[(Podcasts)] --> categorize_text --> categorized_texts[(Categorized Texts)]
    categorized_texts --> phrase[Phrase Modeling] --> phrase_models[(Phrase Models)]
    phrase_models & raw_reviews --> phrase_stats --> podcast_phrase_stats[(Podcast Phrase Stats)]
    podcast_phrase_stats & raw_reviews --> calc_summary --> podcast_daily_summary[(Podcast Daily Summary)]
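
The flowchart above can also be read as a dependency graph between jobs. The following is a minimal sketch, assuming each job has a single output dataset as drawn. It derives a valid run order with the standard-library `graphlib` module (Python 3.9+). The `JOBS` mapping and `job_order` function are hypothetical names for illustration and do not reflect how the example actually schedules work.

```python
from graphlib import TopologicalSorter

# job name -> (input datasets, output dataset), transcribed from the flowchart.
# JOBS and job_order are illustrative names, not part of the repo.
JOBS = {
    "categorize_text": (["raw_reviews", "podcasts"], "categorized_texts"),
    "phrase": (["categorized_texts"], "phrase_models"),
    "phrase_stats": (["phrase_models", "raw_reviews"], "podcast_phrase_stats"),
    "calc_summary": (["podcast_phrase_stats", "raw_reviews"], "podcast_daily_summary"),
}


def job_order(jobs):
    """Return job names in an order that respects dataset dependencies."""
    # Map each produced dataset back to the job that writes it.
    producer = {output: name for name, (_, output) in jobs.items()}
    sorter = TopologicalSorter()
    for name, (inputs, _) in jobs.items():
        # Source datasets (raw_reviews, podcasts) have no producing job.
        upstream = [producer[ds] for ds in inputs if ds in producer]
        sorter.add(name, *upstream)
    return list(sorter.static_order())


print(job_order(JOBS))
# ['categorize_text', 'phrase', 'phrase_stats', 'calc_summary']
```

Because every job consumes the previous job's output, the graph is a single chain and the order is unique.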

Input Data

Get it from here, and save it as examples/podcast_reviews/data/ingest/database.sqlite.
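
To check that the download landed in the right place, a small script along these lines can list the tables in the SQLite file. This is a sketch using only the standard-library `sqlite3` module. The `list_tables` helper is a hypothetical name, and no assumption is made here about which tables the file contains.

```python
import sqlite3
from pathlib import Path

# Path taken from the instructions above, relative to the repository root.
DB_PATH = Path("examples/podcast_reviews/data/ingest/database.sqlite")


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    if DB_PATH.exists():
        print(list_tables(DB_PATH))
    else:
        print(f"Missing input file: {DB_PATH}")
```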