Podcast Reviews Example
This is an example data application that produces text insights from podcast review data. It is made up of six datasets:
- Raw reviews (date, podcast, text, rating)
- Podcasts (podcast, title, category)
- Categorized review text (date, category, podcast, text)
- Phrase models (date, category, hash, ngram, score)
- Podcast phrase stats (date, category, podcast, ngram, count, rating)
- Podcast daily summary (date, category, podcast, phrase_stats, recent_reviews)
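The column lists above can be written down as plain data to see which columns neighbouring datasets share. This is only an illustrative sketch: the `SCHEMAS` dict and the `shared_columns` helper are hypothetical names, not part of the example's code, and treating shared columns as join keys is an inference rather than something this README states.

```python
# Column names copied from the dataset list above.
# The dict keys are shorthand names chosen for this sketch.
SCHEMAS = {
    "raw_reviews": ("date", "podcast", "text", "rating"),
    "podcasts": ("podcast", "title", "category"),
    "categorized_texts": ("date", "category", "podcast", "text"),
    "phrase_models": ("date", "category", "hash", "ngram", "score"),
    "podcast_phrase_stats": ("date", "category", "podcast", "ngram", "count", "rating"),
    "podcast_daily_summary": ("date", "category", "podcast", "phrase_stats", "recent_reviews"),
}


def shared_columns(left: str, right: str) -> tuple[str, ...]:
    """Return the columns present in both datasets, in the left dataset's order."""
    right_cols = set(SCHEMAS[right])
    return tuple(col for col in SCHEMAS[left] if col in right_cols)


if __name__ == "__main__":
    # Raw reviews and podcasts overlap only on the podcast column.
    print(shared_columns("raw_reviews", "podcasts"))
```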
flowchart LR
raw_reviews[(Raw Reviews)] & podcasts[(Podcasts)] --> categorize_text --> categorized_texts[(Categorized Texts)]
categorized_texts --> phrase[Phrase Modeling] --> phrase_models[(Phrase Models)]
phrase_models & raw_reviews --> phrase_stats --> podcast_phrase_stats[(Podcast Phrase Stats)]
podcast_phrase_stats & raw_reviews --> calc_summary --> podcast_daily_summary[(Podcast Daily Summary)]
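To make the dependency structure of the flowchart concrete, here is a minimal sketch that encodes each step with its inputs and output and derives a valid run order using the standard library's `graphlib` (Python 3.9+). The `JOBS` mapping and `job_order` function are hypothetical and exist only for illustration; the actual example may schedule its jobs differently.

```python
from graphlib import TopologicalSorter

# step name -> (input datasets, output dataset), read off the flowchart above.
JOBS = {
    "categorize_text": ({"raw_reviews", "podcasts"}, "categorized_texts"),
    "phrase": ({"categorized_texts"}, "phrase_models"),
    "phrase_stats": ({"phrase_models", "raw_reviews"}, "podcast_phrase_stats"),
    "calc_summary": ({"podcast_phrase_stats", "raw_reviews"}, "podcast_daily_summary"),
}


def job_order(jobs: dict) -> list[str]:
    """Return the steps in an order where every input is produced before it is read."""
    # Map each produced dataset back to the step that writes it.
    producer = {output: name for name, (_, output) in jobs.items()}
    # A step depends on the producers of its inputs; source datasets
    # (raw_reviews, podcasts) have no producer and add no dependency.
    graph = {
        name: {producer[ds] for ds in inputs if ds in producer}
        for name, (inputs, _) in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(job_order(JOBS))
```

Because each step consumes the previous step's output, the graph is a single chain and only one order is possible.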
Input Data
Download the data from here and place it at examples/podcast_reviews/data/ingest/database.sqlite.
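Once the file is in place, a quick sanity check is to confirm it is a readable SQLite database and list the tables it contains. The sketch below uses only Python's built-in `sqlite3` module; `list_tables` is a hypothetical helper, and no table names are assumed because this README does not state them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Location given in this README, relative to the repository root.
DB_PATH = Path("examples/podcast_reviews/data/ingest/database.sqlite")


def list_tables(db_path) -> list[str]:
    """Return the names of all tables in the SQLite file at db_path, sorted."""
    path = Path(db_path)
    if not path.is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(f"no database file at {path}")
    with closing(sqlite3.connect(str(path))) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(list_tables(DB_PATH))
```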