Podcast Reviews Example
This is an example data application that produces text insights from podcast review data. It is made up of six datasets:
- Raw reviews (date, podcast, text, rating)
- Podcasts (podcast, title, category)
- Categorized review text (date, category, podcast, text)
- Phrase models (date, category, hash, ngram, score)
- Podcast phrase stats (date, category, podcast, ngram, count, rating)
- Podcast daily summary (date, category, podcast, phrase_stats, recent_reviews)
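As a rough illustration, the column lists above could be written down as plain record types. This is a sketch only: the class names and every field type are assumptions made for illustration (the README gives column names, not types), and none of this is taken from the job code in this directory.

```python
import datetime
from dataclasses import dataclass, fields


# Column names come from the dataset list above; all types are assumed.
@dataclass(frozen=True)
class RawReview:
    date: datetime.date
    podcast: str
    text: str
    rating: int


@dataclass(frozen=True)
class Podcast:
    podcast: str
    title: str
    category: str


@dataclass(frozen=True)
class CategorizedText:
    date: datetime.date
    category: str
    podcast: str
    text: str


@dataclass(frozen=True)
class PhraseModel:
    date: datetime.date
    category: str
    hash: str
    ngram: str
    score: float


@dataclass(frozen=True)
class PodcastPhraseStat:
    date: datetime.date
    category: str
    podcast: str
    ngram: str
    count: int
    rating: float


@dataclass(frozen=True)
class PodcastDailySummary:
    date: datetime.date
    category: str
    podcast: str
    phrase_stats: list
    recent_reviews: list


def columns(record_type):
    """Return the column names of a record type, in declaration order."""
    return [f.name for f in fields(record_type)]
```

Calling `columns(PodcastPhraseStat)` gives back the column list in the order it appears above, which can be handy when checking a table against the expected layout.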
flowchart LR
raw_reviews[(Raw Reviews)] & podcasts[(Podcasts)] --> categorize_text --> categorized_texts[(Categorized Texts)]
categorized_texts --> phrase[Phrase Modeling] --> phrase_models[(Phrase Models)]
phrase_models & raw_reviews --> phrase_stats --> podcast_phrase_stats[(Podcast Phrase Stats)]
podcast_phrase_stats & raw_reviews --> calc_summary --> podcast_daily_summary[(Podcast Daily Summary)]
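The flowchart above implies an order in which the steps have to run. A minimal sketch of deriving that order from the edges, using Python's standard `graphlib`; the `JOBS` mapping and the `job_order` helper are hypothetical names for illustration and are not how the jobs in this directory are wired up:

```python
from graphlib import TopologicalSorter

# step -> (input datasets, output dataset), read off the flowchart edges.
JOBS = {
    "categorize_text": ({"raw_reviews", "podcasts"}, "categorized_texts"),
    "phrase": ({"categorized_texts"}, "phrase_models"),
    "phrase_stats": ({"phrase_models", "raw_reviews"}, "podcast_phrase_stats"),
    "calc_summary": ({"podcast_phrase_stats", "raw_reviews"}, "podcast_daily_summary"),
}


def job_order(jobs):
    """Return the steps in an order where every input is produced before use."""
    # Map each produced dataset back to the step that writes it.
    producers = {output: job for job, (_, output) in jobs.items()}
    # A step depends on the producers of its inputs; source datasets such as
    # raw_reviews and podcasts have no producer and are skipped.
    deps = {
        job: {producers[d] for d in inputs if d in producers}
        for job, (inputs, _) in jobs.items()
    }
    return list(TopologicalSorter(deps).static_order())


print(job_order(JOBS))
```

Because each step consumes the output of the one before it, the graph is a single chain and there is only one valid order.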
Input Data
Get it from here! (and put it in examples/podcast_reviews/data/ingest/database.sqlite)
phrase Dependency
This relies on soaxelbrooke/phrase for phrase extraction - check out its releases to get a relevant binary.