# Podcast Reviews Example

This is an example data application that produces text insights from podcast review data. It is made up of six datasets:

- Raw reviews `(date, podcast, text, rating)`
- Podcasts `(podcast, title, category)`
- Categorized review text `(date, category, podcast, text)`
- Phrase models `(date, category, hash, ngram, score)`
- Podcast phrase stats `(date, category, podcast, ngram, count, rating)`
- Podcast daily summary `(date, category, podcast, phrase_stats, recent_reviews)`
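
The column lists above can be written down as plain data, which makes the join keys between datasets easy to spot. This is a minimal sketch: the snake_case dataset names, the `SCHEMAS` dict, and the `shared_columns` helper are hypothetical and not part of the example's code.

```python
# Column names are copied from the dataset list above; the dict keys are
# hypothetical snake_case names chosen for illustration.
SCHEMAS = {
    "raw_reviews": ("date", "podcast", "text", "rating"),
    "podcasts": ("podcast", "title", "category"),
    "categorized_texts": ("date", "category", "podcast", "text"),
    "phrase_models": ("date", "category", "hash", "ngram", "score"),
    "podcast_phrase_stats": ("date", "category", "podcast", "ngram", "count", "rating"),
    "podcast_daily_summary": ("date", "category", "podcast", "phrase_stats", "recent_reviews"),
}


def shared_columns(left: str, right: str) -> list[str]:
    """Return the columns two datasets have in common, in the left dataset's order."""
    right_cols = set(SCHEMAS[right])
    return [col for col in SCHEMAS[left] if col in right_cols]


if __name__ == "__main__":
    # Raw reviews and podcasts share only `podcast`, which is the column a
    # join would need in order to attach a category to each review.
    print(shared_columns("raw_reviews", "podcasts"))
```
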
```mermaid
flowchart LR
raw_reviews[(Raw Reviews)] & podcasts[(Podcasts)] --> categorize_text --> categorized_texts[(Categorized Texts)]
categorized_texts --> phrase[Phrase Modeling] --> phrase_models[(Phrase Models)]
phrase_models & raw_reviews --> phrase_stats --> podcast_phrase_stats[(Podcast Phrase Stats)]
podcast_phrase_stats & raw_reviews --> calc_summary --> podcast_daily_summary[(Podcast Daily Summary)]
```
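
The flowchart implies a build order for the derived datasets. As a rough sketch, the dataset-level edges can be fed to the standard library's `graphlib.TopologicalSorter` to recover that order. The `DEPENDENCIES` mapping and `build_order` helper are illustrative names, not part of the example's code, and the processing steps (`categorize_text`, `phrase_stats`, and so on) are collapsed into direct dataset-to-dataset edges.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each derived dataset maps to the datasets it is built from, as drawn in
# the flowchart. Processing steps are omitted; only dataset edges are kept.
DEPENDENCIES = {
    "categorized_texts": {"raw_reviews", "podcasts"},
    "phrase_models": {"categorized_texts"},
    "podcast_phrase_stats": {"phrase_models", "raw_reviews"},
    "podcast_daily_summary": {"podcast_phrase_stats", "raw_reviews"},
}

# Datasets that are ingested rather than computed.
SOURCES = {"raw_reviews", "podcasts"}


def build_order() -> list[str]:
    """Return the derived datasets in an order that respects their dependencies."""
    order = TopologicalSorter(DEPENDENCIES).static_order()
    return [name for name in order if name not in SOURCES]


if __name__ == "__main__":
    for name in build_order():
        print(name)
```

Because each derived dataset depends on the one before it, the order is a single chain ending in the daily summary.
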

## Input Data

Download `database.sqlite` from the [Podcast Reviews dataset on Kaggle](https://www.kaggle.com/datasets/thoughtvector/podcastreviews/versions/28?select=database.sqlite) and place it at `examples/podcast_reviews/data/ingest/database.sqlite`.
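
Once the file is in place, a quick sanity check is to open it and list its tables. The following is a minimal sketch using only the standard library. The `list_tables` helper is illustrative, the path is assumed to be relative to the repository root, and the table names printed depend on the dataset version you downloaded.

```python
import sqlite3
from pathlib import Path

# Path taken from the instructions above; assumed relative to the repo root.
DB_PATH = Path("examples/podcast_reviews/data/ingest/database.sqlite")


def list_tables(db_path) -> list[str]:
    """Return the names of all tables in the SQLite file at db_path, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # sqlite3.connect would create an empty file if none exists, so check first.
    if DB_PATH.exists():
        print(list_tables(DB_PATH))
    else:
        print(f"{DB_PATH} not found; download it first")
```
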