Web Scraper (Scrapy) - German Online Reviews/Ratings of Organic Coffee

This repository contains the web scraper I used to crawl the Utopia.de website to collect German-language online user reviews of organic/fair trade coffee.

The dataset is available on Kaggle: https://www.kaggle.com/mldado/german-online-reviewsratings-of-organic-coffee

Content

The scraper collects the following data for each review:

  • brand name of the coffee being reviewed
  • user rating of the coffee (1-5 stars)
  • user review in German
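
As a rough illustration of what one collected record could look like, here is a minimal sketch using a plain Python dataclass (recent Scrapy versions accept dataclass objects as items). The class name `CoffeeReview`, the field names `brand`, `rating`, and `review`, and the sample values are assumptions made for this example; they are not taken from the scraper's code or from the dataset.

```python
from dataclasses import dataclass, asdict


@dataclass
class CoffeeReview:
    """One scraped review. Field names are hypothetical, chosen for illustration."""

    brand: str   # brand name of the coffee being reviewed
    rating: int  # user rating of the coffee, 1-5 stars
    review: str  # user review text in German

    def __post_init__(self):
        # Reject ratings outside the 1-5 star range described above.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


# Made-up sample record, not real data from the dataset.
example = CoffeeReview(
    brand="Beispielkaffee",
    rating=4,
    review="Sehr aromatisch und mild.",
)
record = asdict(example)  # plain dict, convenient for CSV or JSON export
```

Checking the star range when the record is built means a parsing mistake in the spider would surface immediately instead of ending up in the exported data.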

Inspiration

There aren't that many NLP datasets in German. This one is a little small, but it should be enough to try out sentiment analysis and more advanced techniques such as aspect-based sentiment analysis. It would be interesting to extract features that represent the preferences of German coffee drinkers, to understand why they choose organic/fair trade coffee brands over conventional ones, and maybe even to find out what differentiates a 5-star coffee from 'just' a 4-star coffee.