Project to analyze reviews written as part of the Amazon Vine program. PySpark and Python are used to create dataframes and clean the raw data. The cleaned data is loaded into an AWS RDS database for analysis. The relationship between Vine (paid) and non-Vine (unpaid) reviews is then investigated.

jbalooshie/Amazon_Vine_Analysis


Amazon_Vine_Analysis

This repository was created as part of a 6-month Data Analytics Bootcamp administered by George Washington University. It contains the Module 12 Challenge, which built on SQL and Python skills while introducing work with an AWS RDS database. Topics covered include setting up an AWS RDS instance, using PySpark, and loading tables into AWS.

27 SEP 2022 - Updated repo to better organize files and add some context to README

Overview of the analysis

This analysis examines Amazon reviews written by members of the paid Amazon Vine program. Companies listing their products on Amazon can pay a fee to have those products provided to individuals who are then required to write a review. These individuals do not pay for the products, so it is useful to know whether this creates an incentive to write a positive review.

In this analysis, we took a dataset for a specific product category. We created several dataframes from the dataset to complete our analysis and uploaded them to a cloud database so they can be accessed by other stakeholders. From the dataframes, we drew the conclusions presented below.

Results

  • There were 170 Vine reviews and 37,840 non-Vine reviews.

  • There were 65 five-star Vine reviews and 20,612 five-star non-Vine reviews.

  • 38.2% of the Vine reviews were five stars, and 54.5% of the non-Vine reviews were five stars.
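The percentages above follow directly from the review counts. As a minimal sketch of that arithmetic in plain Python (the dictionary layout and the `five_star_share` helper are illustrative names, not taken from the project's notebooks):

```python
# Review counts as reported in the Results list above.
counts = {
    "Vine": {"total": 170, "five_star": 65},
    "non-Vine": {"total": 37840, "five_star": 20612},
}


def five_star_share(five_star: int, total: int) -> float:
    """Return the percentage of reviews that are 5 stars, rounded to one decimal."""
    return round(100 * five_star / total, 1)


# Hypothetical helper applied to each group of reviews.
shares = {
    group: five_star_share(c["five_star"], c["total"])
    for group, c in counts.items()
}

for group, pct in shares.items():
    print(f"{group}: {pct}% of reviews are 5 stars")
```

Running this reproduces the 38.2% and 54.5% figures reported above.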

Summary

Potential for positivity bias

Based on the results of the analysis, there does not appear to be a positivity bias among individuals writing Vine reviews. I found that 38.2% of the Vine reviews were five stars, while 54.5% of the non-Vine reviews were five stars. The main drawback of this analysis is the discrepancy in sample sizes: the number of non-Vine reviews is far larger than the number of Vine reviews (37,840 non-Vine vs 170 Vine reviews).
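One way to gauge how much the unequal sample sizes matter is a two-proportion z-test on the five-star shares. This test was not part of the original analysis; the sketch below is an assumption about how it could be done, using only the counts from the Results section and a hypothetical helper named `two_proportion_z`.

```python
from math import sqrt


def two_proportion_z(success_a: int, total_a: int,
                     success_b: int, total_b: int) -> float:
    """Pooled two-proportion z statistic for group A versus group B."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (total_a + total_b)
    # Standard error of the difference; the small group dominates this term.
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se


# Group A: Vine reviews, group B: non-Vine reviews (counts from the Results section).
z = two_proportion_z(65, 170, 20612, 37840)
print(f"z = {z:.2f}")
```

The statistic comes out at about -4.2. A negative value means the Vine five-star share is the lower of the two, and a magnitude well above 2 suggests, under this test's assumptions, that the gap is unlikely to come from the small Vine sample alone.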

Additional analysis

One helpful additional analysis would be to perform the same investigation on another product category. This analysis was performed on a dataset of pet products. Selecting an unrelated dataset and repeating the analysis would show whether the spread is similar across both categories. If the five-star ratios for Vine and non-Vine reviews in the second category are similar to those found here, that helps rule out the pet products category being an outlier.
