Airbnb, Inc., based in San Francisco, California, operates an online marketplace focused on short-term homestays and experiences. I chose Airbnb as the topic of this project because of my recent experiences with Airbnb in the United States as an international student. Airbnb allows guests to set preferences such as room type and property type when searching for rentals in a location. Beyond guests’ preferences, certain attributes determine the price of a listing, and these attributes can help identify which areas or neighborhoods are better suited from a host’s perspective. In addition, Airbnb features a review system in which guests and hosts can rate and review each other after a stay. The truthfulness and impartiality of reviews may be adversely affected by concerns about future stays, because prospective hosts may refuse to host a user who generally leaves negative reviews [2]. It would therefore be useful to see which properties in different neighborhoods received good reviews in a given month, as that could help guests choose a rental. The overarching goal of this project is to determine the attributes that most influence the prices of Airbnb listings in the US. The analysis questions are grouped into four categories: Airbnb guests’ preferences, Airbnb revenue generation, Airbnb price and property, and reviews by Airbnb guests. This grouping helps target two to three specific questions within each category.
- Analyze and plot the number of listings by property type to see whether guests prefer renting an apartment, a villa, or a house.
- Which rentals by location were most reviewed by Airbnb guests?
- Calculate the estimated revenue for each listing ID by multiplying the price of the property by the “minimum nights” column. Which areas bring in the most money and can be recommended to potential hosts?
- Draw bar plots showing the busiest months in terms of the number of bookings per month, and the total estimated revenue generated by hosts of the various properties per month.
- The aim is to find the variable with the highest correlation with price. I will draw a pair plot of selected columns to examine the correlation between listing price and other factors such as bedrooms, bathrooms, review scores value, reviews per month, and review scores accuracy. In addition, I will run a Spearman correlation test on the data to confirm the correlation between price and the variable most strongly correlated with it.
- Fit a linear regression model to see how price changes with the number of guests a listing accommodates, and check whether the two are correlated.
- Group the properties by property type and calculate the mean price for each type. The aim is to find the most expensive property type, using a scatterplot to visualize the result.
- In which year (2015, 2016, or 2017) was the average number of reviews across all listings the greatest, and does the trend over these three years show that the number of reviews left by guests has increased over time?
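
Two of the calculations named in the questions above, the estimated revenue per listing and the Spearman correlation with price, can be illustrated with a minimal sketch. The sample rows and the column names (`neighbourhood`, `price`, `minimum_nights`, `accommodates`) are assumptions made for illustration and are not taken from the actual dataset. The sketch uses only the Python standard library; a data-analysis library would likely be used in the real analysis.

```python
from collections import defaultdict

# Hypothetical sample rows; the values and column names are illustrative
# assumptions, not data from the Airbnb listings file.
listings = [
    {"id": 1, "neighbourhood": "A", "price": 100.0, "minimum_nights": 2, "accommodates": 2},
    {"id": 2, "neighbourhood": "A", "price": 150.0, "minimum_nights": 1, "accommodates": 4},
    {"id": 3, "neighbourhood": "B", "price": 80.0, "minimum_nights": 3, "accommodates": 3},
    {"id": 4, "neighbourhood": "B", "price": 200.0, "minimum_nights": 2, "accommodates": 6},
]


def estimated_revenue(row):
    """Estimated revenue of one listing: price times minimum nights."""
    return row["price"] * row["minimum_nights"]


def revenue_by_neighbourhood(rows):
    """Sum the estimated revenue of all listings in each neighbourhood."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["neighbourhood"]] += estimated_revenue(row)
    return dict(totals)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return pearson(average_ranks(x), average_ranks(y))


revenue_totals = revenue_by_neighbourhood(listings)
prices = [row["price"] for row in listings]
capacity = [row["accommodates"] for row in listings]
rho = spearman(prices, capacity)

print(revenue_totals)
print(round(rho, 3))
```

The revenue figure here is only a lower-bound proxy, since it assumes each listing is booked once for exactly its minimum number of nights. The rank-based correlation is a reasonable fit for price data because it does not assume a linear relationship and is less sensitive to a few very expensive listings.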