This project was the capstone of the Google Data Analytics Certificate. I chose Excel and R to complete the task, and it was my first project using R.
You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
Analyse smart device usage data from one of Bellabeat's competitors in order to identify potential growth opportunities and make recommendations for marketing strategy based on the trends found in the analysis.
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
- What are some trends in smart device usage, and how could these trends apply to Bellabeat customers?
- How could these trends help influence Bellabeat marketing strategy and help gain new customers?
- The data comes from FitBit Fitness Tracker Data, stored on Kaggle, and contains a total of 18 CSV files. The dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.
- The licence is listed as public domain.
- The data has a few limitations. The small sample size (33 participants in the files, although the dataset description mentions thirty) and the fact that the devices collected data for just one month could mean that the data is biased and does not accurately represent the whole population. It also lacks important information about the participating users, such as gender, current lifestyle and age.
- Reliability : LOW – the dataset was collected from only about 30 individuals who were tracked for just 1 month. Key indicators such as gender, age and lifestyle are missing.
- Originality : LOW – third-party data collected using Amazon Mechanical Turk.
- Comprehensive : MEDIUM – Multiple data frames with different information
- Current : MEDIUM – the data is 7 years old. People's habits are unlikely to change very quickly, but the data might not be the best reflection of current trends.
- Cited : HIGH – the data collector and source are properly documented.
I downloaded each file and opened it in Excel.
- For every file, the date format was changed from CUSTOM to Short Date to ensure all dates are uniform.
- I also performed a quick check for duplicate rows using the IF(COUNTIFS()) functions.
- Next, I obtained the count of unique customers with COUNT(UNIQUE()) formulas and gathered information on what data is stored in each file with the TEXTJOIN function.
- The information obtained in the last two steps was transferred to a separate spreadsheet.
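The same three checks can also be reproduced in R. The sketch below is only an illustration in base R: the `activity` table and its values are made up for the example, not taken from the dataset.

```r
# Toy stand-in for one of the CSV files (illustrative values, not real data)
activity <- data.frame(
  Id           = c(1001, 1001, 1002, 1002, 1002),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalSteps   = c(9000, 7500, 4000, 4000, 5200)
)

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(activity))

# Number of distinct users in the file
n_users <- length(unique(activity$Id))

# Comma-separated list of the columns stored in the file,
# similar to what TEXTJOIN produces in Excel
column_list <- paste(names(activity), collapse = ", ")

print(n_duplicates)
print(n_users)
print(column_list)
```

In the toy table the fourth row repeats the third, so one duplicate is reported for two distinct users.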
On closer inspection, I found that dailyActivity_merged.csv already contains the data from dailyCalories_merged.csv, dailyIntensities_merged.csv, and dailySteps_merged.csv, so I won't be using those three files.
I had a look at the heartrate_seconds.csv file. Although monitoring heart rate has plenty of benefits in day-to-day life as well as during activities, I won't analyse this data set further because of the low number of users (only 7). It is worth noting that the popularity of such devices is growing and more people are becoming aware that monitoring heart rate can be important for good health. This could be a good feature to include in Bellabeat devices.
After performing basic cleaning and checking of data in Excel, I will now move to R.
- Importing the files I will be focusing on and renaming them for simplicity:
> library(tidyverse)
> setwd("/Users/agnie/Documents/project")
> daily_activity <- read_csv("dailyActivity.csv")
> daily_sleep <- read_csv("sleepDay.csv")
> weight_log <- read_csv("weightLog.csv")
- Previewing each file to make sure it has been imported correctly. The counts of rows and columns look correct, so I will then move on to pulling some key statistics to gain more insight:
> glimpse(daily_activity)
Rows: 940
Columns: 15
$ Id 1503960366, 1503960366, 1503960366, 150396036…
$ ActivityDate "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
$ TotalSteps 13162, 10735, 10460, 9762, 12669, 9705, 13019…
$ TotalDistance 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
$ TrackerDistance 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
$ LoggedActivitiesDistance 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ VeryActiveDistance 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
$ ModeratelyActiveDistance 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
$ LightActiveDistance 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
$ SedentaryActiveDistance 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ VeryActiveMinutes 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
$ FairlyActiveMinutes 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
$ LightlyActiveMinutes 328, 217, 181, 209, 221, 164, 233, 264, 205, …
$ SedentaryMinutes 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
$ Calories 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
> glimpse(daily_sleep)
Rows: 413
Columns: 5
$ Id 1503960366, 1503960366, 1503960366, 1503960366, 150…
$ SleepDay "4/12/2016", "4/13/2016", "4/15/2016", "4/16/2016",…
$ TotalSleepRecords 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ TotalMinutesAsleep 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
$ TotalTimeInBed 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
> glimpse(weight_log)
Rows: 67
Columns: 8
$ Id 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
$ Date "5/2/2016", "5/3/2016", "4/13/2016", "4/21/2016", "5/12…
$ WeightKg 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
$ WeightPounds 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6…
$ Fat 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,…
$ BMI 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,…
$ IsManualReport TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
$ LogId 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,…
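The previews show the date columns (`ActivityDate`, `SleepDay`, `Date`) imported as character strings. If they are needed as real dates, for example for sorting or grouping by day, a conversion along these lines should work. This is a minimal sketch that assumes the strings are month/day/year, as values such as "4/13/2016" suggest.

```r
# Date strings copied from the daily_activity preview
activity_dates <- c("4/12/2016", "4/13/2016", "4/14/2016")

# Assumption: the format is month/day/year
parsed <- as.Date(activity_dates, format = "%m/%d/%Y")

print(class(parsed))
print(format(parsed, "%Y-%m-%d"))
```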
- Pulling key statistics
> daily_activity %>%
+ select(TotalSteps, TotalDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes,
+ SedentaryMinutes, Calories, ActivityType) %>%
+ summary()
TotalSteps | TotalDistance | VeryActiveMinutes | FairlyActiveMinutes | LightlyActiveMinutes | SedentaryMinutes | Calories | ActivityType |
---|---|---|---|---|---|---|---|
Min. : 0 | Min. : 0.000 | Min. : 0.00 | Min. : 0.00 | Min. : 0.0 | Min. : 0.0 | Min. : 0 | Length:940 |
1st Qu.: 3790 | 1st Qu.: 2.620 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.:127.0 | 1st Qu.: 729.8 | 1st Qu.:1828 | Class :character |
Median : 7406 | Median : 5.245 | Median : 4.00 | Median : 6.00 | Median :199.0 | Median :1057.5 | Median :2134 | Mode :character |
Mean : 7638 | Mean : 5.490 | Mean : 21.16 | Mean : 13.56 | Mean :192.8 | Mean : 991.2 | Mean :2304 | |
3rd Qu.:10727 | 3rd Qu.: 7.713 | 3rd Qu.: 32.00 | 3rd Qu.: 19.00 | 3rd Qu.:264.0 | 3rd Qu.:1229.5 | 3rd Qu.:2793 | |
Max. :36019 | Max. :28.030 | Max. :210.00 | Max. :143.00 | Max. :518.0 | Max. :1440.0 | Max. :4900 | |
- On average, users burn 2,304 calories per day.
- The mean number of steps per day is 7,638, which is a reasonable amount but below the 10,000 steps a day that is commonly cited as a target.
- On average, the total distance covered per day is 5.49 (the unit is not stated in the file; kilometres is assumed here).
- On average, users spend 991.2 minutes per day being sedentary, which is about 16 hours 31 minutes (16.52 hours). This is very high and is associated with a range of health problems.
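Minute totals from the summary can be converted to hours and minutes with integer division and the remainder. The helper below is a small sketch; `to_hours_minutes` is a name chosen for this example and is not part of any package.

```r
# Convert a number of minutes to an "H h MM min" label
# (hypothetical helper written for this example)
to_hours_minutes <- function(minutes) {
  total <- round(minutes)  # nearest whole minute
  sprintf("%d h %02d min", as.integer(total %/% 60), as.integer(total %% 60))
}

# Mean of SedentaryMinutes from the summary table
mean_sedentary_minutes <- 991.2

print(to_hours_minutes(mean_sedentary_minutes))
print(round(mean_sedentary_minutes / 60, 2))  # the same value in decimal hours
```

Note that decimal hours and hours:minutes are easy to mix up: 16.52 hours is 16 hours 31 minutes, because 0.52 of an hour is about 31 minutes.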
> daily_sleep %>%
+ select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
+ summary()
TotalSleepRecords | TotalMinutesAsleep | TotalTimeInBed |
---|---|---|
Min. :1.000 | Min. : 58.0 | Min. : 61.0 |
1st Qu.:1.000 | 1st Qu.:361.0 | 1st Qu.:403.0 |
Median :1.000 | Median :433.0 | Median :463.0 |
Mean :1.119 | Mean :419.5 | Mean :458.6 |
3rd Qu.:1.000 | 3rd Qu.:490.0 | 3rd Qu.:526.0 |
Max. :3.000 | Max. :796.0 | Max. :961.0 |
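- On average, users get 419.5 minutes of sleep, which is about 7 hours and sits at the lower end of the commonly recommended amount for adults.
- The average total time in bed is 458.6 minutes, about 7 hours 39 minutes, so users spend roughly 39 minutes in bed awake. This suggests that they do not tend to struggle with falling asleep.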
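The gap between time in bed and time asleep can also be computed per night. The sketch below uses the first five nights shown in the `daily_sleep` preview. The column names `MinutesAwakeInBed` and `SleepEfficiency` are introduced here for illustration and do not exist in the dataset.

```r
# First five nights from the daily_sleep preview
sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700),
  TotalTimeInBed     = c(346, 407, 442, 367, 712)
)

# Minutes spent in bed but not asleep (hypothetical column)
sleep$MinutesAwakeInBed <- sleep$TotalTimeInBed - sleep$TotalMinutesAsleep

# Share of time in bed actually spent asleep (hypothetical column)
sleep$SleepEfficiency <- sleep$TotalMinutesAsleep / sleep$TotalTimeInBed

print(sleep)

# The same gap computed from the two means in the summary table
mean_gap <- 458.6 - 419.5
print(mean_gap)
```

Five nights from a single user are only a spot check; running the same calculation on the full table would give the distribution across all users.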
> weight_log %>%
+ select(WeightKg, BMI) %>%
+ summary()
WeightKg | BMI |
---|---|
Min. : 52.60 | Min. :21.45 |
1st Qu.: 61.40 | 1st Qu.:23.96 |
Median : 62.50 | Median :24.39 |
Mean : 72.04 | Mean :25.19 |
3rd Qu.: 85.05 | 3rd Qu.:25.56 |
Max. :133.50 | Max. :47.54 |
- The average BMI for the users is 25.19, which falls just inside the overweight range (25 to 29.9; obesity starts at 30). The median of 24.39 is within the healthy range, so the mean is pulled up by a few high values (the maximum is 47.54). That said, many professionals are moving away from BMI as an indicator of health.
- The average weight is 72.04 kg. Unfortunately, this doesn't give us much insight, as we are missing key characteristics such as gender, age and lifestyle.
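To see which standard adult BMI category each summary value falls into, the values can be binned with `cut()`. This is a sketch using the widely used adult cut-offs (18.5, 25 and 30); `bmi_category` is a helper name made up for this example.

```r
# Map BMI values to standard adult categories
# (hypothetical helper written for this example)
bmi_category <- function(bmi) {
  cut(
    bmi,
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("underweight", "healthy weight", "overweight", "obese"),
    right  = FALSE  # intervals are [lower, upper), so exactly 25 is "overweight"
  )
}

# Values taken from the BMI column of the summary table
bmi_values <- c(min = 21.45, median = 24.39, mean = 25.19, max = 47.54)

print(data.frame(bmi = bmi_values, category = bmi_category(bmi_values)))
```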
After looking closely at the tracked data, I feel that there is not enough information, and too many key characteristics are missing, to gain a good overview of the trends and user habits.
- Creating plots
> plot1 <- ggplot(daily_activity_sleep, aes(x=SedentaryMinutes, y= TotalMinutesAsleep)) +
+ geom_point(color="purple") + geom_smooth(color="black") +
+ labs(title="Sedentary minutes in relation to sleeping time")
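`plot1` reads from `daily_activity_sleep`, which is not defined in the steps shown. One plausible way to build it is to join the activity and sleep tables on user and day. The sketch below is an assumption about that step, not the original code. It uses base R `merge()` on a few rows copied from the previews so that it runs without extra packages.

```r
# A few rows copied from the daily_activity and daily_sleep previews
daily_activity <- data.frame(
  Id               = c(1503960366, 1503960366, 1503960366),
  ActivityDate     = c("4/12/2016", "4/13/2016", "4/14/2016"),
  SedentaryMinutes = c(728, 776, 1218)
)
daily_sleep <- data.frame(
  Id                 = c(1503960366, 1503960366),
  SleepDay           = c("4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(327, 384)
)

# Assumed join: keep only days that appear in both tables,
# matching on user Id and calendar day
daily_activity_sleep <- merge(
  daily_activity, daily_sleep,
  by.x = c("Id", "ActivityDate"),
  by.y = c("Id", "SleepDay")
)

print(daily_activity_sleep)
```

An inner join like this drops days without a sleep record, which matters here because the sleep table (413 rows) is much smaller than the activity table (940 rows).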
> plot2 <- ggplot(daily_activity, aes(x=TotalSteps, y= Calories)) +
+ geom_smooth(color="black") +
+ labs(title="Total Steps and Calories Burned relationship")
- Positive correlation between total steps and calories burnt
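The strength of this relationship can be quantified with a correlation coefficient in addition to the plot. The sketch below uses only the first seven days shown in the `daily_activity` preview, all from one user, so the number it prints is a spot check and not the value for the whole dataset.

```r
# First seven days from the daily_activity preview (one user)
total_steps <- c(13162, 10735, 10460, 9762, 12669, 9705, 13019)
calories    <- c(1985, 1797, 1776, 1745, 1863, 1728, 1921)

# Pearson correlation; values close to 1 indicate a strong positive relationship
r <- cor(total_steps, calories)

print(round(r, 2))
```

On the full table the equivalent call would be `cor(daily_activity$TotalSteps, daily_activity$Calories)`.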
> plot3 <- ggplot(daily_sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
+ geom_point(color="pink") + geom_smooth(color="black") +
+ labs(title="Time Asleep(min)and total time in bed relationship")
- This supports the observation made earlier: time in bed rises almost in step with time asleep
Bellabeat has a lot of potential to become a world-class player in the smart device market, and their core values give them a strong foundation for success.
Based on the analysis of Fitbit users' activity and some research into current trends, here are my recommendations to help Bellabeat reach their full potential:
- Utilize social media - people love to take part in challenges, and it makes them more accountable
- Encourage users to stay active by sending notifications when they reach milestones towards a bigger goal, e.g. 10 thousand steps a day
- Consider including advertisements for health and fitness products from other companies
- Consider a section in the app with tips about mindfulness, health and staying fit
- Enable notifications when a user has been inactive for a longer period of time
- Let people connect with each other to build a stronger sense of community and a support system for when a worse day occurs
- Build trust and reassure customers that their data is completely safe
- Focus on mindfulness features - over 3 million people use meditation apps, so it would be a smart move to include some basic meditation and relaxation techniques
- Make sure that using the app and inputting relevant information is as simple and fast as it can be. Customers rely mostly on automated tracking and simplicity
- Focus on inclusivity
MEMBERSHIP IMPROVEMENTS
RESEARCH
https://www.gearhungry.com/best-heart-rate-monitor/
https://www.healthworkscollective.com/top-5-advantages-of-a-heart-rate-monitor-for-workouts-and-daily-life/
http://www.bestbmicalculator.com/25/
https://www.idealhome.co.uk/news/smart-technology-home-concerns-198472
https://www.techgenyz.com/2018/12/12/how-smart-technology-impacting-environment/
https://www.forbes.com/sites/bernardmarr/2022/01/26/the-5-biggest-fitness-and-wellness-technology-trends-in-2022/?sh=2f64c09c7cad