05_matplotlib-challenge

Pymaceuticals.ipynb Completed Assignment

Background

You've just joined Pymaceuticals, Inc., a new pharmaceutical company that specializes in anti-cancer medications. Recently, it began screening for potential treatments for squamous cell carcinoma (SCC), a commonly occurring form of skin cancer.

As a senior data analyst at the company, you've been given access to the complete data from their most recent animal study. In this study, 249 mice who were identified with SCC tumors received treatment with a range of drug regimens. Over the course of 45 days, tumor development was observed and measured. The purpose of this study was to compare the performance of Pymaceuticals’ drug of interest, Capomulin, against the other treatment regimens.

The executive team has tasked you with generating all of the tables and figures needed for the technical report of the clinical study. They have also asked you for a top-level summary of the study results.

Requirements

Prepare the Data

Merge the datasets into a single DataFrame
Display the total number of mice from the merged DataFrame
Identify each duplicate mouse based on Mouse ID and Timepoint
Create a clean DataFrame by dropping duplicate mice
Display the total number of mice from the clean DataFrame

Generate Summary Statistics

Calculate the mean of the tumor volume for each regimen using groupby
Calculate the median of the tumor volume for each regimen using groupby
Calculate the variance of the tumor volume for each regimen using groupby
Calculate the standard deviation of the tumor volume for each regimen using groupby
Calculate the SEM of the tumor volume for each regimen using groupby
Create a new DataFrame to store the summary statistics

Create Bar Charts and Pie Charts

Generate a bar plot using Pandas to show the total number of timepoints for all mice tested for each drug regimen
Generate a bar plot using pyplot to show the total number of timepoints for all mice tested for each drug regimen
Generate a pie chart using Pandas to show the distribution of unique female vs. male mice
Generate a pie chart using pyplot to show the distribution of unique female vs. male mice

Calculate Quartiles, Find Outliers, and Create a Box Plot

Create a DataFrame with the last timepoint for each mouse ID using groupby
Reset the index of the DataFrame
Retrieve the maximum timepoint for each mouse
Define a list of the four treatment groups: Capomulin, Ramicane, Infubinol, and Ceftamin
Create an empty list to store tumor volume data
Use a for loop to display the interquartile range (IQR) and outliers for each treatment group
Generate a box plot showing the distribution of final tumor volume for all mice in each treatment group

Create a Line Plot and a Scatter Plot

Generate a line plot showing tumor volume vs. timepoint for one mouse treated with Capomulin
Generate a scatter plot showing average tumor volume vs. mouse weight for the Capomulin regimen

Calculate Correlation and Regression

Calculate the correlation coefficient and linear regression model for mouse weight and average tumor volume for the Capomulin regimen

##CODING_PROCESS

Overall Approach

I reviewed my notes, PowerPoint slides, and the past PyCitySchools challenge to help guide me through the assignment.
I looked up how to identify duplicate mice by Mouse ID and Timepoint, then reviewed the code to understand the methods used: duplicate_mice = complete_data_df[complete_data_df.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)]["Mouse ID"].unique()

Summary Statistics

I refreshed my understanding of variance and standard deviation to better interpret these statistics. Standard deviation measures how far apart values are in a dataset and variance quantifies how much the values in the dataset vary from the mean. When the data points are close to the mean it suggests a high level of consistnacy within the data set.

Bar and Pie Charts Section

I asked the Xpert Learning Assistant and referenced activity materials heavily on this section to understand the differences between Pandas and Pyplot, as well as the methods for creating charts from a Series vs. a DataFrame. To illustrate my understanding, I practiced creating bar and pie charts from both a Series and a DataFrame.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Pymaceuticals		Pymaceuticals
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

05_matplotlib-challenge

Background

Requirements

Prepare the Data

Generate Summary Statistics

Create Bar Charts and Pie Charts

Calculate Quartiles, Find Outliers, and Create a Box Plot

Create a Line Plot and a Scatter Plot

Calculate Correlation and Regression

About

Releases

Packages

Languages

wrighang/05_matplotlib-challenge

Folders and files

Latest commit

History

Repository files navigation

05_matplotlib-challenge

Background

Requirements

Prepare the Data

Generate Summary Statistics

Create Bar Charts and Pie Charts

Calculate Quartiles, Find Outliers, and Create a Box Plot

Create a Line Plot and a Scatter Plot

Calculate Correlation and Regression

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages