An inferential statistic used to determine whether there is a significant difference between the means of two groups or distributions.
It comes in two types:
- One-sample T-test - Tells us whether the mean of a sample differs from a known (hypothesized) population mean.
- Two-sample T-test - Compares the means of two independent groups to determine whether their associated population means are different.
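A minimal sketch of the two t statistics, written with Python's standard library only so that it runs without third-party packages. The helper names `one_sample_t` and `two_sample_t` are hypothetical. In practice `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind` compute the same statistics and also return p-values, which this sketch omits because they require the t distribution's CDF. The two-sample version assumes equal population variances (the classic Student's test).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    # Standard error of the mean uses the sample standard deviation (n - 1 denominator)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1  # (t, degrees of freedom)


def two_sample_t(a, b):
    """Student's t statistic for two independent samples with equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


print(one_sample_t([1, 2, 3, 4, 5], mu0=2))
print(two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

The two-sample call gives t = -2 with 8 degrees of freedom. The two-sided 5% critical value for 8 degrees of freedom is about 2.306, so this difference would not be called significant at that level.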
The paired t-test is used to compare samples drawn from the same group, for example measurements of the same subjects before and after a treatment. Because such samples are not independent, we cannot use the standard (independent) Student's t-test. Instead, we must use a modified version of the test that accounts for the dependence between the samples, called the paired Student's t-test. In the example from the website, the samples are independent rather than paired, but for the sake of the demonstration we can pretend that the observations are paired and calculate the statistic.
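The idea behind the paired test can be sketched as follows: take the difference within each pair, then run a one-sample t-test of those differences against zero. This standard-library version and its name `paired_t` are illustrative assumptions; `scipy.stats.ttest_rel` is the usual library function for this test and also returns a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test of the per-pair differences against 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after - before, pair by pair
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1  # (t, degrees of freedom)


before = [1, 2, 3, 4, 5]
after = [2, 4, 4, 6, 6]
print(paired_t(before, after))
```

Here the paired statistic is about 5.72 with 4 degrees of freedom, whereas treating the same numbers as two independent samples gives a pooled two-sample |t| of only about 1.36: pairing removes the subject-to-subject variation from the standard error.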
The purpose of a one-way analysis of variance (one-way ANOVA) is to compare the means of two or more groups (the levels of the independent variable) on one dependent variable to see whether the group means are significantly different from each other. The null hypothesis of ANOVA is that the means across all groups are equal. If the evidence suggests that this is not the case, the null hypothesis is rejected, meaning that at least one group's mean differs from the others.
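A minimal sketch of how the one-way ANOVA F statistic is formed, again using only the standard library for self-containment. F is the ratio of the between-group mean square to the within-group mean square, so a large F means the group means are spread out relative to the noise inside each group. The function name `one_way_anova_f` is hypothetical; `scipy.stats.f_oneway` performs the same test and also returns the p-value.

```python
import statistics


def one_way_anova_f(*groups):
    """F statistic for H0: all group means are equal."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: how far observations sit from their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

For these three well-separated groups F = 27 with (2, 6) degrees of freedom, which would be compared against the F distribution to obtain a p-value.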
## Exploring Seaborn Plots
Seaborn provides high-level plotting commands that are useful for statistical data exploration and even some statistical model fitting. Its API is more convenient than raw Matplotlib commands.
[Code Link](https://github.com/SourajitaDewasi/StasticalDataAnalysis/tree/main/Seaborn)
Histograms represent the data distribution by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin.
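The binning described above can be sketched in plain Python. This is an illustrative equal-width scheme (the helper name `histogram_counts` is hypothetical) that follows the same convention as `numpy.histogram`: every bin is half-open except the last, which also includes the maximum value. The number of bins is passed explicitly here.

```python
def histogram_counts(data, bins):
    """Count observations in `bins` equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Bins are half-open [left, right); clamp so the maximum lands in the last bin
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=3)
print(counts)  # [1, 2, 7]
print(edges)
```

The bar heights of a histogram are exactly these counts, drawn over the intervals between consecutive edges.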
Kernel Density Estimation (KDE) is a way to estimate the probability density function of a continuous random variable. It is a non-parametric method. Setting the `hist` flag to `False` in `distplot` yields the kernel density estimation plot.
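A bare-bones Gaussian KDE shows what the density curve actually is: a normalised sum of Gaussian bumps, one centred on each observation. The function name `kde_estimate` and the hand-picked `bandwidth` are illustrative assumptions; library implementations such as `scipy.stats.gaussian_kde` choose the bandwidth with a rule of thumb (Scott's rule by default). Recent seaborn releases (0.11 and later) deprecate `distplot`; `kdeplot` draws the same kind of curve.

```python
import math


def kde_estimate(data, bandwidth):
    """Return a function giving the Gaussian kernel density estimate of `data`."""
    n = len(data)
    # Each kernel is a normal pdf with standard deviation `bandwidth`, scaled by 1/n
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


f = kde_estimate([0.0, 1.0, 3.0], bandwidth=0.5)
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(f(x), 4))
```

Larger bandwidths give smoother curves; smaller ones follow individual observations more closely. Because each kernel integrates to 1 and the sum is divided by n, the estimate always integrates to 1.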
1.1 Plot histograms and joint distributions of variables for statistical data visualization.
2.1 Pair plots: useful for exploring correlations in multidimensional data by plotting all pairs of values against each other.
3.1 Faceted histograms: view data via histograms of subsets.
4.1 Factor plots: view the distribution of a parameter within bins defined by any other parameter (`factorplot` was renamed `catplot` in seaborn 0.9).
5.1 Joint distributions: show the joint distribution between different datasets, along with the associated marginal distributions.
6.1 Violin plots, Time Series Plots, JointPlot
## Exploring Pandas Visualization