Which correlation and why? #3

TylerKirby · 2021-06-28T15:39:08Z

Determine which correlation metric you use in https://github.com/jacksteussie/DSCA/blob/main/Data%20Analysis%20with%20Python/medical-data-visualizer/medical_data_visualizer.py and justify your choice. Does your analysis change with different metrics? How should you choose a metric?

jacksteussie · 2021-06-28T21:20:21Z

I chose the Pearson correlation because the data fit the assumptions that Pearson requires to be accurate: 1) the data is on an interval/ratio scale, 2) the relationships are mostly linear, 3) we removed the outliers from the dataset, and 4) the data is normally distributed (tested with scipy's normal test, p < 0.02). The differences between the Spearman and Pearson correlations are very small, but the differences between Kendall's and the other two are quite noticeable, which would lead to a change in the analysis. You should choose a metric based on the assumptions each one makes, which determine its usefulness and accuracy in a given situation: the better your data fit the assumptions, the more useful that type of correlation is going to be.
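For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It uses synthetic data rather than the medical dataset, and the column names `x`, `y`, `z` and the helpers `compare_correlations`, `max_abs_difference`, and `normality_pvalues` are made up for illustration. It relies on pandas' `DataFrame.corr(method=...)`, which defaults to Pearson, and on `scipy.stats.normaltest`. Note that `normaltest` takes normality as its null hypothesis, so a small p-value is evidence against normality; it is worth double-checking which way a reported p-value points.

```python
import numpy as np
import pandas as pd
from scipy import stats

METHODS = ("pearson", "spearman", "kendall")


def compare_correlations(df: pd.DataFrame) -> dict:
    """Return the correlation matrix of df under each of the three methods."""
    return {method: df.corr(method=method) for method in METHODS}


def max_abs_difference(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Largest absolute element-wise gap between two correlation matrices."""
    return float((a - b).abs().to_numpy().max())


def normality_pvalues(df: pd.DataFrame) -> pd.Series:
    """p-value of scipy's normaltest per column (null hypothesis: normal)."""
    return df.apply(lambda col: stats.normaltest(col).pvalue)


# Synthetic stand-in data (an assumption, not the medical dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
df = pd.DataFrame(
    {
        "x": x,
        "y": 0.6 * x + rng.normal(scale=0.8, size=500),  # noisy linear link
        "z": np.exp(x),  # monotone but nonlinear function of x
    }
)

mats = compare_correlations(df)
pvals = normality_pvalues(df)

print("Pearson vs Spearman gap:", max_abs_difference(mats["pearson"], mats["spearman"]))
print("Pearson vs Kendall gap: ", max_abs_difference(mats["pearson"], mats["kendall"]))
print(pvals)
```

Keep in mind that Kendall's tau is on a different scale from Pearson's r and Spearman's rho, so its coefficients usually come out smaller in magnitude for the same data. A visible gap between Kendall and the other two does not by itself mean the ranking of relationships has changed.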
