Which correlation and why? #3

Open
TylerKirby opened this issue Jun 28, 2021 · 1 comment

Comments

@TylerKirby

Determine which correlation metric you use in https://github.com/jacksteussie/DSCA/blob/main/Data%20Analysis%20with%20Python/medical-data-visualizer/medical_data_visualizer.py and justify your choice. Does your analysis change with different metrics? How should you choose a metric?

@jacksteussie
Owner

I chose the Pearson correlation because the data fit the assumptions it requires to be accurate: 1) the data is on an interval/ratio scale, 2) the relationships are mostly linear, 3) we removed the outliers from the dataset, and 4) the data is normally distributed (tested with scipy's normal test, with p < 0.02). The differences between the Spearman and Pearson correlations are very small, but the differences between Kendall's and the other two are quite noticeable, which would change the analysis. You should choose a metric based on its assumptions, which determine how useful and accurate it is in a given situation: the better your data fits a metric's assumptions, the more useful that type of correlation will be.
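
One way to check how much the choice of metric matters is to compute all three correlation matrices side by side. The following is a minimal sketch, not the code from `medical_data_visualizer.py`: the DataFrame is synthetic stand-in data, and the column names and the `max_abs_diff` helper are hypothetical. It relies on `DataFrame.corr(method=...)` from pandas and `scipy.stats.normaltest`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical; NOT the medical dataset from this issue).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
df = pd.DataFrame({
    "x": x,
    "linear": 2.0 * x + rng.normal(scale=0.5, size=500),  # noisy linear relation
    "monotonic": np.exp(x),                               # monotonic but nonlinear
})

# DataFrame.corr accepts "pearson", "spearman", and "kendall".
corrs = {m: df.corr(method=m) for m in ("pearson", "spearman", "kendall")}

for name, matrix in corrs.items():
    print(name)
    print(matrix.round(3))


def max_abs_diff(a, b):
    """Largest absolute disagreement between two methods over all column pairs."""
    return float((corrs[a] - corrs[b]).abs().to_numpy().max())


print("pearson vs spearman:", round(max_abs_diff("pearson", "spearman"), 3))
print("pearson vs kendall: ", round(max_abs_diff("pearson", "kendall"), 3))

# D'Agostino-Pearson normality test. The null hypothesis is that the sample
# comes from a normal distribution, so a small p-value is evidence AGAINST
# normality, not for it.
stat, p = stats.normaltest(df["x"])
print("normaltest p-value:", round(float(p), 3))
```

Two things are worth keeping in mind when reading the output. Kendall's tau is on a different scale from Pearson's r and Spearman's rho and is typically smaller in magnitude on the same data, so a numerical gap alone does not necessarily mean the relationships rank differently. And because rank-based metrics only need a monotonic relationship, a large Pearson-vs-Spearman gap on a column pair is a hint that the relationship there is not linear.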
