1 parent
85ed156
commit c507c12
Showing 12 changed files with 1,189 additions and 446 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"hash": "6d4a326e162b1b3ebe7e9895a7472d8f", | ||
"result": { | ||
"markdown": "---\ntitle: \"Creating a Scree Plot in Base R\"\nauthor: \"Steven P. Sanderson II, MPH\"\ndate: \"2023-10-24\"\ncategories: [rtip, viz]\n---\n\n\n# Introduction\n\nA scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.\n\nIn this blog post, we will show you how to create a scree plot in base R. We will use the `iris` dataset as an example.\n\n# Step 1: Load the dataset and prepare the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Drop the non-numerical column\ndf <- iris[, -5]\n```\n:::\n\n\nE Step 2: Perform Principal Component Analysis\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Perform PCA on the iris dataset\npca <- prcomp(df, scale = TRUE)\n```\n:::\n\n\nE Step 3: Create the scree plot\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Extract the eigenvalues from the PCA object\neigenvalues <- pca$sdev^2\n\n# Create a scree plot\nplot(eigenvalues, type = \"b\",\n xlab = \"Principal Component\",\n ylab = \"Eigenvalue\")\n\n# Add a line at y = 1 to indicate the elbow\nabline(v = 2, col = \"red\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Percentage of variance explained\nplot(eigenvalues/sum(eigenvalues), type = \"b\",\n xlab = \"Principal Component\",\n ylab = \"Percentage of Variance Explained\")\nabline(v = 2, col = \"red\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-2.png){width=672}\n:::\n:::\n\n\n# Interpretation\n\nThe scree plot shows that the first two principal components explain the most variance in the data. 
The third and fourth principal components explain much less variance.\n\nBased on the scree plot, we can conclude that the first two principal components are sufficient for capturing the most important information in the data.\n\nHere are the eigenvalues and the percentage explained\n\n\n::: {.cell}\n\n```{.r .cell-code}\neigenvalues\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2.91849782 0.91403047 0.14675688 0.02071484\n```\n:::\n\n```{.r .cell-code}\neigenvalues/sum(eigenvalues)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.729624454 0.228507618 0.036689219 0.005178709\n```\n:::\n:::\n\n\n# Try it yourself\n\nTry creating a scree plot for another dataset of your choice. You can use the same steps outlined above.\n\nHere are some additional tips for creating scree plots:\n\n* If you are using a dataset with a large number of variables, you may want to consider scaling the data before performing PCA. This will ensure that all of the variables are on the same scale and that no one variable has undue influence on the results.\n* You can also add a line to the scree plot at y = 1 to indicate the elbow. The elbow is the point where the scree plot begins to level off. This is often used as a heuristic for determining the number of PCs to retain.\n* Finally, keep in mind that the interpretation of a scree plot is subjective. There is no single rule for determining the number of PCs to retain. The best approach is to consider the scree plot in conjunction with other factors, such as your research goals and the specific dataset you are using.", | ||
"supporting": [ | ||
"index_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
--- | ||
title: "Creating a Scree Plot in Base R" | ||
author: "Steven P. Sanderson II, MPH" | ||
date: "2023-10-24" | ||
categories: [rtip, viz] | ||
--- | ||
|
||
# Introduction | ||
|
||
A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model. | ||
|
||
In this blog post, we will show you how to create a scree plot in base R. We will use the `iris` dataset as an example. | ||
|
||
# Step 1: Load the dataset and prepare the data | ||
|
||
```{r} | ||
# Drop the non-numerical column | ||
df <- iris[, -5] | ||
``` | ||
|
||
# Step 2: Perform Principal Component Analysis | ||
|
||
```{r} | ||
# Perform PCA on the iris dataset | ||
pca <- prcomp(df, scale. = TRUE) | ||
``` | ||
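
Before plotting, it can help to inspect the fitted object directly. The following is a minimal sketch, assuming the same `iris` setup as above; `summary()` on a `prcomp` object reports the standard deviation, proportion of variance, and cumulative proportion for each component.

```r
# Self-contained: rebuild the numeric data and the PCA fit
df <- iris[, -5]
pca <- prcomp(df, scale. = TRUE)

# Standard deviations of the components (square roots of the eigenvalues)
pca$sdev

# Matrix with three rows: standard deviation, proportion of variance,
# and cumulative proportion, one column per component
pca_summary <- summary(pca)
pca_summary$importance
```

Because the data are scaled, the eigenvalues (`pca$sdev^2`) sum to the number of variables, which is 4 here.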
|
||
# Step 3: Create the scree plot | ||
|
||
```{r} | ||
# Extract the eigenvalues from the PCA object | ||
eigenvalues <- pca$sdev^2 | ||
# Create a scree plot | ||
plot(eigenvalues, type = "b", | ||
xlab = "Principal Component", | ||
ylab = "Eigenvalue") | ||
# Add a vertical line at the second component to mark the elbow | ||
abline(v = 2, col = "red") | ||
# Percentage of variance explained | ||
plot(eigenvalues/sum(eigenvalues), type = "b", | ||
xlab = "Principal Component", | ||
ylab = "Proportion of Variance Explained") | ||
abline(v = 2, col = "red") | ||
``` | ||
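
Base R also provides `screeplot()` in the `stats` package, which draws the component variances straight from the fitted object. The sketch below assumes the same `iris` fit; it is an alternative to the manual `plot()` call, not the method used in this post.

```r
# Self-contained: rebuild the numeric data and the PCA fit
df <- iris[, -5]
pca <- prcomp(df, scale. = TRUE)

# Built-in scree plot: variances of the components joined by a line
screeplot(pca, type = "lines", main = "Scree plot of iris PCA")

# The same variances as a bar chart (the default type)
screeplot(pca, type = "barplot", main = "Scree plot of iris PCA")
```

The manual version gives more control over axis labels and reference lines, while `screeplot()` is convenient for a quick look.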
|
||
# Interpretation | ||
|
||
The scree plot shows that the first two principal components explain most of the variance in the data: about 73.0% and 22.9%, or roughly 95.8% together. The third and fourth principal components explain much less, about 3.7% and 0.5%. | ||
|
||
Based on the scree plot, we can conclude that the first two principal components are sufficient for capturing the most important information in the data. | ||
|
||
Here are the eigenvalues and the proportion of variance explained by each component: | ||
|
||
```{r} | ||
eigenvalues | ||
eigenvalues/sum(eigenvalues) | ||
``` | ||
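
The cumulative proportion makes the "first two components are enough" reading easier to check numerically. This is a minimal sketch under the same `iris` setup; the 0.95 threshold is an illustrative assumption, not a rule from this post.

```r
# Self-contained: rebuild the numeric data and the PCA fit
df <- iris[, -5]
pca <- prcomp(df, scale. = TRUE)
eigenvalues <- pca$sdev^2

# Proportion and cumulative proportion of variance explained
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)
round(cum_var, 3)

# Smallest number of components reaching the assumed threshold
threshold <- 0.95
n_keep <- which(cum_var >= threshold)[1]
n_keep
```

For `iris`, the first two proportions shown above sum to about 0.958, so `n_keep` is 2, which agrees with the scree plot.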
|
||
# Try it yourself | ||
|
||
Try creating a scree plot for another dataset of your choice. You can use the same steps outlined above. | ||
|
||
Here are some additional tips for creating scree plots: | ||
|
||
* If your variables are measured in different units or on very different scales, consider scaling the data before performing PCA, as done in Step 2. This puts all of the variables on the same scale so that no single variable has undue influence on the results. | ||
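* You can also add a horizontal line at y = 1 to the eigenvalue plot. When the data are scaled, this marks the Kaiser criterion, a common heuristic that retains components with eigenvalues greater than 1. It is separate from the elbow, the point where the scree plot begins to level off, which is another common heuristic for determining the number of PCs to retain. | ||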
* Finally, keep in mind that the interpretation of a scree plot is subjective. There is no single rule for determining the number of PCs to retain. The best approach is to consider the scree plot in conjunction with other factors, such as your research goals and the specific dataset you are using. |
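
To try the steps on another dataset, they can be wrapped in a small function. The sketch below is one possible approach: `scree_plot()` is a hypothetical helper that is not part of the original post, and `USArrests` is used only because it ships with R and has four numeric columns.

```r
# Hypothetical helper (not from the original post): scree plot for an
# all-numeric data frame, with an optional reference line at eigenvalue 1
scree_plot <- function(data, kaiser = TRUE) {
  pca <- prcomp(data, scale. = TRUE)
  eigenvalues <- pca$sdev^2
  plot(eigenvalues, type = "b",
       xlab = "Principal Component",
       ylab = "Eigenvalue")
  if (kaiser) {
    # Horizontal line at y = 1 (Kaiser criterion for scaled data)
    abline(h = 1, col = "red", lty = 2)
  }
  # Return the eigenvalues invisibly so they can be reused
  invisible(eigenvalues)
}

# Apply the helper to a different built-in dataset
ev <- scree_plot(USArrests)

# Proportion of variance explained by each component
ev / sum(ev)
```

Returning the eigenvalues invisibly keeps the console quiet when only the plot is wanted, while still allowing follow-up calculations.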