new post
spsanderson committed Oct 24, 2023
1 parent 85ed156 commit c507c12
Showing 12 changed files with 1,189 additions and 446 deletions.
16 changes: 16 additions & 0 deletions _freeze/posts/2023-10-24/index/execute-results/html.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"hash": "6d4a326e162b1b3ebe7e9895a7472d8f",
"result": {
"markdown": "---\ntitle: \"Creating a Scree Plot in Base R\"\nauthor: \"Steven P. Sanderson II, MPH\"\ndate: \"2023-10-24\"\ncategories: [rtip, viz]\n---\n\n\n# Introduction\n\nA scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.\n\nIn this blog post, we will show you how to create a scree plot in base R. We will use the `iris` dataset as an example.\n\n# Step 1: Load the dataset and prepare the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Drop the non-numerical column\ndf <- iris[, -5]\n```\n:::\n\n\nE Step 2: Perform Principal Component Analysis\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Perform PCA on the iris dataset\npca <- prcomp(df, scale = TRUE)\n```\n:::\n\n\nE Step 3: Create the scree plot\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Extract the eigenvalues from the PCA object\neigenvalues <- pca$sdev^2\n\n# Create a scree plot\nplot(eigenvalues, type = \"b\",\n xlab = \"Principal Component\",\n ylab = \"Eigenvalue\")\n\n# Add a line at y = 1 to indicate the elbow\nabline(v = 2, col = \"red\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Percentage of variance explained\nplot(eigenvalues/sum(eigenvalues), type = \"b\",\n xlab = \"Principal Component\",\n ylab = \"Percentage of Variance Explained\")\nabline(v = 2, col = \"red\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-2.png){width=672}\n:::\n:::\n\n\n# Interpretation\n\nThe scree plot shows that the first two principal components explain the most variance in the data. 
The third and fourth principal components explain much less variance.\n\nBased on the scree plot, we can conclude that the first two principal components are sufficient for capturing the most important information in the data.\n\nHere are the eigenvalues and the percentage explained\n\n\n::: {.cell}\n\n```{.r .cell-code}\neigenvalues\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 2.91849782 0.91403047 0.14675688 0.02071484\n```\n:::\n\n```{.r .cell-code}\neigenvalues/sum(eigenvalues)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.729624454 0.228507618 0.036689219 0.005178709\n```\n:::\n:::\n\n\n# Try it yourself\n\nTry creating a scree plot for another dataset of your choice. You can use the same steps outlined above.\n\nHere are some additional tips for creating scree plots:\n\n* If you are using a dataset with a large number of variables, you may want to consider scaling the data before performing PCA. This will ensure that all of the variables are on the same scale and that no one variable has undue influence on the results.\n* You can also add a line to the scree plot at y = 1 to indicate the elbow. The elbow is the point where the scree plot begins to level off. This is often used as a heuristic for determining the number of PCs to retain.\n* Finally, keep in mind that the interpretation of a scree plot is subjective. There is no single rule for determining the number of PCs to retain. The best approach is to consider the scree plot in conjunction with other factors, such as your research goals and the specific dataset you are using.",
"supporting": [
"index_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
482 changes: 260 additions & 222 deletions docs/index.html

Large diffs are not rendered by default.

398 changes: 176 additions & 222 deletions docs/index.xml

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/listings.json
@@ -2,6 +2,7 @@
{
"listing": "/index.html",
"items": [
"/posts/2023-10-24/index.html",
"/posts/2023-10-23/index.html",
"/posts/2023-10-20/index.html",
"/posts/2023-10-19/index.html",
653 changes: 653 additions & 0 deletions docs/posts/2023-10-24/index.html

Large diffs are not rendered by default.

9 changes: 8 additions & 1 deletion docs/search.json

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion docs/sitemap.xml
@@ -10,7 +10,7 @@
</url>
<url>
<loc>https://www.spsanderson.com/steveondata/index.html</loc>
<lastmod>2023-10-23T12:35:35.806Z</lastmod>
<lastmod>2023-10-24T12:16:58.676Z</lastmod>
</url>
<url>
<loc>https://www.spsanderson.com/steveondata/posts/rtip-2023-04-06/index.html</loc>
@@ -512,4 +512,8 @@
<loc>https://www.spsanderson.com/steveondata/posts/2023-10-23/index.html</loc>
<lastmod>2023-10-23T12:35:43.626Z</lastmod>
</url>
<url>
<loc>https://www.spsanderson.com/steveondata/posts/2023-10-24/index.html</loc>
<lastmod>2023-10-24T12:17:18.506Z</lastmod>
</url>
</urlset>
70 changes: 70 additions & 0 deletions posts/2023-10-24/index.qmd
@@ -0,0 +1,70 @@
---
title: "Creating a Scree Plot in Base R"
author: "Steven P. Sanderson II, MPH"
date: "2023-10-24"
categories: [rtip, viz]
---

# Introduction

A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.

In this blog post, we will show you how to create a scree plot in base R. We will use the `iris` dataset as an example.

# Step 1: Load the dataset and prepare the data

```{r}
# Drop the non-numerical column
df <- iris[, -5]
```
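
Dropping column 5 by position works for `iris`, but in other data frames the non-numeric columns may sit elsewhere. As a minimal sketch, assuming you simply want to keep every numeric column (the names `numeric_cols` and `df_numeric` are illustrative and not part of the original post):

```r
# Flag the numeric columns, wherever they sit in the data frame
numeric_cols <- vapply(iris, is.numeric, logical(1))

# Keep only those columns; drop = FALSE guards against a single-column result
# collapsing to a vector
df_numeric <- iris[, numeric_cols, drop = FALSE]

# For iris, this keeps the four measurement columns and drops Species
names(df_numeric)
```

Selecting by type rather than by position should make the same steps easier to reuse on a different dataset.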

# Step 2: Perform Principal Component Analysis

```{r}
# Perform PCA on the iris dataset, standardizing each variable to unit variance
pca <- prcomp(df, scale. = TRUE)
```
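
To see what the squared standard deviations in the PCA object represent, it may help to compare them with the eigenvalues of the correlation matrix. The sketch below assumes a scaled PCA on the four numeric `iris` columns; `scale.` is the full name of the `prcomp()` argument (writing `scale` works through partial matching), and `pc_variances` and `cor_eigenvalues` are illustrative names.

```r
df <- iris[, -5]
pca <- prcomp(df, scale. = TRUE)

# Variances of the principal components
pc_variances <- pca$sdev^2

# Eigenvalues of the correlation matrix of the original variables
cor_eigenvalues <- eigen(cor(df))$values

# With scaling, the two should agree up to floating-point error
all.equal(pc_variances, cor_eigenvalues)
```

This is why `pca$sdev^2` can be read as the eigenvalues when the data are scaled: a scaled PCA is an eigendecomposition of the correlation matrix.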

# Step 3: Create the scree plot

```{r}
# Extract the eigenvalues (component variances) from the PCA object
eigenvalues <- pca$sdev^2

# Create a scree plot of the eigenvalues
plot(eigenvalues, type = "b",
     xlab = "Principal Component",
     ylab = "Eigenvalue")

# Add a vertical line at the second component to mark the elbow
abline(v = 2, col = "red")

# Proportion of variance explained by each component
plot(eigenvalues / sum(eigenvalues), type = "b",
     xlab = "Principal Component",
     ylab = "Proportion of Variance Explained")
abline(v = 2, col = "red")
```

# Interpretation

The scree plot shows that the first two principal components explain most of the variance in the data: about 73% and 23%, respectively, or roughly 96% combined. The third and fourth principal components together explain only about 4%.

Based on the scree plot, the first two principal components are sufficient to capture most of the information in the data.

Here are the eigenvalues and the proportion of variance explained by each component:

```{r}
eigenvalues
eigenvalues/sum(eigenvalues)
```
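
A cumulative view of the same numbers can make the cutoff easier to justify. The following is a minimal sketch, assuming a scaled PCA on the numeric `iris` columns; the 95% threshold and the names `prop_var`, `cum_var`, `threshold`, and `n_keep` are illustrative choices, not part of the original post.

```r
pca <- prcomp(iris[, -5], scale. = TRUE)
eigenvalues <- pca$sdev^2

# Proportion of variance explained by each component, then the running total
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)
round(cum_var, 3)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.95
n_keep <- which(cum_var >= threshold)[1]
n_keep
```

For `iris`, the first two components together account for about 0.958 of the variance, so a 95% threshold also points to retaining two components.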

# Try it yourself

Try creating a scree plot for another dataset of your choice. You can use the same steps outlined above.
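
As a starting point, the steps can be wrapped in a small helper and applied to another built-in dataset. This is only a sketch: `scree_plot()` is a hypothetical function written for this example (it is not part of base R), and `USArrests` is just one convenient dataset to try.

```r
# Hypothetical helper: run a scaled PCA on the numeric columns and draw a scree plot
scree_plot <- function(data, ...) {
  numeric_data <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  pca <- prcomp(numeric_data, scale. = TRUE)
  eigenvalues <- pca$sdev^2
  plot(eigenvalues, type = "b",
       xlab = "Principal Component",
       ylab = "Eigenvalue", ...)
  # Return the eigenvalues invisibly so they can be inspected afterwards
  invisible(eigenvalues)
}

ev <- scree_plot(USArrests, main = "Scree plot: USArrests")

# Proportion of variance explained by each component
ev / sum(ev)
```

Returning the eigenvalues invisibly keeps the console quiet when only the plot is wanted, while still letting you compute the proportions afterwards.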

Here are some additional tips for creating scree plots:

* If your variables are measured in different units or on very different scales, consider scaling the data before performing PCA, as the `prcomp()` call above does. This puts all of the variables on the same scale so that no single variable has undue influence on the results.
* You can also add a horizontal line to the scree plot at y = 1. For a PCA on scaled data, this marks the Kaiser criterion: retain the components whose eigenvalue exceeds 1. It is a different heuristic from the elbow, which is the point where the scree plot begins to level off. Both are commonly used for determining the number of PCs to retain.
* Finally, keep in mind that the interpretation of a scree plot is subjective. There is no single rule for determining the number of PCs to retain. The best approach is to consider the scree plot in conjunction with other factors, such as your research goals and the specific dataset you are using.
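
A horizontal reference line at an eigenvalue of 1 is usually associated with the Kaiser criterion for scaled data. The sketch below, which again assumes a scaled PCA on the numeric `iris` columns, draws that line and counts the components above it; `n_kaiser` is an illustrative name.

```r
pca <- prcomp(iris[, -5], scale. = TRUE)
eigenvalues <- pca$sdev^2

plot(eigenvalues, type = "b",
     xlab = "Principal Component",
     ylab = "Eigenvalue")

# Horizontal reference line at eigenvalue = 1
abline(h = 1, col = "blue", lty = 2)

# Number of components whose eigenvalue exceeds 1
n_kaiser <- sum(eigenvalues > 1)
n_kaiser
```

For `iris`, only the first eigenvalue exceeds 1 (the second is about 0.91), so this rule suggests one component while the elbow suggests two. That disagreement is a concrete case of the subjectivity noted in the last tip.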
