The Datasaurus Dozen

Thomas Lin Pedersen edited this page Sep 4, 2018 · 4 revisions

submitted by Tom Westlake

The Datasaurus Dozen is a playful twist on Anscombe's Quartet: a group of twelve datasets with nearly identical summary statistics that nonetheless look distinctly different when plotted.

The animation below, built with the datasauRus, ggplot2 and gganimate packages, highlights the danger of relying solely on summary statistics without looking at the whole distribution.

The Code

library(datasauRus)
library(ggplot2)
library(gganimate)

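# Scatter plot of all datasets, animated from one dataset to the next
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  # Each dataset is one state; transitions last 3 relative units, pauses 1
  transition_states(dataset, transition_length = 3, state_length = 1) +
  ease_aes('cubic-in-out')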

[Animation: datasaurus]
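
To check the claim about near-identical summary statistics, the sketch below computes the mean, standard deviation and correlation for each dataset using base R. It is only an illustration and not part of the original submission; the names `by_dataset` and `summary_stats` are made up for this example, and it assumes the `datasaurus_dozen` data frame has the columns `dataset`, `x` and `y` used in the plotting code.

```r
library(datasauRus)

# Split the combined data frame into one data frame per dataset
# (by_dataset is a hypothetical name used only in this sketch)
by_dataset <- split(datasaurus_dozen, datasaurus_dozen$dataset)

# Compute the same five summary statistics for every dataset
summary_stats <- do.call(rbind, lapply(by_dataset, function(d) {
  data.frame(
    mean_x = mean(d$x),
    mean_y = mean(d$y),
    sd_x   = sd(d$x),
    sd_y   = sd(d$y),
    cor_xy = cor(d$x, d$y)
  )
}))

# One row per dataset; rounding makes the rows easy to compare by eye
print(round(summary_stats, 1))
```

The rows should agree closely with one another even though the scatter plots differ, which is the point the animation makes visually.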

Source

Reanimating the Datasaurus