Minimum density can be zero #24

Open
Aariq opened this issue Feb 8, 2024 · 1 comment

Comments

Aariq commented Feb 8, 2024

With large, skewed datasets, the density estimate for a point can be exactly zero. This doesn't make sense to me: every point is an observation, so its estimated density should be positive. It also causes a practical problem if, say, I want to log-transform the color scale, since log10(0) is -Inf.

```r
library(ggplot2)
library(ggpointdensity)
df <- data.frame(x = c(rep(0, 100000), rnorm(100000)),
                 y = c(rep(0, 100000), rnorm(100000)))
p <- ggplot(df, aes(x = x, y = y)) +
  geom_pointdensity()
p
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)

p + scale_color_continuous(trans = "log10")
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
#> Warning: Transformation introduced infinite values in discrete y-axis
```

Created on 2024-02-08 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Sonoma 14.2.1
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Phoenix
#>  date     2024-02-08
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  cli              3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
#>  colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  curl             5.2.0      2023-12-08 [1] CRAN (R 4.3.0)
#>  digest           0.6.34     2024-01-11 [1] CRAN (R 4.3.0)
#>  dplyr            1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
#>  evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.0)
#>  fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
#>  farver           2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
#>  fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  fs               1.6.3      2023-07-20 [1] CRAN (R 4.3.0)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2        * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
#>  ggpointdensity * 0.1.0      2024-02-01 [1] Github (LKremer/ggpointdensity@02f3ab2)
#>  glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.0)
#>  gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
#>  highr            0.10       2022-12-22 [1] CRAN (R 4.3.0)
#>  htmltools        0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
#>  knitr            1.45       2023-10-30 [1] CRAN (R 4.3.0)
#>  labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
#>  lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS             7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.0)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>  pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr            1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache          0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3      1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo             1.25.0     2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils          2.12.2     2022-11-11 [1] CRAN (R 4.3.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex           2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.0)
#>  rmarkdown        2.25       2023-09-18 [1] CRAN (R 4.3.0)
#>  rstudioapi       0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
#>  scales           1.3.0      2023-11-28 [1] CRAN (R 4.3.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  styler           1.10.2     2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble           3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect       1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
#>  vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
#>  withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.0)
#>  xfun             0.41       2023-11-01 [1] CRAN (R 4.3.0)
#>  xml2             1.3.5      2023-07-06 [1] CRAN (R 4.3.0)
#>  yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
#> 
#>  [1] /Users/ericscott/Library/R/x86_64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
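A minimal sketch of how a point mass can shrink the default bandwidth, assuming the `kde2d` method falls back to `MASS::kde2d()`'s default `h`, which is `MASS::bandwidth.nrd()`. The helper `nrd_bandwidth()` is hypothetical; it only writes out that normal-reference rule so the IQR term is visible. The data are a small deterministic stand-in for the reprex (half the values exactly 0), not the reprex data itself.

```r
# Hypothetical helper that writes out the normal-reference rule used by
# MASS::bandwidth.nrd(), the default bandwidth of MASS::kde2d().
nrd_bandwidth <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr_scale <- (q[2] - q[1]) / 1.34  # robust spread estimate from the IQR
  4 * 1.06 * min(sd(x), iqr_scale) * length(x)^(-1/5)
}

# Deterministic stand-in for the reprex data: half of the values are exactly 0.
x <- c(rep(0, 1000), seq(-3, 3, length.out = 1000))

sd(x)                  # about 1.23: the data span roughly -3 to 3
IQR(x)                 # about 0.0015: the point mass squeezes the quartiles
nrd_bandwidth(x)       # about 0.001: the IQR term wins the min()
MASS::bandwidth.nrd(x) # same value, computed by MASS itself
```

`kde2d()` then divides `h` by 4 to get the Gaussian kernel's standard deviation, so each point's kernel is only a fraction of a thousandth wide. Far enough into the tails, `dnorm()` underflows to exactly 0, which would plausibly explain grid cells (and hence points) with zero estimated density.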

Aariq commented Feb 9, 2024

This may be related to the default bandwidth estimator used by MASS::kde2d() (MASS::bandwidth.nrd()). If I supply my own values of h using a different bandwidth estimator (e.g. bw.nrd0()), I don't have this issue or the issue with bandwidth == 0 (#21). Even the documentation says that bw.nrd() "has remained the default for historical and compatibility reasons, rather than as a general recommendation". Perhaps it would be better for stat_pointdensity() to calculate its own bandwidth rather than relying on the defaults of kde2d().
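A minimal sketch of that suggestion, calling MASS::kde2d() directly rather than assuming anything about stat_pointdensity()'s internals. The toy data are hypothetical: 90 of 100 points sit at the origin, so the quartiles of each coordinate coincide and the default rule returns 0, which kde2d() rejects. bw.nrd0() falls back to sd(x) when the IQR is 0, so it stays positive. One detail worth knowing: kde2d() divides h by 4 internally (its bandwidths are on the scale bandwidth.nrd() returns), so the sketch multiplies bw.nrd0() by 4 to get a kernel standard deviation equal to bw.nrd0(), the scale stats::density() uses.

```r
library(MASS)  # kde2d() and bandwidth.nrd()

# Hypothetical toy data: 90 of 100 points sit exactly at (0, 0), so the
# lower and upper quartiles of each coordinate coincide (IQR = 0).
x <- c(rep(0, 90), 1:10)
y <- c(rep(0, 90), 10:1)

bandwidth.nrd(x)  # 0, so the default call kde2d(x, y) stops with an error
bw.nrd0(x)        # positive: bw.nrd0() falls back to sd(x) when the IQR is 0

# kde2d() divides h by 4 internally, so scale by 4 to make the Gaussian
# kernel's standard deviation equal to bw.nrd0(), as in stats::density().
h <- 4 * c(bw.nrd0(x), bw.nrd0(y))
dens <- kde2d(x, y, h = h, n = 50)

min(dens$z) > 0   # TRUE for this toy grid: no cell underflows to zero
```

The fallback only triggers when the IQR is exactly 0; with a tiny but nonzero IQR, bw.nrd0() is also small, so the choice of estimator may still need care.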
