Design Principles
The following design principles clarify the philosophy behind `funbiogeo`.
The R package `funbiogeo` is meant to help scientists and practitioners who work in functional biogeography. To do so, `funbiogeo` is designed to ease and automate steps that members of this community generally reimplement themselves. Hopefully, this will contribute to improving the standardization of functional biogeography analyses.
The package `funbiogeo` does not provide any new numerical implementation of methods in functional ecology. Rather, `funbiogeo` orchestrates existing packages in order to make functional biogeography analyses easier. As such, `funbiogeo` depends on several external packages, of at least three types:
- packages used to read input data, e.g. `raster` and `sf`,
- packages that compute functional diversity indices, e.g. `fundiversity` or `funrar`,
- packages to visualize data, such as `ggplot2`.
That being said, in order to ease the maintenance of `funbiogeo`, we should carefully consider the packages listed as dependencies. For any package to be added, we should pay particular attention to:
- whether the package is still actively developed, so that if it is removed from CRAN it can be restored quickly,
- the number of direct and indirect dependencies this package brings,
- the odds that this package is already installed on users' machines.
These criteria are especially relevant when choosing among several packages that achieve similar goals.
Function names should use snake case (e.g. `snake_case()`): all characters are lowercase letters, with words separated by underscores.
To avoid name collisions, all function names should be prefixed with `fb_`, as many packages now do.
As many functions will likely use similar arguments, these arguments should be named consistently across `funbiogeo`. When possible, they should also be ordered similarly, which lessens the burden on users by making functions and arguments easier to memorize.
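A minimal sketch of what these conventions could look like in practice. The two functions below, `fb_keep_common_species()` and `fb_site_trait_coverage()`, are hypothetical illustrations and not necessarily part of `funbiogeo`: both use the `fb_` prefix with snake case, and both take the sites x species table first and the species x traits table second, with the same argument names.

```r
# Hypothetical functions, for illustration only (not necessarily in funbiogeo).
# Both follow the same naming scheme and the same argument order:
# site_species first, species_traits second.

# Keep only the species present in both tables
fb_keep_common_species <- function(site_species, species_traits) {
  common <- intersect(colnames(site_species), rownames(species_traits))
  list(
    site_species   = site_species[, common, drop = FALSE],
    species_traits = species_traits[common, , drop = FALSE]
  )
}

# Share of the species present in each site that have complete trait data
# (returns NaN for a site where no species is present)
fb_site_trait_coverage <- function(site_species, species_traits) {
  complete <- rownames(species_traits)[stats::complete.cases(species_traits)]
  present  <- site_species > 0  # logical sites x species matrix
  rowSums(present[, colnames(present) %in% complete, drop = FALSE]) /
    rowSums(present)
}
```

Because the argument order is shared, a user who knows one of these calls can guess the other, e.g. `fb_site_trait_coverage(site_species, species_traits)`.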
The basic data manipulated in `funbiogeo` are:
- sites x species occurrence/abundance matrix,
- species x traits matrix,
- sites x environment matrix,
- sites x locations object.
Throughout the package, we refer to sites and species because this will likely be the most common use case. However, `funbiogeo` does not enforce any specific definition of taxonomic or spatial scales: it is agnostic to the users' definition of these terms. As such, entities referred to as "species" could actually be sub-species, individuals, genotypes, or even entire communities, depending on how the user defines them. Similarly, "sites" do not refer to any particular spatial scale.
For simple input data, the most standard R objects and classes should be used.
When relevant, input objects should be `data.frame`s, as they are certainly the most widespread objects in R (they are very often used in ecology R packages, e.g. `vegan`).
- A sites x species matrix should be a `data.frame` (or a `matrix`) with sites as rows and species as columns. Site names are used as row names and species names as column names. Each element of this matrix describes the occurrence/abundance of a species in a given site. This matrix should contain only numeric values, more precisely:
  - only 0 and 1 for occurrence data;
  - positive integers for abundance;
  - real values between 0 and 100 for percentage cover;
  - any positive real values for basal area.
- A species x traits matrix should be a `data.frame` (or a `matrix`) with species as rows and traits as columns. Species names are used as row names and trait names as column names. Each element gives the value of one trait for one species. Note that not all traits are necessarily real-valued: some may be categorical.
- A sites x environment matrix should be a `data.frame` (or a `matrix`) with sites as rows and environmental variables as columns.
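The layouts above can be illustrated with toy `data.frame`s (all values are made up). The helper `fb_check_site_species()` is a hypothetical sketch, not necessarily a `funbiogeo` function: it only checks that the sites x species table holds non-negative numbers and that its species all appear in the traits table.

```r
# Toy data following the layouts described above (made-up values)
site_species <- data.frame(
  sp_a = c(1, 0, 1),
  sp_b = c(0, 1, 1),
  row.names = c("site_1", "site_2", "site_3")  # sites as rows
)

species_traits <- data.frame(
  body_size = c(12.3, 4.1),                        # continuous trait
  diet      = factor(c("herbivore", "omnivore")),  # categorical trait
  row.names = c("sp_a", "sp_b")                    # species as rows
)

site_env <- data.frame(
  temperature = c(14.2, 16.8, 11.5),
  row.names = c("site_1", "site_2", "site_3")      # same sites as above
)

# Hypothetical helper (illustration only): stops with an error if the
# sites x species table is not numeric, contains negative values, or
# contains species missing from the species x traits table
fb_check_site_species <- function(site_species, species_traits) {
  values <- as.matrix(site_species)
  stopifnot(
    is.numeric(values),
    all(values >= 0, na.rm = TRUE),
    all(colnames(site_species) %in% rownames(species_traits))
  )
  invisible(TRUE)
}

fb_check_site_species(site_species, species_traits)
```

Using row and column names as identifiers, rather than an extra "site" or "species" column, keeps the tables directly usable by packages that expect this layout.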
Site-location objects and environmental layers may be more complex than a simple `data.frame` and/or `matrix`. Only standard formats should be used to define them:
- sites x locations objects are assumed to be `sf` objects (or objects coercible to `sf`, such as `sp` objects) that define the location of sites; the first attribute column defines site names. These objects can be spatial points or spatial polygons.
- environmental layers are assumed to be `raster` objects that define the variation in a continuous environmental variable.
As `funbiogeo` manipulates similar information across all functions, it would be tempting to develop a custom class to work with it.
But custom classes can be cumbersome to use and can hide the details of the dataset from users.
If at any point we need to define a class in `funbiogeo`, we should probably define an S3 class, as it is the simplest class system in R. S4 classes are too rigid, and R6 classes are too complex and out of scope for `funbiogeo`.
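For reference, a minimal sketch of how light an S3 class can be. The class name `fb_data` and its constructor `new_fb_data()` are hypothetical, not part of `funbiogeo`: the object is just a list with a `class` attribute, and `print()` dispatches to `print.fb_data()` based on that attribute.

```r
# Hypothetical S3 class (illustration only): a plain list tagged with a class
new_fb_data <- function(site_species, species_traits) {
  stopifnot(is.data.frame(site_species), is.data.frame(species_traits))
  structure(
    list(site_species = site_species, species_traits = species_traits),
    class = "fb_data"
  )
}

# S3 method: print() dispatches here for objects of class "fb_data"
print.fb_data <- function(x, ...) {
  cat("<fb_data>",
      nrow(x$site_species), "sites x",
      ncol(x$site_species), "species,",
      ncol(x$species_traits), "traits\n")
  invisible(x)
}
```

Since the components stay ordinary `data.frame`s, users can still reach the underlying data directly (e.g. `obj$site_species`), which limits the abstraction concern raised above.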
We are coding for humans, not machines, so the code should be as generously spaced and readable as possible. We will follow the conventions of the tidyverse style guide.