Design Principles

Matthias Grenié edited this page Oct 5, 2021 · 4 revisions

The following design principles clarify the philosophy behind funbiogeo.

Scope

The R package funbiogeo is meant to help scientists and practitioners who work in functional biogeography. To that end, funbiogeo is designed to ease and automate steps that members of this community generally reimplement themselves. Hopefully, this will help standardize functional biogeography analyses.

Dependencies

The package funbiogeo does not provide any new numerical implementation of methods in functional ecology. Rather, funbiogeo orchestrates existing packages to make functional biogeography analyses easier. As such, funbiogeo will depend on several external packages. There are at least three types of packages funbiogeo will depend on:

  1. packages that are used to read input data, e.g. raster and sf,
  2. packages that compute functional diversity indices, e.g. fundiversity or funrar,
  3. packages to visualize data such as ggplot2.

That being said, in order to ease the maintenance of funbiogeo, we should carefully consider packages listed as dependencies. For any package to be added, we should pay particular attention to:

  • whether the package to be added as a dependency is still actively developed, so that if it is removed from CRAN it can be restored quickly,
  • the number of direct and indirect dependencies this package brings,
  • the odds that this package is already installed on users' machines.
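
The last two criteria can be checked roughly from an R session. The sketch below is only an illustration: `count_dependencies()` and `is_installed()` are hypothetical helpers, not part of funbiogeo, and by default the counts are computed against the locally installed packages rather than against CRAN.

```r
# Hypothetical helpers (not part of funbiogeo) to compare candidate
# dependencies.

# Count direct and indirect (recursive) hard dependencies of each package.
# By default the locally installed packages serve as the database;
# pass `db = utils::available.packages()` to query a CRAN mirror instead.
count_dependencies <- function(pkgs, db = utils::installed.packages()) {
  deps <- tools::package_dependencies(
    pkgs,
    db        = db,
    which     = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  # Packages missing from `db` yield NULL, which is counted as 0 here
  lengths(deps)
}

# Check whether each package can be loaded on the current machine
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

candidates <- c("stats", "utils")  # base packages, used only as stand-ins
count_dependencies(candidates)
is_installed(candidates)
```

Note that a count of 0 can mean either "no dependencies" or "package not found in the database", so the two helpers are best read together.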

These criteria are most relevant when several packages can achieve similar goals and one of them must be selected.

Functions

Naming Conventions

Function names should use snake case (e.g. snake_case()): names should consist of lowercase words separated by underscores.

To avoid name collisions, we should prefix all function names with fb_, as many packages now do.
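
As a minimal sketch of these two conventions, the function below uses a snake case name with the fb_ prefix. The function name and the toy data are made up for illustration and do not necessarily match any function exported by funbiogeo.

```r
# Hypothetical function, shown only to illustrate the naming convention:
# lowercase words separated by underscores, prefixed with `fb_`.
fb_count_species_per_site <- function(site_species) {
  # A species is counted as present when its value is strictly positive
  rowSums(site_species > 0)
}

# Toy sites x species table (made-up values)
site_species <- data.frame(
  sp_a = c(1, 0, 1),
  sp_b = c(0, 0, 1),
  row.names = c("site_1", "site_2", "site_3")
)

fb_count_species_per_site(site_species)
```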

Arguments

As many functions will likely use similar arguments, these arguments should be named consistently across funbiogeo. When possible, they should also be ordered consistently, which lessens the burden on users by making functions and arguments easier to memorize.
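
A possible way to picture this is shown below: two hypothetical functions share the same leading arguments, with the same names and in the same order. The argument names `site_species`, `species_traits` and `threshold`, as well as both function names, are assumptions chosen for this sketch.

```r
# Hypothetical functions: shared inputs keep the same names and order.

# Proportion of the species present in each site that have at least one
# known trait value (NaN for sites without any species).
fb_example_trait_coverage <- function(site_species, species_traits) {
  has_trait <- rowSums(!is.na(species_traits)) > 0
  known     <- rownames(species_traits)[has_trait]
  present   <- site_species > 0
  covered   <- present[, colnames(present) %in% known, drop = FALSE]
  rowSums(covered) / rowSums(present)
}

# Keep only the sites whose trait coverage reaches `threshold`
fb_example_filter_sites <- function(site_species, species_traits,
                                    threshold = 1) {
  coverage <- fb_example_trait_coverage(site_species, species_traits)
  site_species[!is.na(coverage) & coverage >= threshold, , drop = FALSE]
}

# Toy data (made-up values)
site_species <- data.frame(
  sp_a = c(1, 0, 1),
  sp_b = c(0, 0, 1),
  row.names = c("site_1", "site_2", "site_3")
)
species_traits <- data.frame(
  body_mass = c(1.2, NA),
  row.names = c("sp_a", "sp_b")
)

fb_example_trait_coverage(site_species, species_traits)
fb_example_filter_sites(site_species, species_traits)
```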

Inputs

The basic data manipulated in funbiogeo are:

  • sites x species occurrence/abundance matrix,
  • species x traits matrix,
  • sites x environment matrix,
  • sites x locations object.

A note on (taxonomic and spatial) scales

Throughout the package, we refer to sites and species because this will likely be the most common use case. However, funbiogeo does not enforce any specific definition of taxonomic or spatial scale: it is agnostic to the users' definitions of these terms. As such, entities referred to as "species" could actually be sub-species, individuals, genotypes, or even entire communities, depending on how the user defines them. Similarly, "sites" do not refer to any particular spatial scale.

Basic objects, basic classes

For simple input data, the most standard R objects and classes should be used.

When relevant, input objects should be data.frame objects, as they are arguably the most widespread objects in R (and are very often used in ecology R packages, e.g. vegan).

  • A sites x species matrix should be a data.frame (or a matrix) with sites as rows and species as columns. Site names should be used as row names and species names as column names. Each element of this matrix describes the occurrence/abundance of a species at a given site. This matrix should contain only numeric values, more precisely:

    • 0 and 1 (only) for occurrence data;
    • positive integers for abundance;
    • real values between 0 and 100 for cover in %;
    • any positive real values for basal area.
  • A species x traits matrix should be a data.frame (or a matrix) with species as rows and traits as columns. Species names should be used as row names and trait names as column names. The intersection of a row and a column gives the value of a trait for a species. Note that not all traits are necessarily represented by real values: some may be categorical.

  • A sites x environment matrix should be a data.frame (or a matrix) with sites as rows and environmental variables as columns.
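
The three basic inputs described above could look like the toy objects below. All site, species, trait and variable names are made up, and `check_occurrence()` is a hypothetical helper written for this sketch, not a funbiogeo function.

```r
# Toy inputs following the conventions above (all names and values made up)

# Sites as rows, species as columns, occurrence data (0/1 only)
site_species <- data.frame(
  sp_a = c(1, 0, 1),
  sp_b = c(0, 1, 1),
  row.names = c("site_1", "site_2", "site_3")
)

# Species as rows, traits as columns; traits may be categorical
species_traits <- data.frame(
  body_mass = c(12.5, 3.1),
  diet      = factor(c("herbivore", "carnivore")),
  row.names = c("sp_a", "sp_b")
)

# Sites as rows, environmental variables as columns
site_environment <- data.frame(
  temperature = c(10.2, 14.8, 9.5),
  row.names   = c("site_1", "site_2", "site_3")
)

# Hypothetical check: occurrence data must be numeric and contain
# only 0 and 1, without missing values
check_occurrence <- function(site_species) {
  values <- as.matrix(site_species)
  is.numeric(values) && !anyNA(values) && all(values %in% c(0, 1))
}

check_occurrence(site_species)

# Species and site names should match across tables
all(colnames(site_species) %in% rownames(species_traits))
all(rownames(site_species) %in% rownames(site_environment))
```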

Complex objects

Site-location objects or environmental layers can be more complex objects than a simple data.frame and/or matrix. Only standard formats should be used to define them:

  • sites x locations objects are assumed to be sf objects (or objects coercible to sf, such as sp objects) that define the location of sites. The first attribute column should define site names. Sites can be defined as spatial points or spatial polygons.
  • environmental layers are assumed to be raster objects that define the variation in a continuous environmental variable.
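
As a rough sketch of such objects, the code below builds a point-based sites x locations object with sf and a small environmental layer with raster. Coordinates and values are made up, and each part only runs when the corresponding package is installed, since neither is part of base R.

```r
# Toy site coordinates (longitude/latitude); all values are made up
site_coords <- data.frame(
  site      = c("site_1", "site_2", "site_3"),
  longitude = c(2.35, 3.88, 5.37),
  latitude  = c(48.85, 43.61, 43.30)
)

# Sites as spatial points; requires the sf package
if (requireNamespace("sf", quietly = TRUE)) {
  site_locations <- sf::st_as_sf(
    site_coords,
    coords = c("longitude", "latitude"),
    crs    = 4326  # WGS 84
  )
  # Site names sit in the first attribute column
  print(names(site_locations)[1])
}

# A 2 x 2 continuous environmental layer; requires the raster package
if (requireNamespace("raster", quietly = TRUE)) {
  temperature_layer <- raster::raster(
    nrows = 2, ncols = 2,
    xmn = 0, xmx = 10, ymn = 40, ymx = 50,
    vals = c(10.2, 14.8, 9.5, 12.1)
  )
  print(temperature_layer)
}
```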

Internal representation

As funbiogeo manipulates similar information across all functions, it would be tempting to develop a custom class to work with it. But custom classes can be cumbersome to use and can hide the details of the dataset from users.

If at any point we need to define a class in funbiogeo, we should probably define an S3 class, as it is the simplest class system in R. S4 classes are too rigid, and R6 classes are out of scope as they are too complex for funbiogeo.
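
To give an idea of how light an S3 class can be, here is a minimal sketch. The class name `fb_dataset` and the constructor `new_fb_dataset()` are hypothetical: funbiogeo does not define them, and this is only one of several ways such a class could be written.

```r
# Hypothetical constructor: bundles the inputs in a plain list with a class
# attribute, so users can still inspect each element directly.
new_fb_dataset <- function(site_species, species_traits) {
  stopifnot(
    is.data.frame(site_species) || is.matrix(site_species),
    is.data.frame(species_traits) || is.matrix(species_traits)
  )
  structure(
    list(site_species = site_species, species_traits = species_traits),
    class = "fb_dataset"
  )
}

# S3 method: print() dispatches on the class attribute
print.fb_dataset <- function(x, ...) {
  cat(
    "<fb_dataset>",
    nrow(x$site_species), "sites,",
    ncol(x$site_species), "species,",
    ncol(x$species_traits), "traits\n"
  )
  invisible(x)
}

# Toy data (made-up values)
site_species <- data.frame(
  sp_a = c(1, 0, 1),
  sp_b = c(0, 1, 1),
  row.names = c("site_1", "site_2", "site_3")
)
species_traits <- data.frame(
  body_mass = c(12.5, 3.1),
  row.names = c("sp_a", "sp_b")
)

dataset <- new_fb_dataset(site_species, species_traits)
print(dataset)

# The underlying data remain directly accessible
dataset$species_traits
```

Because the object is still a plain list, the details of the dataset are not hidden from users, which addresses the concern raised above about custom classes.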

Outputs

Simplest possible output types

As little transformation as possible

Coding style

We are coding for humans and not machines, so the coding style should be as spaced as possible. We will follow the conventions of the tidyverse style guide.