diff --git a/.gitignore b/.gitignore index 8d0f9f3..57ec627 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,2 @@ .vscode/* Manifest.toml -docs/src/vignettes/*.md \ No newline at end of file diff --git a/docs/src/vignettes/entropize.md b/docs/src/vignettes/entropize.md new file mode 100644 index 0000000..92cf207 --- /dev/null +++ b/docs/src/vignettes/entropize.md @@ -0,0 +1,50 @@ +# Getting the entropy matrix + +For some applications, we want to place points to capture the maximum amount of +information, which is to say that we want to sample a balance of *entropy* +values, as opposed to *absolute* values. In this vignette, we will walk through +an example using the `entropize` function to convert raw data to entropy values. + + +``` +using BiodiversityObservationNetworks +using NeutralLandscapes +using Plots +``` + +!!! warning "Entropy is problem-specific" + The solution presented in this vignette is a least-assumption solution based + on the empirical values given in a matrix of measurements. In a lot of + situations, this is not the entropy that you want. For example, if your pixels are storing probabilities of Bernoulli events, you can directly use the entropy of the events in the entropy matrix. + +We start by generating a random matrix of measurements: + +``` +measurements = rand(MidpointDisplacement(), (200, 200)) .* 100 +heatmap(measurements) +``` + +Using the `entropize` function will convert these values into entropy at the +pixel scale: + +``` +U = entropize(measurements) +heatmap(U') +``` + +The values closest to the median of the distribution have the highest entropy, and the values closest to its extrema have an entropy of 0. The entropy matrix is guaranteed to have values on the unit interval. 
+ +We can use `entropize` as part of a pipeline, and overlay the points optimized based on entropy on the measurement map: + +``` +locations = + measurements |> entropize |> seed(BalancedAcceptance(; numpoints = 100)) |> first +heatmap(U') +scatter!( + [x[1] for x in locations], + [x[2] for x in locations]; + ms = 2.5, + mc = :white, + label = "", +) +``` \ No newline at end of file diff --git a/docs/src/vignettes/overview.md b/docs/src/vignettes/overview.md new file mode 100644 index 0000000..62aa48e --- /dev/null +++ b/docs/src/vignettes/overview.md @@ -0,0 +1,88 @@ +# An introduction to BiodiversityObservationNetworks + +In this vignette, we will walk through the basic functionality of the package, +by generating a random uncertainty matrix, and then using a *seeder* and a +*refiner* to decide which locations should be sampled in order to gain more +insight into the process generating this entropy. + +``` +using BiodiversityObservationNetworks +using NeutralLandscapes +using Plots +``` + +In order to simplify the process, we will use the *NeutralLandscapes* package to +generate a 100×100 pixel landscape, where each cell represents the entropy (or +information content) in a unit we can sample: + +``` +U = rand(MidpointDisplacement(0.5), (100, 100)) +heatmap(U'; aspectratio = 1, frame = :none, c = :lapaz) +``` + +In practice, this uncertainty matrix is likely to be derived from an application of the hyper-parameter optimization step, which is detailed in other vignettes. + +The first step in defining a series of locations to sample is to use a +`BONSeeder`, which will generate a number of relatively coarse proposals that +cover the entire landscape, and have a balanced distribution in space. We do so +using the `BalancedAcceptance` sampler, which can be tweaked to capture more (or +less) uncertainty. To start with, we will extract 200 candidate points, *i.e.* +200 possible locations that will then be refined. 
+ + +``` +pack = seed(BalancedAcceptance(; numpoints = 200), U); +``` + +The output of a `BONSampler` (whether at the seeding or refinement step) is +always a tuple, storing in the first position a vector of `CartesianIndex` +elements, and in the second position the matrix given as input. We can have a +look at the first five points: + +``` +first(pack)[1:5] +``` + +Although returning the input matrix may seem redundant, it allows samplers +to be chained together to build pipelines that take a matrix as input, and +return a set of places to sample as output; an example is given below. + +The positions of locations to sample are given as a vector of `CartesianIndex`, +which are coordinates in the uncertainty matrix. Once we have generated a +candidate proposal, we can further refine it using a `BONRefiner` -- in this +case, `AdaptiveSpatial`, which performs adaptive spatial sampling (maximizing +the distribution of entropy while minimizing spatial auto-correlation). + +``` +candidates, uncertainty = pack +locations, _ = refine(candidates, AdaptiveSpatial(; numpoints = 50), uncertainty) +locations[1:5] +``` + + +The reason we start from a candidate set of points is that some algorithms +struggle with full landscapes, and work much better with a sub-sample of them. +There is no hard rule (or even a heuristic) for deciding how many points should be generated at the seeding step, and so experimentation is a must! + +The previous code examples used a version of the `seed` and `refine` functions +that is very useful if you want to change arguments between steps, or examine +the content of the candidate pool of points. 
In addition to this syntax, both +functions have a curried version that allows chaining them together using pipes +(`|>`): + +``` +locations = + U |> + seed(BalancedAcceptance(; numpoints = 200)) |> + refine(AdaptiveSpatial(; numpoints = 50)) |> + first +``` + +This works because `seed` and `refine` have curried versions that can be used +directly in a pipeline. Proposed sampling locations can then be overlaid onto +the original uncertainty matrix: + +``` +plt = heatmap(U'; aspectratio = 1, frame = :none, c = :lapaz) +scatter!(plt, [x[1] for x in locations], [x[2] for x in locations], ms=2.5, mc=:white, label="") +``` \ No newline at end of file diff --git a/docs/src/vignettes/uniqueness.md b/docs/src/vignettes/uniqueness.md new file mode 100644 index 0000000..9e24a91 --- /dev/null +++ b/docs/src/vignettes/uniqueness.md @@ -0,0 +1,60 @@ +# Selecting environmentally unique locations + +For some applications, we want to sample a set of locations that cover a broad +range of values in environment space. Another way to phrase this problem is to +say we want to find the set of points with the _least_ covariance in their +environmental values. + +To do this, we use a `BONRefiner` called `Uniqueness`. We'll start by loading the required packages. + +``` +using BiodiversityObservationNetworks +using SpeciesDistributionToolkit +using StatsBase +using NeutralLandscapes +using Plots +``` + +!!! warning "Consider setting your SDMLAYERS_PATH" When accessing data using + `SimpleSDMDatasets.jl`, it is best to set the `SDMLAYERS_PATH` environment + variable to tell `SimpleSDMDatasets.jl` where to download data. This can be + done by setting `ENV["SDMLAYERS_PATH"] = "/home/user/Data/"` or similar in + the `~/.julia/config/startup.jl` file. (Note this will be different + depending on where `julia` is installed.) 
+ +``` +bbox = (left=-83.0, bottom=46.4, right=-55.2, top=63.7); +temp, precip, elevation = + convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, AverageTemperature); bbox...)), + convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, Precipitation); bbox...)), + convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, Elevation); bbox...)); +``` + +Now we'll use the `stack` function to combine our three environmental layers into a single, 3-dimensional array, which we'll pass to our `Uniqueness` refiner. + +``` +layers = BiodiversityObservationNetworks.stack([temp,precip,elevation]); +``` + +This requires NeutralLandscapes v0.1.2. + +``` +uncert = rand(MidpointDisplacement(0.8), size(temp), mask=temp); +heatmap(uncert, aspectratio=1, frame=:box) +``` + +Now we'll get a set of candidate points from a `BalancedAcceptance` seeder that has no bias toward higher uncertainty values. + +``` +candpts, uncert = uncert |> seed(BalancedAcceptance(numpoints=100, α=0.0)); +``` + +Now we'll `refine` our 100 candidate points down to the 30 most environmentally unique. + +``` +finalpts, uncert = refine(candpts, Uniqueness(;numpoints=30, layers=layers), uncert) + +heatmap(uncert) +scatter!([p[2] for p in candpts], [p[1] for p in candpts], fa=0.0, msc=:white, label="Candidate Points") +scatter!([p[2] for p in finalpts], [p[1] for p in finalpts], c=:dodgerblue, msc=:white, label="Selected Points") +``` \ No newline at end of file