fixing gitignore
gottacatchenall committed Apr 19, 2024
1 parent 8466b46 commit 1974ed6
Showing 4 changed files with 198 additions and 1 deletion.
1 change: 0 additions & 1 deletion .gitignore
@@ -1,3 +1,2 @@
.vscode/*
Manifest.toml
docs/src/vignettes/*.md
50 changes: 50 additions & 0 deletions docs/src/vignettes/entropize.md
@@ -0,0 +1,50 @@
# Getting the entropy matrix

For some applications, we want to place points to capture the maximum amount of
information, which is to say that we want to sample a balance of *entropy*
values, as opposed to *absolute* values. In this vignette, we will walk through
an example using the `entropize` function to convert raw data to entropy values.


```
using BiodiversityObservationNetworks
using NeutralLandscapes
using Plots
```

!!! warning "Entropy is problem-specific"
    The solution presented in this vignette is a least-assumption solution based
    on the empirical values given in a matrix of measurements. In a lot of
    situations, this is not the entropy that you want. For example, if your
    pixels are storing probabilities of Bernoulli events, you can directly use
    the entropy of the events in the entropy matrix.
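
For the Bernoulli case mentioned in the warning, the entropy matrix can be computed directly from the probabilities. The following is a minimal sketch using only Base Julia; `bernoulli_entropy` and the probability layer `P` are hypothetical names introduced for illustration and are not part of the package.

```julia
# Entropy (in bits) of a Bernoulli event with probability p.
# By convention 0 * log2(0) = 0, so the entropy is 0 when p is 0 or 1.
bernoulli_entropy(p::Real) =
    (p <= 0 || p >= 1) ? 0.0 : -p * log2(p) - (1 - p) * log2(1 - p)

# Hypothetical layer in which each pixel stores a probability
P = [0.0 0.1; 0.5 1.0]

# Broadcasting gives one entropy value per pixel, on the unit interval
H = bernoulli_entropy.(P)
```

Pixels with a probability of 0.5 receive the maximum entropy of 1, and pixels with a probability of 0 or 1 receive an entropy of 0.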

We start by generating a random matrix of measurements:

```
measurements = rand(MidpointDisplacement(), (200, 200)) .* 100
heatmap(measurements)
```

Using the `entropize` function will convert these values into entropy at the
pixel scale:

```
U = entropize(measurements)
heatmap(U')
```

The values closest to the median of the distribution have the highest entropy, and the values closest to its extrema have an entropy of 0. The entropy matrix is guaranteed to have values on the unit interval.
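
To make these properties concrete, here is one construction that reproduces them: each value is replaced by its rank rescaled to the unit interval, and the binary entropy of that rank is taken. This is an assumption made for illustration, not necessarily what `entropize` does internally, and `quantile_entropy` is a hypothetical name.

```julia
# Binary entropy in bits, defined as 0 at q = 0 and q = 1
h(q) = (q <= 0 || q >= 1) ? 0.0 : -q * log2(q) - (1 - q) * log2(1 - q)

# Assumes the matrix has at least two cells
function quantile_entropy(A::AbstractMatrix{<:Real})
    n = length(A)
    ranks = invperm(sortperm(vec(A)))   # rank 1 is the smallest value
    q = (ranks .- 1) ./ (n - 1)         # minimum maps to 0, maximum to 1
    return reshape(h.(q), size(A))
end

# Hypothetical measurements: 30 is the median, 10 and 50 are the extrema
A = reshape(Float64[10, 50, 30, 20, 40], 5, 1)
E = quantile_entropy(A)
```

In this toy example the median value gets an entropy of 1, while the minimum and maximum get an entropy of 0.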

We can use `entropize` as part of a pipeline, and overlay the points selected on the basis of entropy onto the entropy map:

```
locations =
    measurements |> entropize |> seed(BalancedAcceptance(; numpoints = 100)) |> first
heatmap(U')
scatter!(
    [x[1] for x in locations],
    [x[2] for x in locations];
    ms = 2.5,
    mc = :white,
    label = "",
)
```
88 changes: 88 additions & 0 deletions docs/src/vignettes/overview.md
@@ -0,0 +1,88 @@
# An introduction to BiodiversityObservationNetworks

In this vignette, we will walk through the basic functionality of the package
by generating a random uncertainty matrix, and then using a *seeder* and a
*refiner* to decide which locations should be sampled in order to gain more
insight into the process generating this uncertainty.

```
using BiodiversityObservationNetworks
using NeutralLandscapes
using Plots
```

In order to simplify the process, we will use the *NeutralLandscapes* package to
generate a 100×100 pixel landscape, where each cell represents the entropy (or
information content) in a unit we can sample:

```
U = rand(MidpointDisplacement(0.5), (100, 100))
heatmap(U'; aspectratio = 1, frame = :none, c = :lapaz)
```

In practice, this uncertainty matrix is likely to be derived from an application of the hyper-parameter optimization step, which is detailed in other vignettes.

The first step of defining a series of locations to sample is to use a
`BONSeeder`, which will generate a number of relatively coarse proposals that
cover the entire landscape, and have a balanced distribution in space. We do so
using the `BalancedAcceptance` sampler, which can be tweaked to capture more (or
less) uncertainty. To start with, we will extract 200 candidate points, *i.e.*
200 possible locations which will then be refined.


```
pack = seed(BalancedAcceptance(; numpoints = 200), U);
```
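
Balanced acceptance sampling is usually described as drawing points from a quasi-random Halton sequence, which is where the spatial balance comes from. The sketch below illustrates only that idea, under the assumption that bases 2 and 3 are used for the two axes. It is not the package's implementation, it ignores any weighting by uncertainty, and `halton` and `halton_points` are hypothetical names.

```julia
# Radical-inverse value of index i in the given base;
# for i >= 1 the result lies strictly between 0 and 1.
function halton(i::Int, base::Int)
    f, r = 1.0, 0.0
    while i > 0
        f /= base
        r += f * (i % base)
        i ÷= base
    end
    return r
end

# Map the first n points of a 2-D Halton sequence onto the cells of a grid
function halton_points(n::Int, dims::Tuple{Int, Int})
    return [
        CartesianIndex(
            ceil(Int, halton(i, 2) * dims[1]),
            ceil(Int, halton(i, 3) * dims[2]),
        ) for i in 1:n
    ]
end

pts = halton_points(200, (100, 100))
```

Because consecutive Halton points fill the gaps left by earlier ones, any prefix of the sequence is spread fairly evenly over the grid.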

The output of a `BONSampler` (whether at the seeding or refinement step) is
always a tuple, storing in the first position a vector of `CartesianIndex`
elements, and in the second position the matrix given as input. We can have a
look at the first five points:

```
first(pack)[1:5]
```

Although returning the input matrix may seem redundant, it allows samplers to
be chained together to build pipelines that take a matrix as input and return a
set of places to sample as output; an example is given below.
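
The shape of this output can be explored without running a sampler. The sketch below builds a hand-written stand-in for the tuple (the values are made up for illustration) and shows how to unpack it with Base Julia.

```julia
# Hypothetical stand-in for a sampler output: (coordinates, input matrix)
U = reshape(collect(1.0:12.0), 3, 4)
pack = ([CartesianIndex(1, 2), CartesianIndex(3, 4)], U)

# Destructure the tuple into its two positions
coords, layer = pack

# Each CartesianIndex can be split into its row and column
rows = [c[1] for c in coords]
cols = [c[2] for c in coords]

# A vector of CartesianIndex can index the matrix directly
vals = layer[coords]
```

Here `vals` holds the value of the layer at each selected location, which is convenient when inspecting what a sampler has picked.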

The positions of locations to sample are given as a vector of `CartesianIndex`,
which are coordinates in the uncertainty matrix. Once we have generated a
candidate proposal, we can further refine it using a `BONRefiner` -- in this
case, `AdaptiveSpatial`, which performs adaptive spatial sampling (maximizing
the distribution of entropy while minimizing spatial auto-correlation).

```
candidates, uncertainty = pack
locations, _ = refine(candidates, AdaptiveSpatial(; numpoints = 50), uncertainty)
locations[1:5]
```


The reason we start from a candidate set of points is that some algorithms
struggle with full landscapes, and work much better with a sub-sample of them.
There is no hard rule (or even a heuristic) for how many points should be generated at the seeding step, so experimentation is a must!

The previous code examples used a version of the `seed` and `refine` functions
that is very useful if you want to change arguments between steps, or examine
the content of the candidate pool of points. In addition to this syntax, both
functions have a curried version that allows chaining them together using pipes
(`|>`):

```
locations =
    U |>
    seed(BalancedAcceptance(; numpoints = 200)) |>
    refine(AdaptiveSpatial(; numpoints = 50)) |>
    first
```

This works because `seed` and `refine` have curried versions that can be used
directly in a pipeline. The proposed sampling locations can then be overlaid
onto the original uncertainty matrix:

```
plt = heatmap(U'; aspectratio = 1, frame = :none, c = :lapaz)
scatter!(plt, [x[1] for x in locations], [x[2] for x in locations], ms=2.5, mc=:white, label="")
```
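
The curried form used in the pipeline can be illustrated with toy functions. This is a sketch of the pattern only: `toy_seed` and `toy_refine` are hypothetical stand-ins with made-up selection rules, and how the package defines its own curried methods is not shown here.

```julia
# Full forms: both return a (coordinates, layer) tuple
function toy_seed(n::Int, layer::AbstractMatrix)
    # Made-up rule: take the first n cells in column-major order
    return (vec(collect(CartesianIndices(layer)))[1:n], layer)
end

function toy_refine(coords, k::Int, layer::AbstractMatrix)
    # Made-up rule: keep the k candidates with the highest values
    order = sortperm(layer[coords]; rev = true)
    return (coords[order[1:k]], layer)
end

# Curried forms: fix the settings now, receive the data later
toy_seed(n::Int) = layer -> toy_seed(n, layer)
toy_refine(k::Int) = pack -> toy_refine(first(pack), k, last(pack))

layer = reshape(collect(1.0:20.0), 4, 5)
picked = layer |> toy_seed(10) |> toy_refine(3) |> first
```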
60 changes: 60 additions & 0 deletions docs/src/vignettes/uniqueness.md
@@ -0,0 +1,60 @@
# Selecting environmentally unique locations

For some applications, we want to sample a set of locations that cover a broad
range of values in environment space. Another way to phrase this problem is
that we want to find the set of points with the _least_ covariance in their
environmental values.

To do this, we use a `BONRefiner` called `Uniqueness`. We'll start by loading the required packages.

```
using BiodiversityObservationNetworks
using SpeciesDistributionToolkit
using StatsBase
using NeutralLandscapes
using Plots
```

!!! warning "Consider setting your SDMLAYERS_PATH"
    When accessing data using `SimpleSDMDatasets.jl`, it is best to set the
    `SDMLAYERS_PATH` environment variable to tell `SimpleSDMDatasets.jl` where
    to download data. This can be done by setting
    `ENV["SDMLAYERS_PATH"] = "/home/user/Data/"` or similar in the
    `~/.julia/etc/julia/startup.jl` file. (Note this will be different
    depending on where `julia` is installed.)

```
bbox = (left=-83.0, bottom=46.4, right=-55.2, top=63.7);
temp, precip, elevation =
    convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, AverageTemperature); bbox...)),
    convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, Precipitation); bbox...)),
    convert(Float32, SimpleSDMPredictor(RasterData(WorldClim2, Elevation); bbox...));
```

Now we'll use the `stack` function to combine our three environmental layers into a single, 3-dimensional array, which we'll pass to our `Uniqueness` refiner.

```
layers = BiodiversityObservationNetworks.stack([temp,precip,elevation]);
```
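
To picture the resulting array, the sketch below builds an equivalent 3-dimensional stack from plain matrices using `cat` from Base Julia. The random matrices are hypothetical stand-ins for the rasters, and whether the package's `stack` handles masked cells or other details differently is not covered here.

```julia
# Hypothetical stand-ins for three rasters on the same 4×5 grid
temp_grid = rand(4, 5)
precip_grid = rand(4, 5)
elev_grid = rand(4, 5)

# Concatenate along a third dimension: rows × columns × variables
cube = cat(temp_grid, precip_grid, elev_grid; dims = 3)

# The environmental vector of a pixel is a slice along the third dimension
env_at_pixel = cube[2, 3, :]
```

Each pixel is then described by a vector with one entry per environmental variable, which is the information a uniqueness criterion needs.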

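Next, we generate a random uncertainty layer with the same size as the temperature layer, using that layer as a mask (this requires NeutralLandscapes v0.1.2):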

```
uncert = rand(MidpointDisplacement(0.8), size(temp), mask=temp);
heatmap(uncert, aspectratio=1, frame=:box)
```

Now we'll get a set of candidate points from a `BalancedAcceptance` seeder that has no bias toward higher uncertainty values (`α = 0.0`).

```
candpts, uncert = uncert |> seed(BalancedAcceptance(numpoints=100, α=0.0));
```

Now we'll `refine` our 100 candidate points down to the 30 most environmentally unique ones.

```
finalpts, uncert = refine(candpts, Uniqueness(;numpoints=30, layers=layers), uncert)
heatmap(uncert)
scatter!([p[2] for p in candpts], [p[1] for p in candpts], fa=0.0, msc=:white, label="Candidate Points")
scatter!([p[2] for p in finalpts], [p[1] for p in finalpts], c=:dodgerblue, msc=:white, label="Selected Points")
```
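
One literal reading of "least covariance" is to score every candidate by the summed absolute covariance between its environmental vector and those of all other candidates, then keep the lowest-scoring ones. The sketch below implements that reading as an assumption; it is not confirmed to be what the `Uniqueness` refiner does, and `least_covariance` is a hypothetical name.

```julia
using Statistics

# Assumes the pool holds at least two candidates and that k <= length(pool)
function least_covariance(
    cube::AbstractArray{<:Real, 3},
    pool::Vector{CartesianIndex{2}},
    k::Int,
)
    # Environmental vector at one candidate location
    env(c) = cube[c[1], c[2], :]
    # Summed absolute covariance with every other candidate
    score = [sum(abs(cov(env(a), env(b))) for b in pool if b != a) for a in pool]
    # Keep the k candidates with the lowest score
    return pool[sortperm(score)[1:k]]
end

# Hypothetical 2×2 grid with three environmental variables per pixel
cube = zeros(2, 2, 3)
cube[2, 1, :] .= [1, 2, 3]
cube[1, 2, :] .= [2, 4, 6]
cube[2, 2, :] .= [3, 2, 1]

pool = vec(collect(CartesianIndices((2, 2))))
most_unique = least_covariance(cube, pool, 1)
```

In this toy grid the pixel with a constant environmental vector has zero covariance with every other pixel, so it is selected first.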
