Benchmarking with GEE: example with Wolf replication #144

fBedecarrats · 2023-03-10T14:57:37Z

TL;DR: The paper by Wolf et al., published in 2021 in Nature Ecology & Evolution, represents, I think, the state of the art that we are trying to improve on with this package. I am currently working on a replication of this study (see work in progress here). I found that the original code cannot run (see the issues on the repo where the code is published), so I rewrote everything. I think we could use the code written for this replication to cross-validate the accuracy of results and to benchmark performance against mapme.biodiversity. It could also inform best practices for PA impact estimation (although that relates more to the work hosted on the KfW repo for reproducible workflows). I'll try to summarize here the insights we could glean from this ongoing replication.

Background: For the initial study, the authors used GEE to resample and fetch raster files (elevation, population density, travel time, and GFC tree cover, loss, lossyear and gain), which they then processed with various scripts combining Python, R and Julia to prepare the data. There is at least one coding typo, so the original code cannot run as is. The code used deprecated versions of several packages, and even after fetching older versions, I get cryptic errors when running the preparation and matching script in Julia. I contacted the author, but he replied that he did not know what the error message meant and was not able to locate the package/software versions used for the calculations in the original study.

Data computation on GEE: I re-implemented the processing workflow of Wolf et al., but using GEE more extensively, thanks to the {rgee} package, which interfaces with the GEE API. The code is here. It runs in less than an hour and outputs rasters per country that are small enough to be processed on a normal computer.

Take-aways for the Mapme effort:
I think there are a few strong points that we could incorporate in our current work:

Wolf et al. use WWF ecoregions and the deforestation drivers of Curtis et al. as matching variables, which I think is very sound. The mapme.biodiversity package computes ecoregions, but I think they are not used as a matching variable in the KfW analysis; it would be a good idea to add them. As far as I know, the deforestation drivers are not computed by the package (this could be a feature request), and they are not included in the KfW portfolio analysis either.
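Using a categorical variable such as biome or ecoregion for matching can be sketched as exact matching within strata, followed by nearest-neighbour matching on the remaining covariates. This is a minimal sketch under stated assumptions, not the procedure used by Wolf et al. or in the KfW analysis: the field names (`stratum`, `elev`, `travel`) are hypothetical, covariates are assumed to be already scaled, and plain Euclidean distance stands in for the propensity-score or Mahalanobis distances often used in practice. It also shows why very fine strata are a concern: a treated unit whose stratum contains no control is left unmatched.

```python
import math


def match_within_strata(treated, controls, covariates, stratum_key="stratum"):
    """Match each treated unit to its nearest control in the same stratum.

    Units are dicts with an "id", a categorical stratum (hypothetical key)
    and numeric covariates assumed to be on comparable scales.
    Returns (pairs, unmatched), with pairs as (treated_id, control_id).
    """
    # Group controls by stratum so that candidates never cross strata.
    pool = {}
    for c in controls:
        pool.setdefault(c[stratum_key], []).append(c)

    pairs, unmatched = [], []
    for t in treated:
        candidates = pool.get(t[stratum_key], [])
        if not candidates:
            # No control shares this stratum: the treated unit is dropped.
            unmatched.append(t["id"])
            continue
        # Euclidean distance on the covariates (simplifying assumption).
        best = min(
            candidates,
            key=lambda c: math.dist([t[k] for k in covariates],
                                    [c[k] for k in covariates]),
        )
        candidates.remove(best)  # matching without replacement
        pairs.append((t["id"], best["id"]))
    return pairs, unmatched


# Hypothetical units: stratum "B" has no control at all.
treated = [
    {"id": "t1", "stratum": "A", "elev": 0.1, "travel": 0.2},
    {"id": "t2", "stratum": "B", "elev": 0.5, "travel": 0.5},
]
controls = [
    {"id": "c1", "stratum": "A", "elev": 0.9, "travel": 0.9},
    {"id": "c2", "stratum": "A", "elev": 0.15, "travel": 0.25},
]
pairs, unmatched = match_within_strata(treated, controls, ["elev", "travel"])
print(pairs, unmatched)  # [('t1', 'c2')] ['t2']
```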
There are also, I think, important limitations in Wolf et al. that mapme.biodiversity is designed to avoid (in particular points 1, 3 and 4):

1. I think that Wolf et al. do not compute the deforestation years adequately. They aggregate the pixels to a 1 km² resolution and assign a single lossyear to each aggregated pixel, namely the mode of the underlying pixels. I fear that this not only reduces the precision of the outcome measurement but also induces a bias: it tends to lag the detected deforestation and might therefore under-estimate the impact of PA protection.
2. Wolf et al. only select PAs that have a value of 0 in the MARINE field of the WDPA. In other words, they exclude coastal areas (despite having mangroves among their biomes of interest).
3. They subtract forest gain from forest loss, but gain is only computed up to 2012, and I think this was for a good reason: when GFC v1 came out, there was strong criticism that "forest cover gain" did not actually correspond to "forest". Moreover, this subtraction tends to reduce measured deforestation before 2013 relative to later years, and therefore contributes to a further under-estimation of PA protection impact.
4. I am not sure yet about this, but they seem to reproject after reducing the resolution and calculating slope (which the GEE documentation advises against). I wonder whether this might also be a problem.
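The concern about assigning a single modal lossyear to each aggregated pixel can be illustrated with a toy coarse cell. This is a hypothetical sketch, not Wolf et al.'s code: it assumes the usual GFC lossyear encoding (0 = no loss, k = loss in year 2000 + k), takes the mode over loss pixels only, and counts all of the cell's lost pixels in that single modal year. Under those assumptions, loss that happened in other years is moved to the modal year; in this example the 2003 loss is shifted to 2010, although the direction of the shift depends on the data.

```python
from collections import Counter


def native_annual_loss(lossyear_pixels):
    """Count lost pixels per calendar year from native-resolution codes."""
    counts = Counter(code for code in lossyear_pixels if code > 0)
    return {2000 + code: n for code, n in sorted(counts.items())}


def mode_annual_loss(lossyear_pixels):
    """Attribute all lost pixels of one coarse cell to the modal loss year.

    Assumption: the mode is computed over loss pixels only (code > 0).
    """
    lost = [code for code in lossyear_pixels if code > 0]
    if not lost:
        return {}
    modal_code, _ = Counter(lost).most_common(1)[0]
    return {2000 + modal_code: len(lost)}


# Hypothetical coarse cell: 30 pixels lost in 2003, 40 in 2010, 30 intact.
cell = [3] * 30 + [10] * 40 + [0] * 30
print(native_annual_loss(cell))  # {2003: 30, 2010: 40}
print(mode_annual_loss(cell))    # {2010: 70}
```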

We could use the refactored code available on this page to compute results with GEE on the same areas as mapme.initiative, in order to cross-validate the accuracy of the results.
The technical implementation of the processing workflow could also feed into our reflections on performance (An overview of potential avenues for performance enhancement #139):
- pixel-based vs. polygon-based aggregation for large coverage with small AOIs?
- leveraging cloud computing for the heavy lifting at the beginning of the process (e.g. aggregating GFC to a coarser resolution)
- benchmarking performance to get a sense of the processing scale at which mapme.biodiversity is no longer competitive with cloud-based alternatives.
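The pixel-based vs. polygon-based question can be made concrete with a toy raster. In this sketch (hypothetical data structures in pure Python, not a mapme.biodiversity or GEE API), `pixel_based` makes one pass over the whole grid and accumulates per-AOI sums, while `polygon_based` loops over AOIs and scans only each AOI's bounding box. Both return the same totals; what differs is cost. Under these assumptions, the single pass scales with the total raster area, whereas the per-AOI loop scales with the summed bounding-box area of the AOIs, so the loop tends to win when AOIs are small and sparse, and the single pass when AOIs densely cover the raster.

```python
from collections import defaultdict


def pixel_based(values, zones):
    """One pass over the whole grid, accumulating sums per zone id.

    `values` and `zones` are same-shaped 2-D lists; zone 0 means
    "outside every AOI". Overlapping AOIs are ignored in this toy model.
    """
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone:
                totals[zone] += value
    return dict(totals)


def polygon_based(values, zones, bboxes):
    """Loop over AOIs; each one scans only its own bounding box.

    `bboxes` maps zone id -> (row_min, row_max, col_min, col_max), inclusive.
    """
    totals = {}
    for zone, (r0, r1, c0, c1) in bboxes.items():
        total = 0.0
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                if zones[r][c] == zone:
                    total += values[r][c]
        totals[zone] = total
    return totals


# Hypothetical 3x3 loss flags and two AOIs.
values = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
zones = [[1, 1, 0], [1, 2, 2], [0, 2, 2]]
bboxes = {1: (0, 1, 0, 1), 2: (1, 2, 1, 2)}
print(pixel_based(values, zones))            # {1: 1.0, 2: 3.0}
print(polygon_based(values, zones, bboxes))  # {1: 1.0, 2: 3.0}
```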

Sorry for the long thread, it's Friday.

fBedecarrats · 2023-03-10T14:59:44Z · Author

@Shirobakaidou just sent me a snippet that helps compute deforestation per year on GEE. I'll incorporate it into this code to facilitate the comparison of GEE vs. mapme for annual deforestation. Thanks @Shirobakaidou!
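When comparing annual deforestation between GEE and mapme.biodiversity, one detail worth keeping in mind is that GFC pixels sit on a latitude/longitude grid, so their area shrinks with latitude; on GEE this is typically handled with `ee.Image.pixelArea()`. The sketch below is not the snippet mentioned above. It is a hypothetical local equivalent that assumes a spherical Earth, a grid spacing of 0.00025° (roughly 30 m at the equator) and a 30% canopy threshold on treecover2000, all of which are illustrative choices.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_008.8  # mean radius; assumes a spherical Earth
PIXEL_DEG = 0.00025           # assumed GFC grid spacing in degrees


def pixel_area_ha(lat_deg, pixel_deg=PIXEL_DEG):
    """Approximate area (ha) of a square lat/lon pixel centred at lat_deg."""
    side_rad = math.radians(pixel_deg)
    north_south_m = EARTH_RADIUS_M * side_rad
    east_west_m = EARTH_RADIUS_M * side_rad * math.cos(math.radians(lat_deg))
    return north_south_m * east_west_m / 10_000


def annual_loss_ha(pixels, min_cover=30):
    """Sum loss area per year from (lat, treecover2000, lossyear) tuples.

    A pixel counts as forest if treecover2000 >= min_cover (assumed
    threshold); lossyear 0 means no loss, k means loss in 2000 + k.
    """
    totals = defaultdict(float)
    for lat, cover, code in pixels:
        if code > 0 and cover >= min_cover:
            totals[2000 + code] += pixel_area_ha(lat)
    return dict(sorted(totals.items()))


# Hypothetical pixels.
pixels = [
    (0.0, 50, 3),    # forest, lost in 2003
    (0.0, 10, 3),    # below the canopy threshold: ignored
    (0.0, 80, 0),    # forest, no loss
    (60.0, 90, 10),  # forest at 60 degrees N, lost in 2010
]
print(annual_loss_ha(pixels))
```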

Jo-Schie · 2023-03-10T17:15:04Z · Maintainer

Awesome analysis. It helps us advance on several fronts, and we should discuss them in detail, @melvinhlwong. Some spontaneous thoughts:

Totally agree with using biomes as a matching variable (ecoregions are too detailed and would not allow us to find matching pairs). We initially had that and then, for some reason, excluded it again.
I am not sure about the forest cover loss drivers because, as far as I remember, they only cover 2015 to 2020 or something similar. What I would find more interesting is analyzing land cover change in loss areas, which is similar, but we might find a source that includes earlier years or is still updated, such as Dynamic World. In addition, such data is relevant to approximately separate natural losses from anthropogenic conversion, which is the relevant distinction from my viewpoint and would allow ongoing use, e.g. in the context of monitoring. I already included that in the wiki page about new indicators.
I agree with your assessment of the Wolf methodology regarding aggregation and the use of the mode, which introduces bias.
Your trouble reproducing their analysis is a good argument for the package and could be used in a publication.
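Analyzing land cover in loss areas essentially means tallying, for each loss pixel, the land-cover class observed after the loss. A minimal sketch, assuming the post-loss labels have already been extracted (for instance from Dynamic World, whose label names include `crops`, `built`, `trees` and `shrub_and_scrub`). Grouping `crops` and `built` as anthropogenic conversion is a hypothetical analytical choice, not something the dataset provides.

```python
from collections import Counter

# Hypothetical grouping: which post-loss classes count as anthropogenic
# conversion is an analytical choice, not part of Dynamic World itself.
ANTHROPOGENIC = {"crops", "built"}


def summarize_post_loss(labels, anthropogenic=ANTHROPOGENIC):
    """Tally loss pixels by post-loss label and by coarse loss type."""
    by_label = Counter(labels)
    conversion = sum(n for label, n in by_label.items() if label in anthropogenic)
    other = sum(by_label.values()) - conversion
    return by_label, {"anthropogenic": conversion, "other": other}


# Hypothetical post-loss labels for eleven loss pixels.
labels = ["crops"] * 5 + ["trees"] * 3 + ["built"] + ["shrub_and_scrub"] * 2
by_label, by_type = summarize_post_loss(labels)
print(by_type)  # {'anthropogenic': 6, 'other': 5}
```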

Jo-Schie · 2023-03-10T17:22:08Z · Maintainer

Ok. Just saw that Curtis study maps drivers from 2001-2015 so we might consider them ... but it's a pity that they are not ongoing...

fBedecarrats · 2023-03-10T17:44:53Z · Author

> Ok. Just saw that Curtis study maps drivers from 2001-2015 so we might consider them ... but it's a pity that they are not ongoing...

Wait for my proposal on Monday! (it's already on the concept board) :-)

goergen95 · 2023-04-17T07:14:49Z · Maintainer

I think this thread is better suited to the Discussions section. Or is there a direct link to an issue with the package that I am missing?

Jo-Schie · 2023-04-17T19:35:58Z · Maintainer

Yeah, I think so too. Is it possible to convert it?
