Warper: add a EXCLUDED_VALUES warping option, and use it in gdal2tiles #9631

Merged on Apr 17, 2024 (2 commits)


rouault
Member

@rouault rouault commented Apr 5, 2024

  • Warper: add an EXCLUDED_VALUES warping option to specify pixel values to be ignored as contributing source pixels during resampling. Limited to average resampling for now.
  • gdal2tiles: add --excluded-values and --excluded-values-pct-threshold switches

@rouault rouault added this to the 3.9.0 milestone Apr 5, 2024
@dbaston
Member

dbaston commented Apr 5, 2024

It would be an annoying change, but is IGNORE_VALUES an accurate description of what is happening here?

Could the same result be obtained by setting the affected pixels to NODATA ?

@rouault
Member Author

rouault commented Apr 5, 2024

but is IGNORE_VALUES an accurate description of what is happening here?

Naming is hard. I first thought about INVALID_VALUES but was worried about confusion with nodata. Wouldn't IGNORE_VALUES imply that they are never going to be found in the output? That is not the case here: depending on the relative weight of special values among the source pixels compared to the threshold, they can be either ignored or selected.

Could the same result be obtained by setting the affected pixels to NODATA ?

In the use case this tries to solve (mapping of a suitability index for sustainable agriculture), nodata/alpha is used for other types of pixels. Basically you have a continuous set of 100 suitability categories, a special value indicating unsuitable (that cannot be averaged with valid suitability categories), and transparent/nodata pixels.

@jratike80
Collaborator

Feels like a valid use case, even though users are already confused by nodata, alpha, and mask. What else could we add to the mix? Perhaps several mask bands, and a possibility to utilize all 256 values of alpha?

@rouault
Member Author

rouault commented Apr 5, 2024

What else could we add to the mix? Perhaps several mask bands, and a possibility to utilize all 256 values of alpha?

I'm not sure how to interpret your comment. Do you feel this PR is appropriate? I acknowledge this new functionality is fairly specialized and it is easy to get confused by all the nodata and related concepts... And users/use cases don't always agree on what to do when you compute a target pixel from a set of source pixels: in some cases, you want the result to be invalid as soon as any source pixel is "invalid" (pessimistic use case / safety related), in other use cases you just want to ignore them (for "visual" / "good looking" use cases), and in some cases, you want something in the middle (ignore them if they are a minority of contributing pixels, but as soon as they are a majority, select them). Hence this threshold setting. That's something that could actually be generalized to how we process "usual" nodata (I believe we aren't always consistent within GDAL in that regard, depending on whether you are in the overview or warping code), but I didn't go that far...
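The three behaviours described above can be compared side by side. The following is only an illustrative sketch in plain Python, not GDAL's implementation: the function name, the `policy` strings and the marker value 255 are hypothetical, weights are assumed to be positive, and the choice of `>=` for the threshold comparison is an assumption.

```python
from typing import Optional, Sequence

UNSUITABLE = 255  # hypothetical marker value for "unsuitable" source pixels


def resample(values: Sequence[int], weights: Sequence[float],
             policy: str, pct_threshold: float = 50.0) -> Optional[float]:
    """Weighted average of `values`, treating UNSUITABLE according to `policy`."""
    total = sum(weights)
    marked = sum(w for v, w in zip(values, weights) if v == UNSUITABLE)
    kept = [(v, w) for v, w in zip(values, weights) if v != UNSUITABLE]

    if policy == "strict":        # any marked source pixel marks the target
        select_marker = marked > 0
    elif policy == "ignore":      # marked pixels are dropped whenever possible
        select_marker = not kept
    elif policy == "threshold":   # marked pixels win once their share is large enough
        select_marker = marked > 0 and 100.0 * marked / total >= pct_threshold
    else:
        raise ValueError(f"unknown policy: {policy}")

    if select_marker or not kept:
        return UNSUITABLE if marked > 0 else None
    return sum(v * w for v, w in kept) / sum(w for _, w in kept)


values, weights = [10, 20, 30, UNSUITABLE], [1, 1, 1, 1]
for policy in ("strict", "ignore", "threshold"):
    print(policy, resample(values, weights, policy))
```

With one marked pixel out of four, "strict" selects the marker, while "ignore" and "threshold" (at 50%) both average the three remaining values.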

@jratike80
Collaborator

I think that the PR is appropriate. The concept of threshold is interesting and kind of new to me.

It is sometimes difficult to help users deal with their nodata needs even with the existing alternatives, and I was mainly thinking about how it will be in the future. Not complaining at all. And then I remembered that some satellite products deliver cloud masks as separate layers, and maybe the masks are not the same for all bands, like thermal infrared being less sensitive to the shadows of the clouds than visible bands. Maybe we will get more feature requests in the future. Unfortunately, dealing with nodata/unusable/invalid/not relevant data is not as simple as it may seem at first sight.

@dbaston
Member

dbaston commented Apr 5, 2024

It seems pretty specific to one use case, and I wonder about other use cases with a different concept of special pixels and how they should be handled. Is it feasible to have a system where GDAL reads the contributing pixels and their respective weights, and the user provides a function to compute the target pixel values?

@rouault
Member Author

rouault commented Apr 5, 2024

Is it feasible to have a system where GDAL reads the contributing pixels and their respective weights, and the user provides a function to compute the target pixel values?

In theory, I guess so, but that would require more extensive changes, and it would be much less convenient for end users, as they would have to code it and find a way to get it registered in their GDAL process. They might not be C++ devs, or if they code it in Python, the performance of going C++ -> Python for each target pixel might be terrible.

@sgillies
Contributor

sgillies commented Apr 6, 2024

@rouault is there no other option for the customer than having a suitability index-specific warp option?

What does a user of this expect in the output? If we're averaging 4 pixels and one is unsuitable, do we compute the average of 3 pixels? Wouldn't this have statistical implications for downstream uses of the data?

At the least, can we reframe this as "exclusion" of values or something like that, instead of special-ness? Certainly there must be a good name for this type of analysis, and maybe even some published work on it.

@rouault
Member Author

rouault commented Apr 6, 2024

is there no other option for the customer than having a suitability index-specific warp option?

The end goal is to generate a tileset with gdal2tiles, and get better results at the low zoom levels than with the current resampling algorithms. As this algorithm must handle the excluded pixels as (R,G,B) triplets, it was more appropriate to use the warping resampling logic, as it operates on all bands at the same time. The overview resampling logic works band by band instead, which could lead to wrong results if, say, the same R value were used by both a suitable pixel and an unsuitable one.
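To illustrate the point about triplets, here is a small sketch comparing exclusion on whole (R,G,B) tuples with a hypothetical band-by-band exclusion. It is plain Python with made-up colours and unweighted averages, and neither function is taken from the GDAL code base.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
EXCLUDED: RGB = (255, 0, 0)  # hypothetical colour of "unsuitable" pixels


def average_tuples(pixels: List[RGB]) -> RGB:
    """Drop a pixel only when all three bands match the excluded tuple."""
    kept = [p for p in pixels if p != EXCLUDED] or pixels
    return tuple(round(sum(p[b] for p in kept) / len(kept)) for b in range(3))


def average_per_band(pixels: List[RGB]) -> RGB:
    """Drop a band value whenever it equals that band's excluded value."""
    out = []
    for b in range(3):
        band = [p[b] for p in pixels]
        kept = [v for v in band if v != EXCLUDED[b]] or band
        out.append(round(sum(kept) / len(kept)))
    return tuple(out)


# The first pixel is suitable but shares its R value with the excluded colour.
pixels = [(255, 200, 100), (101, 200, 50), (255, 0, 0)]
print(average_tuples(pixels))    # averages the two suitable pixels
print(average_per_band(pixels))  # wrongly drops R=255 of a suitable pixel
```

The band-by-band variant discards the red component of a perfectly suitable pixel, so its R output differs from the tuple-based result.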

What does a user of this expect in the output? If we're averaging 4 pixels and one is unsuitable, do we compute the average of 3 pixels?

The --special-value-pct-threshold is there to control exactly how unsuitable pixels are taken into account. From preliminary tests, it seems that the 100% value leads to the expected results. At least, experiments with 50% do not: at the lowest zoom levels, unsuitable pixels are over-represented, hiding the information about the suitable pixels. At 100%, unsuitable pixels are ignored in the weighted average, unless they represent all the source pixels.
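A toy one-dimensional pyramid makes the difference between 50% and 100% visible. This is a sketch under several assumptions, not the gdal2tiles code: each level is built by merging neighbouring pairs of the previous level with equal weights, the marker `U` and both function names are hypothetical, the threshold is assumed to lie within 0 to 100, and the threshold is assumed to be met when the unsuitable share is greater than or equal to it.

```python
from typing import List

U = -1  # hypothetical marker for an "unsuitable" pixel


def merge(pixels: List[float], pct_threshold: float) -> float:
    """Average the suitable pixels unless unsuitable ones reach the threshold."""
    unsuitable = sum(1 for p in pixels if p == U)
    if unsuitable and 100.0 * unsuitable / len(pixels) >= pct_threshold:
        return U
    kept = [p for p in pixels if p != U]
    return sum(kept) / len(kept)


def pyramid(row: List[float], pct_threshold: float) -> List[List[float]]:
    """Halve the row repeatedly, merging neighbouring pairs at each level."""
    levels = [row]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([merge(prev[i:i + 2], pct_threshold)
                       for i in range(0, len(prev), 2)])
    return levels


row = [U, 10, 20, 30, U, 40, 50, 60]
for threshold in (50, 100):
    print(threshold, pyramid(row, threshold)[1:])
```

With a 50% threshold, the two unsuitable pixels spread at each level until the single top-level pixel is unsuitable; with 100%, they are dropped from the averages and the top level keeps a suitability value.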

Wouldn't this have statistical implications for downstream uses of the data?

I believe the intended use case here is visualization, so statistical exactness is probably not that important.

By the way, I'd be interested if someone has pointers in the literature on the "correct" way of performing average/bilinear/bicubic/etc. resampling in the presence of invalid/nodata pixels, but I'm not sure there is one. The interpolation algorithms tend to assume that the field to interpolate has C0, C1 or C2 smoothness/continuity, and thus nodata pixels are probably the equivalent of a discontinuity, breaking the basic mathematical assumptions behind the formula. As I mentioned above, there are use cases where you prefer nodata pixels to "stop" interpolation as soon as one is found, others where you ignore them, and intermediate ones where you ignore them until some threshold at which they stop interpolation.

I sort of wonder if the customer should avoid resampling such index values in their application.

They have also experimented with mode resampling, but it lacked the averaging effect in zones with suitable pixels, and behaved a bit more like nearest neighbour. This PR offers something that is a kind of compromise between mode and average resampling.

can we reframe this as "exclusion" of

Yes, EXCLUDED_VALUES would work for me.

@sgillies
Contributor

sgillies commented Apr 6, 2024

This PR offers something that is a kind of compromise between mode and average resampling

Ah, cool. I had just edited my previous comment to say "Certainly there must be a good name for this type of analysis, and maybe even some published work on it." I understand what this is now, thank you!

What would you think about adding a new resampling algorithm to the enum for this special case so that specialness doesn't leak out in unintended ways?

@rouault
Member Author

rouault commented Apr 6, 2024

What would you think about adding a new resampling algorithm to the enum for this special case so that specialness doesn't leak out in unintended ways?

To me, it feels more like a submode of average, and you have to explicitly enable it (not sure what you meant by "leak out"?) to get the modified behavior. It could potentially be generalized to other resampling kernels (cubic, etc.), and we would probably not want to duplicate all values of the resampling enumeration for that (and a dedicated value in the resampling enumeration wouldn't do anything by itself, unless you specify the excluded value tuple(s)). Similarly, what I mentioned above about handling "regular" nodata could also potentially be controlled by a dedicated warping option ("NODATA_PCT_THRESHOLD"?) if we wanted to offer that kind of control in the future.

@sgillies
Contributor

sgillies commented Apr 6, 2024

Thanks for the explanation @rouault !

@rouault rouault changed the title Warper: add a SPECIAL_VALUES warping option, and use it in gdal2tiles Warper: add a EXCLUDED_VALUES warping option, and use it in gdal2tiles Apr 14, 2024
@rouault
Member Author

rouault commented Apr 14, 2024

Following feedback, option names have been renamed:

  • SPECIAL_VALUES ==> EXCLUDED_VALUES
  • SPECIAL_VALUE_PCT_THRESHOLD ==> EXCLUDED_VALUES_PCT_THRESHOLD (put to plural for consistency)
  • --special-values ==> --excluded-values
  • --special-value-pct-threshold ==> --excluded-values-pct-threshold (put to plural for consistency)
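For readers who want to try the renamed warping options from Python, the sketch below only builds the NAME=VALUE strings. The helper name is hypothetical, and the parenthesised tuple syntax is an assumption that should be checked against the GDAL 3.9 warp options documentation. The commented-out `gdal.Warp` call shows where the list would go when the GDAL Python bindings are installed; the file names in it are placeholders.

```python
from typing import Iterable, List, Tuple


def excluded_values_warp_options(tuples: Iterable[Tuple[int, ...]],
                                 pct_threshold: int) -> List[str]:
    """Build NAME=VALUE strings for the renamed warping options."""
    if not 0 <= pct_threshold <= 100:
        raise ValueError("pct_threshold must be within [0, 100]")
    # Assumed syntax: one parenthesised, comma-separated tuple per excluded colour.
    joined = ",".join("(" + ",".join(str(v) for v in t) + ")" for t in tuples)
    return [f"EXCLUDED_VALUES={joined}",
            f"EXCLUDED_VALUES_PCT_THRESHOLD={pct_threshold}"]


opts = excluded_values_warp_options([(255, 0, 0)], pct_threshold=100)
print(opts)
# With the GDAL Python bindings installed, the list could be passed as:
#   from osgeo import gdal
#   gdal.Warp("out.tif", "in.tif", resampleAlg="average", warpOptions=opts)
```

Average resampling is used in the commented call because the option is limited to that algorithm for now.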

…to be ignored as contributing source pixels during resampling

Limited to average resampling for now
@coveralls
Collaborator

Coverage Status

coverage: 68.966% (+0.003%) from 68.963%
when pulling fe5fe9c on rouault:special_values
into 95579fe on OSGeo:master.

@rouault rouault merged commit 88b0dd8 into OSGeo:master Apr 17, 2024
32 checks passed
rouault added a commit to rouault/gdal that referenced this pull request May 3, 2024
This is similar to the EXCLUDED_VALUES_PCT_THRESHOLD option introduced
in OSGeo#9631, but here this controls the
minimum amount of transparent/invalid/nodata source pixels to cause the
target pixels to not be set.
Only used currently for average resampling.
rouault added a commit to rouault/gdal that referenced this pull request May 3, 2024
This is similar to the --excluded-values-pct-threshold option added in
OSGeo#9631, but here this controls the
minimum amount of transparent/invalid/nodata source pixels to cause the
target pixels to be transparent.
Only used currently for average resampling
rouault added a commit that referenced this pull request May 4, 2024
This is similar to the EXCLUDED_VALUES_PCT_THRESHOLD option introduced
in #9631, but here this controls the
minimum amount of transparent/invalid/nodata source pixels to cause the
target pixels to not be set.
Only used currently for average resampling.
rouault added a commit that referenced this pull request May 4, 2024
This is similar to the --excluded-values-pct-threshold option added in
#9631, but here this controls the
minimum amount of transparent/invalid/nodata source pixels to cause the
target pixels to be transparent.
Only used currently for average resampling

5 participants