-
Interesting thinking! In previous projects doing 2D segmentation or regression with CNNs, I handled this by running inference on overlapping tiles and discarding the borders of each prediction when stitching the outputs back together.
This would also work for transformers, as it operates on the final output, not on the embeddings. A single "class embedding" already summarizes the entire tile, so here I don't see an urgent need to worry about the edges. For fine-tuning applications it would be interesting to see how a model performs if one removes the patches at the edge! Intuitively, however, I would run inference using all the patches and then handle edge effects after prediction with an algorithm like the one described above.
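For reference, a minimal sketch of such an overlap-and-crop scheme (my own illustration, not code from this project); the `predict` callable, tile size, and margin are placeholders:

```python
import numpy as np

def predict_with_overlap(raster, predict, tile=256, margin=32):
    """Tiled inference with overlapping windows: run the model on each
    window, then keep only the central crop of every prediction so the
    artifact-prone borders are discarded before stitching."""
    h, w = raster.shape
    out = np.zeros((h, w), dtype=np.float32)
    step = tile - 2 * margin  # stride chosen so the central crops tile the raster
    padded = np.pad(raster, margin, mode="reflect")  # so edge pixels get context too
    for y in range(0, h, step):
        for x in range(0, w, step):
            window = padded[y:y + tile, x:x + tile]
            # Windows at the right/bottom edge may come up short; pad them.
            py, px = tile - window.shape[0], tile - window.shape[1]
            if py or px:
                window = np.pad(window, ((0, py), (0, px)), mode="reflect")
            pred = predict(window)  # placeholder: maps (tile, tile) -> (tile, tile)
            cy, cx = min(step, h - y), min(step, w - x)
            out[y:y + cy, x:x + cx] = pred[margin:margin + cy, margin:margin + cx]
    return out
```

A weighted blend of the overlapping regions works too; cropping is just the simplest variant.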
-
We split a raster into images, and the transformer then works on 8x8 patches within each image. Self-attention lets each patch draw context from the rest of the image. This is ideal for a patch at the center, but a patch in a corner can only get context from the rest of the image, and since it sits in the corner, that covers only a fraction of its actual surroundings.
The green patch at the center has good context for self-attention, but the yellow patch in the corner has very limited context in some directions. It will not even see the adjacent patches, since they lie on other images.
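To put a number on that fraction, here is a toy calculation (the grid size and context radius are assumptions for illustration, not values from the model); a corner patch sees less than a third of the neighborhood a center patch sees:

```python
def context_fraction(row, col, n, r):
    """Fraction of a (2r+1) x (2r+1) patch neighborhood that falls
    inside an n x n grid of patches."""
    full = (2 * r + 1) ** 2
    rows = min(n, row + r + 1) - max(0, row - r)
    cols = min(n, col + r + 1) - max(0, col - r)
    return rows * cols / full

n, r = 32, 4  # e.g. a 256x256 tile of 8x8 patches, 4-patch context radius
print(context_fraction(n // 2, n // 2, n, r))  # center patch: 1.0
print(context_fraction(0, 0, n, r))            # corner patch: ~0.31
```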
The easiest solution here is to create embeddings not by tiling the raster, but by sliding a window of the same size as the tile across it, assigning each embedding's location to the patch at the window's center (see the sketch below). This greatly multiplies the number of embeddings and also creates a lot of overlap between them.
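A minimal sketch of that sliding-window variant, as an illustration under assumptions: the `embed` callable, tile size, and stride here are placeholders, not this project's API:

```python
def sliding_window_embeddings(raster, embed, tile=256, stride=8):
    """Embed every tile-sized window of a 2D raster, assigning each
    embedding to the location of the patch at the window's center.
    With stride equal to the patch size, every interior patch sits at
    the center of exactly one window."""
    h, w = raster.shape
    records = []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            window = raster[y:y + tile, x:x + tile]
            center = (y + tile // 2, x + tile // 2)  # pixel location of the center patch
            records.append((center, embed(window)))
    return records
```

With these numbers the cost is clear: relative to non-overlapping tiling (stride equal to tile), the sliding window produces roughly (tile / stride)^2 = (256 / 8)^2 = 1024 times as many embeddings.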
We do not know how to create geo-embeddings that are both context-aware (via the transformer) and small enough, while also assigning them semantic bounds.
cc @srmsoumya @yellowcap @danhammer