-
Hey Clay team! I've been following the rapidly growing set of use cases. (Kudos, by the way, these are very cool and very helpful examples!) I was curious to hear whether anyone on the team has considered developing a simple regression or classification model on top of Clay's embeddings, but, critically for my use case, at the resolution of the input imagery rather than the resolution of the patches, which are 8x the scale of the inputs. For instance, with Sentinel-2 imagery, we'd be targeting 10m per-pixel predictions rather than something that is effectively 80m (patch resolution). The classification examples that exist on the site seem to use the class embeddings for the whole scene. Two options come to mind: (1) fine-tune with a segmentation head to get dense predictions, or (2) compute embeddings for shifted windows and reassemble them at pixel resolution.

Re: option 1, there are some prediction tasks where we won't have labels of the kind needed for segmentation. Moreover, a full-blown segmentation approach might be more complicated than we need if Clay's embeddings already capture the essence of what's going on in, and around, a pixel.

Re: option 2, I've taken a stab at this logic (by computing the embeddings for a bunch of shifted windows and then reassembling the results), but wasn't sure if perhaps there's an easier way (or if this might be wrong-headed to begin with). Here's what option 2 looks like, for reference:
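In lieu of pasting the full notebook, here is a minimal sketch of the shifted-window logic. The `get_patch_embeddings` helper, the 224-pixel chip size, and the 8-pixel patch size are stand-ins/assumptions for however the Clay encoder is actually run, not its real API:

```python
import numpy as np

CHIP = 224   # chip size fed to the encoder (pixels)
PATCH = 8    # patch size, i.e. how much coarser the embeddings are than the inputs

def get_patch_embeddings(chip: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: run the Clay encoder on a (bands, 224, 224) chip
    and return its patch embeddings reshaped to (28, 28, D)."""
    raise NotImplementedError

def shifted_window_embeddings(image: np.ndarray) -> np.ndarray:
    """image: (bands, >=231, >=231) Sentinel-2 array, slightly larger than one chip.
    Returns a (224, 224, D) array of 'smooth' per-pixel (10m) embeddings."""
    out = None
    for dy in range(PATCH):                      # shift bottom-to-top, one pixel at a time
        for dx in range(PATCH):                  # shift left-to-right, one pixel at a time
            chip = image[:, dy:dy + CHIP, dx:dx + CHIP]
            emb = get_patch_embeddings(chip)     # (28, 28, D) for this offset
            if out is None:
                out = np.zeros((CHIP, CHIP, emb.shape[-1]), dtype=emb.dtype)
            # Each patch embedding lands on the pixel at its offset within the
            # shifted grid, so every output pixel gets exactly one embedding.
            out[dy::PATCH, dx::PATCH] = emb
    return out
```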
I'll take any guidance or strategies you might have to offer! I noticed that @yellowcap said something perhaps related in the discussion on #231, though I may be taking that well out of context. Thanks again for all of your monumental efforts here!
Replies: 3 comments
-
@lzachmann The Clay model captures all of the features from the image. As we see in this section of the tutorial, each embedding vector captures a unique property of the image. For example, embedding 97 is good at segmenting land and water, while embedding 207 might be good at detecting shorelines. Some embeddings have a clear visual meaning, while others might be too complex for human eyes but still capture underlying representations.

In the regression and segmentation examples, we extract intermediate feature maps from the model (as in the case of U-Nets), upsample them, and fuse them for those tasks.

To answer your query: in cases where you might not have labels for segmentation, you are basically looking to cluster similar features together (am I right in assuming this?). One way might be to experiment with different embedding feature maps and find one that fits your use case. For instance, if you want to detect solar panels and a particular embedding does that, you pick that embedding dimension and work with its feature map.
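As a rough illustration, picking one embedding dimension and upsampling its feature map to the input grid might look like the following. The tensor shapes and the dimension index are placeholders, not Clay's actual API:

```python
import torch
import torch.nn.functional as F

# Hypothetical patch-level embeddings from the encoder for one chip:
# shape (1, 28 * 28, 768) -- 28 x 28 patches, 768-dim embedding per patch.
patch_embeddings = torch.randn(1, 28 * 28, 768)

# Pick a single embedding dimension (e.g. one that happens to separate the
# classes you care about) and reshape it into a 28 x 28 feature map.
dim = 97
feature_map = patch_embeddings[0, :, dim].reshape(1, 1, 28, 28)

# Upsample the patch-resolution map (~80m for Sentinel-2) back to the
# 224 x 224 input grid (~10m). Bilinear interpolation keeps it smooth,
# but the underlying information is still patch-resolution.
upsampled = F.interpolate(feature_map, size=(224, 224), mode="bilinear", align_corners=False)

# Threshold (or cluster) the upsampled map to get a rough per-pixel mask.
mask = (upsampled > upsampled.mean()).squeeze()
```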
-
Thank you @srmsoumya! (And sorry for the delayed response.) This is helpful.

Re: whether I'm basically looking to cluster similar features together: not exactly. I was thinking more about a use case involving land cover data like GLanCE. Unlike the Chesapeake Bay dataset, which provides labels as imagery (every pixel in a given chip has a corresponding label), GLanCE gives us land cover at individual points (lat/lon), which rules out a segmentation-based approach. My original question concerned whether it is possible to fine-tune Clay using Sentinel-2 on a dataset like GLanCE, but make 10m-resolution predictions. I suppose it's superficially similar to the classification example, but rather than classify an entire 224x224 chip as a given land cover type, I'm hoping to get per-pixel predictions (similar to what you see in the segmentation outputs). My thought was to use something simple like a Random Forest to make those per-pixel predictions. However, merely up-sampling the 28x28 feature maps would produce something that is nominally 10m but, to your point, would appear pixelated.

The image I shared was meant to convey one potential answer to that issue. It shows a single embedding dimension (chosen at random). To make it, I loaded Sentinel-2 imagery (somewhat more than I need for a chip) and computed the embeddings for shifted subsets of that imagery: basically, start at the lower left, compute embeddings, store the results, move right one pixel, clip, recompute the embeddings, and so on. We do this eight times left-to-right and eight times bottom-to-top, then stitch all of the results together to get 'smooth' 10m embeddings for an area of interest. I can share more in the way of code at some point if that would be helpful, but just wanted to say thanks for your advice thus far!
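For what it's worth, here is a minimal sketch of the Random Forest idea on top of per-pixel embeddings. Everything below is a placeholder: in practice the embeddings would come from the shifted-window stitching described above, and the point labels and their pixel locations from GLanCE:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder inputs (illustrative shapes and random values only):
#   pixel_embeddings: (H, W, D) per-pixel embeddings for an area of interest
#   rows, cols:       pixel indices of the labeled points within that array
#   labels:           (N,) land cover class for each point
H, W, D = 224, 224, 768
pixel_embeddings = np.random.rand(H, W, D).astype("float32")
rows = np.random.randint(0, H, size=500)
cols = np.random.randint(0, W, size=500)
labels = np.random.randint(0, 7, size=500)

# Train on the embeddings at the labeled point locations only.
X_train = pixel_embeddings[rows, cols]          # (N, D)
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1)
clf.fit(X_train, labels)

# Predict a land cover class for every pixel in the area of interest.
X_all = pixel_embeddings.reshape(-1, D)
pred_map = clf.predict(X_all).reshape(H, W)     # (H, W) per-pixel classes
```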
-
Thank you for explaining the visuals! The smallest chip size that Clay can handle is 8 x 8, so that is the native resolution at which you can get the embeddings. For your use case, you might consider looking into pixel-based models as an option. I recommend checking out PRESTO.