-
I have a question about the location and timestep information added to the input sequence and how it might relate to using the encoder outputs for calculating similarity between chips. For my use case, I'm interested in comparing the similarity between chips based only on the pixel values in the data cube (spectral, SAR and DEM). My understanding is that by including the location and timestep in the input sequence, the encoder outputs capture more than just the pixel values.

I'm new to the transformer architecture, so I did an experiment to make sure I was thinking about this right. Some of the tutorials suggest using parts of the embedding to either focus on or exclude the location and timestep (for example: https://clay-foundation.github.io/model/tutorial_digital_earth_pacific_patch_level.html). What I wanted to check was whether excluding the last 2 vectors in the embedding would eliminate location and timestep information.

Even when excluding the last two vectors in the embedding, the embeddings look different, although not very different. Maybe this means that the first 1536 vectors are mostly capturing information in the pixels, and it's okay to use them for calculating similarity?
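For reference, here is roughly what my comparison looks like (a minimal sketch, not the tutorial code; the embedding shape and function names are my assumptions for v0.1, where the last 2 vectors are the lat/lon and time embeddings):

```python
import numpy as np

# Assumed shape: `emb` is the encoder output for one chip, e.g. (1538, 768) --
# 1536 patch embeddings plus the lat/lon and time embeddings at the end.
def chip_vector(emb, drop_latlon_time=True):
    """Mean-pool the patch embeddings into a single vector per chip."""
    if drop_latlon_time:
        emb = emb[:-2]           # exclude the last 2 vectors (lat/lon + time)
    return emb.mean(axis=0)      # (768,)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. sim = cosine_similarity(chip_vector(emb_a), chip_vector(emb_b))
```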
-
That is correct. Clay is designed to be aware of time and location. The idea is that "AI for Earth" should be aware of Earth, not of images untethered from it like other image models (the intuition is that temporal and spatial proximity are relevant, e.g. Madrid doesn't move from day to day).

If you have a use case where that is not what you want, my hunch is that the easiest thing to do is to drop those inputs. All inputs will be optional in V1, due in May. That is not the case for v0.1, which you used here: v0.1, and the v0.2 we will release in a week or so, do need complete inputs.

The "good news" is that we estimate the model is not really paying much attention to location or time currently. The reason is that the input coverage is just too coarse: the percentage of space and time we cover is extremely small, so the model doesn't have much opportunity to leverage these dependencies. We are still learning to understand the model.

If the above is all correct, my take for your need is that the embeddings do change in all dimensions, but VERY slightly, especially if the randomly set locations happen to be close to those in the training set, which might bias the model towards what it "expects" to see. I do not understand how the last 2 dimensions would contain the location and time (it might be the case, but I personally do not understand how). Makes sense?
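If you want to quantify how much the location and time inputs actually move the embeddings, one rough check (a sketch only; `encode` is a placeholder for however you run the v0.1 encoder, not the actual Clay API) is to embed the same chip with its true metadata and with randomized metadata, then compare:

```python
import numpy as np

def location_time_sensitivity(encode, pixels, latlon, time, seed=0):
    """Compare embeddings of one chip with true vs. randomized lat/lon and time.

    `encode(pixels, latlon, time)` stands in for the real encoder call and is
    assumed to return a 1-D embedding vector.
    """
    rng = np.random.default_rng(seed)
    emb_true = encode(pixels, latlon, time)
    emb_rand = encode(pixels,
                      rng.uniform([-90.0, -180.0], [90.0, 180.0]),  # random lat/lon
                      rng.uniform(0.0, 1.0))                        # placeholder timestep encoding
    diff = np.abs(emb_true - emb_rand)
    cos = np.dot(emb_true, emb_rand) / (np.linalg.norm(emb_true) * np.linalg.norm(emb_rand))
    return diff.mean(), diff.max(), float(cos)
```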