Replies: 3 comments 3 replies
-
These benchmarks are great, but how do they compare to other state-of-the-art models? For example, your ViT reaches 98% validation accuracy on EuroSAT, but Google reported 99.2% accuracy on EuroSAT with a simple ResNet-50 model five years ago. I would recommend against using saturated benchmarks like EuroSAT, on which even plain ImageNet weights can easily reach 98%+ accuracy. There are far better, larger, and newer benchmark suites out there (see Table 2a of this paper).
-
@srmsoumya is it documented what resources were used for fine-tuning, similar to the Training Card for training the Clay model from scratch? My assumption is that this was done on a single node with one p5.48x instance, but I'm not sure whether the memory footprint of fine-tuning is smaller because the encoder is frozen, and whether smaller instances can be used. https://clay-foundation.github.io/model/release-notes/specification.html#training-card
-
@rbavery For fine-tuning, we don't need large VM instances like that; we can use smaller instances by adjusting the batch size accordingly.
-
Experiment Overview
We have conducted experiments with the Clay model on various downstream tasks, specifically focusing on classification and segmentation. Note that in all cases, the Clay encoder remains frozen, and only the additional layers are trained.
Initial Observations
The Clay model shows strong learning capabilities, with most tasks being learned effectively within the first epoch, after which performance plateaus.
Classification Task
For the classification task, we added a fully connected (FC) block on top of the Clay encoder. After 5 minutes of training, the model achieved a training accuracy of 0.985 and a validation accuracy of 0.98. The loss curves indicate that most of the learning occurs within the first epoch.
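As a rough illustration of this setup, the sketch below shows a frozen encoder with a trainable FC block on top. This is a minimal PyTorch sketch, not the actual training code: the real Clay encoder, its embedding size, and the head architecture are all assumptions, and a plain linear layer stands in for the encoder.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """A fully connected block trained on top of a frozen encoder."""
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the pretrained encoder
            p.requires_grad = False
        self.head = nn.Sequential(            # only these layers are trained
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        with torch.no_grad():                 # no gradients through the encoder
            feats = self.encoder(x)
        return self.head(feats)

# Stand-in encoder (hypothetical): a linear map from flattened input to embeddings.
encoder = nn.Linear(64, 128)
model = ClassificationHead(encoder, embed_dim=128, num_classes=10)
logits = model(torch.randn(4, 64))           # batch of 4 dummy samples
```

Because the encoder's parameters have `requires_grad=False`, the optimizer only updates the head, which also keeps the memory footprint of fine-tuning small.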
Training statistics
Validation statistics
Dataset Used: EuroSAT
Dataset Citation:
Segmentation Task
For the segmentation task, we tested the model on the Chesapeake Bay CVPR dataset. We attached a decoder similar to Segformer, which extracts features from intermediate layers and fuses them to predict segmentation masks.
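A minimal sketch of that fusion idea is below: project each intermediate feature map to a common dimension, upsample everything to the largest resolution, concatenate, and fuse into per-pixel class logits. The channel sizes, fusion dimension, and use of 1x1 convolutions are assumptions for illustration, not the exact decoder used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionDecoder(nn.Module):
    """Segformer-style decoder: project multi-scale features to a common
    dimension, upsample to the largest resolution, concatenate, and fuse."""
    def __init__(self, in_dims, fuse_dim, num_classes):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(d, fuse_dim, 1) for d in in_dims)
        self.fuse = nn.Conv2d(fuse_dim * len(in_dims), fuse_dim, 1)
        self.classify = nn.Conv2d(fuse_dim, num_classes, 1)

    def forward(self, feats):                 # feats: list of (B, C_i, H_i, W_i)
        target = feats[0].shape[-2:]          # largest spatial resolution
        up = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
              for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(up, dim=1)))

# Dummy intermediate features at three scales (shapes are assumptions).
feats = [torch.randn(2, 64, 32, 32),
         torch.randn(2, 128, 16, 16),
         torch.randn(2, 256, 8, 8)]
decoder = FusionDecoder([64, 128, 256], fuse_dim=96, num_classes=7)
masks = decoder(feats)                        # per-pixel class logits
```

The output has the spatial size of the largest feature map; a final upsample to the input resolution would follow in practice.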
We used a subset of the dataset consisting of 2000 random samples for training and validation. After 10 minutes of training (10 epochs), the model showed a similar learning pattern, with most of the learning occurring in the first epoch. The validation scores were a weighted IoU of 0.875 and an F1 score of 0.93.
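For reference, scores like these can be computed from a per-class confusion matrix. The sketch below is a minimal NumPy implementation; the averaging scheme (weights proportional to each class's pixel frequency) is an assumption about what "weighted" means here.

```python
import numpy as np

def weighted_iou_f1(pred, target, num_classes):
    """Per-class IoU and F1 from a confusion matrix, averaged with weights
    proportional to each class's pixel frequency (weighting is an assumption)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1                          # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                   # predicted as class but wrong
    fn = cm.sum(axis=1) - tp                   # missed pixels of the class
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    weights = cm.sum(axis=1) / cm.sum()        # class pixel frequency
    return float(weights @ iou), float(weights @ f1)

# Tiny 2x2 example with two classes.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
miou, mf1 = weighted_iou_f1(pred, target, num_classes=2)  # → 0.625, 0.7667
```

Note that F1 (Dice) is always at least as large as IoU for the same predictions, consistent with the 0.93 vs. 0.875 scores above.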
Training statistics
Validation statistics
Prediction on sample masks: Image / Ground-Truth Mask / Predicted Mask
Dataset Used: Chesapeake Bay CVPR
Dataset Citation:
Decoder Reference:
Next Steps