update posts
LBerth committed Oct 22, 2024
1 parent 56dc138 commit a2be136
Showing 2 changed files with 88 additions and 53 deletions.
32 changes: 19 additions & 13 deletions _posts/2024-10-21-py4cast.md
@@ -10,29 +10,35 @@ author:
---


# Py4cast: A Framework to Train & Compare AI Weather Forecasting Models
In this first blog post, we present Py4cast, a comprehensive framework designed to facilitate the training and comparison of AI-based weather forecasting models. Py4cast aims to streamline the experimentation process, allowing researchers to efficiently test various neural network architectures and methodologies. We demonstrate the utility of Py4cast through a series of experiments using the TITAN dataset, showcasing its capability to produce accurate weather forecasts and its flexibility in accommodating different model types.

## Introduction
This project, built using PyTorch and PyTorch Lightning, is a work in progress intended to share ideas and design concepts with partners.

In this paper, we present Py4cast, a comprehensive framework designed to facilitate the training and comparison of AI-based weather forecasting models. Py4cast aims to streamline the experimentation process, allowing researchers to efficiently test various neural network architectures and methodologies. We demonstrate the utility of Py4cast through a series of experiments using the TITAN dataset, showcasing its capability to produce accurate weather forecasts and its flexibility in accommodating different model types.
It is currently developed at Météo-France by DSM/AI Lab and CNRM/GMAP/PREV, and at Eviden.

## Discussion

The initial experiments underscore the potential of AI in weather forecasting, highlighting the importance of exploring diverse neural network architectures. Py4cast's flexibility allows for rapid experimentation and iteration, making it a valuable tool for researchers in the field.
## Features


* 7 neural network architectures: Half-Unet, U-Net, SegFormer, SwinUnetR, HiLam, GraphLam, UnetR++
* 1 dataset with samples available on Hugging Face: Titan
* 2 training strategies: Scaled Auto-regressive steps, Differential Auto-regressive steps
* 4 losses: Scaled RMSE, Scaled L1, Weighted MSE, Weighted L1
* neural networks as simple torch.nn.Module
* training with PyTorch Lightning
* simple interfaces to easily add a new dataset, neural network, training strategy or loss
* simple command line to launch a training
* config files to change the parameters of your dataset or neural network during training
* experiment tracking with TensorBoard and plots of forecasts with matplotlib
* implementation of NamedTensors to track features and dimensions of tensors at each step of the training
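
The post does not define the two training strategies named above. One plausible reading, taken here as an assumption, is that a "scaled" step has the network output the next state directly, while a "differential" step has it output an increment that is added to the current state. The sketch below illustrates that reading with NumPy and toy step functions; `rollout`, `toy_next_state` and `toy_increment` are hypothetical names and are not part of Py4cast.

```python
import numpy as np


def rollout(step_fn, initial_state, num_steps, strategy):
    """Unroll a one-step model auto-regressively.

    strategy="scaled": step_fn returns the next state directly (assumed reading).
    strategy="diff":   step_fn returns an increment added to the current state.
    """
    if strategy not in ("scaled", "diff"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    state = np.asarray(initial_state, dtype=float)
    states = []
    for _ in range(num_steps):
        output = step_fn(state)
        # Each prediction is fed back as the input of the next step.
        state = output if strategy == "scaled" else state + output
        states.append(state)
    return np.stack(states)


def toy_next_state(x):
    """Toy 'network' predicting the next state: halves every value."""
    return 0.5 * x


def toy_increment(x):
    """Toy 'network' predicting the change that halves every value."""
    return -0.5 * x


x0 = np.array([8.0, 4.0])
scaled = rollout(toy_next_state, x0, num_steps=3, strategy="scaled")
diff = rollout(toy_increment, x0, num_steps=3, strategy="diff")
```

With these toy functions both strategies describe the same dynamics, so `scaled` and `diff` hold the same three forecast states. In real training the difference lies in what the network has to learn: a full state or a usually smaller increment.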


## Future Work

Future developments will focus on expanding the TITAN dataset, testing additional architectures such as GraphCast and Pangu, and optimizing training strategies to enhance resource efficiency. We also plan to incorporate high-resolution wave models and conduct further studies on the impact of time step duration on forecast quality.

## Conclusion

Py4cast represents a significant step forward in the application of AI to weather forecasting. By providing a versatile and efficient framework, it enables researchers to push the boundaries of what is possible in this critical field.

## Acknowledgments

Lorem Ipsum

This project started as a fork of neural-lam, a project by Joel Oskarsson, see [here](https://github.com/mllam/neural-lam). Many thanks to Joel for his work!
109 changes: 69 additions & 40 deletions _posts/2024-10-21-titan.md
@@ -8,30 +8,89 @@ author:
---


# TITAN : Training Inputs & Targets from Arome for Neural networks
With the rise of deep learning models for weather forecasting, there’s a growing need for high-quality datasets that can help train neural networks to make reliable predictions. TITAN, a dataset specifically designed for this purpose, aims to bridge this gap by offering extensive meteorological data focused on France’s metropolitan area.

Titan is a dataset made to train AI weather forecasting models over France.
## What is TITAN?

TITAN is a comprehensive weather dataset created to train deep neural networks for forecasting meteorological conditions. This dataset focuses on surface-level weather parameters as well as data from four different atmospheric layers, providing a rich, multi-dimensional perspective on France's weather patterns.

## Data
## Key Features of TITAN

* 2 data sources: Analyses and forecasts from AROME and ARPEGE NWP models
* 1 hour timestep
* Depth: 5 years
* Format NPY
1. **Geographical Focus**: The dataset is centered on France's metropolitan area, covering a wide range of geographical and climatic conditions.

INSERT MAP OF DOMAIN

2. **Multiple Weather Parameters**: TITAN includes 5 key meteorological variables, both at the surface and at multiple levels in the atmosphere. These parameters are critical for understanding and predicting weather patterns. They include: Temperature, Geopotential, Humidity, Wind and Precipitation rates.

3. **Four Atmospheric Levels**: In addition to surface-level data, TITAN provides meteorological measurements at four different heights in the atmosphere, offering deeper insights into vertical weather dynamics. These levels are essential for predicting complex phenomena like thunderstorms, heavy rainfall, and wind shear. These levels are 250, 500, 700 and 850 hPa.


4. **Source and Timeframe**: The data in TITAN is derived from the Arome and Arpège models, two highly respected systems for meteorological analysis and forecasting. The dataset includes both analysis and forecast data, available at a 1-hour timestep for a period of 5 years. This extensive temporal range enables researchers to study long-term trends and train models to make predictions over a significant historical window.

5. **Primary Focus on Arome Analysis**: For each hourly timestep, we chose to primarily provide Arome analysis data, which represents the best estimate of the current atmospheric state, essentially acting as the ground truth. Arome analysis gives the most accurate reflection of conditions at that time. In addition, we offer Arpège data as a coupling model, which can be used to test the incorporation of boundary conditions. For Arpège, a run is available every 6 hours, and every sixth timestep is an analysis, with the remaining timesteps representing forecast data. It is important to note that Arpège forecast data and analysis do not overlap; for any given timestep, there is only one valid Arpège data point, either analysis or forecast.

6. **Compact by Design**: To ensure that the dataset is easily usable and accessible, we deliberately kept it small, under 1 terabyte of data. By focusing on a limited number of atmospheric levels and parameters, TITAN provides a balance between detail and efficiency, making it well-suited for quick integration and experimentation without overwhelming storage or computational resources.


7. **Designed for efficient training**: The dataset is formatted to be directly usable by deep learning frameworks. For each hourly timestep, we provide one NPY file per parameter and atmospheric level, making it quick to load on a GPU and easy to select a subset of parameters for specific experiments.
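
As an illustration of the per-parameter file layout described in item 7, the sketch below loads a chosen subset of parameters from one hourly folder and stacks them into a single array. It assumes that files carry a `.npy` extension, that all selected fields share the same 2D grid, and it uses a made-up folder name with synthetic data. `load_hour` is a hypothetical helper, not part of TITAN or Py4cast.

```python
from pathlib import Path
import tempfile

import numpy as np


def load_hour(folder, stems):
    """Load the selected parameter files of one hourly folder.

    Returns an array of shape (len(stems), height, width), assuming every
    file holds a 2D field on the same grid.
    """
    arrays = [np.load(Path(folder) / f"{stem}.npy") for stem in stems]
    return np.stack(arrays, axis=0)


# Demo on synthetic files; the folder name is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    hour_dir = Path(tmp) / "2023-01-01_00h"
    hour_dir.mkdir()
    for i, stem in enumerate(["aro_t2m_2m", "aro_u10_10m", "aro_v10_10m"]):
        # Each fake field is a 4x5 grid filled with its index.
        np.save(hour_dir / f"{stem}.npy", np.full((4, 5), float(i)))

    # Select only two of the three parameters for this experiment.
    batch = load_hour(hour_dir, ["aro_t2m_2m", "aro_v10_10m"])
```

Because each parameter lives in its own file, selecting a subset only means passing a shorter list of file stems; nothing else is read from disk.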


## Conclusion

TITAN is an invaluable resource for meteorologists, data scientists, and researchers working in the field of weather forecasting. By providing rich, multi-level atmospheric data, it opens up new opportunities for applying deep learning techniques to predict France’s weather with greater precision.

As weather patterns become increasingly unpredictable due to climate change, datasets like TITAN will play a crucial role in improving the resilience and accuracy of forecasting systems. Stay tuned for more insights on how TITAN can transform the future of meteorological research.


## Details on available data and parameters

* **Data Sources**: Analyses and forecasts from AROME and ARPEGE NWP models
* **Format**: NPY
- **Resolution**: 2.5km
- **Historical Data**: 2021-2023 (training: 2021-2022, testing: 2023)
- **Historical Data**: 5 years, 2020-2024
- **Time Step**: 1 Hour
- **21 Weather Parameters**: Input and output of the model
- **XX Weather Parameters**:
- **5 Surface Variables**: Temperature, humidity, wind (u & v), and precipitation
- **4 Variables at 4 Vertical Levels**:
- 850, 700, 500, 250 hPa
- T, U, V, Z
**Note**: Precipitation is the only parameter that is not an analysis. It is an AROME forecast made every hour, predicting the cumulative precipitation in mm for the next hour. In the future, we aim to use higher-quality, expert-validated radar data.

**Note**: Precipitation is the only Arome parameter that is not an analysis. It is an AROME forecast made every hour, predicting the cumulative precipitation in mm for the next hour. In the future, we aim to use higher-quality, expert-validated radar data.


### Table of parameters

Data is grouped in folders per hour. Each hourly folder contains XX npy files, one per weather parameter and level.

In the source column, we note (A) for model analysis data or (F) for forecast data.
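
For Arpège, whether a given hour is (A) or (F) follows from the 6-hour run cycle described earlier. The sketch below labels each hour under the assumption that runs start at 00, 06, 12 and 18 UTC and that forecast hours come from the most recent run. The post only states that a run is available every 6 hours, so the run hours are an assumption, and `arpege_source` is a hypothetical helper.

```python
from datetime import datetime

ARPEGE_RUN_INTERVAL_H = 6  # from the post: one Arpège run every 6 hours


def arpege_source(valid_time):
    """Return ("A", 0) for an analysis hour, else ("F", lead time in hours).

    Assumes runs start at 00, 06, 12 and 18 UTC (not stated in the post).
    """
    lead = valid_time.hour % ARPEGE_RUN_INTERVAL_H
    return ("A", 0) if lead == 0 else ("F", lead)


# Label the first eight hours of an arbitrary day.
labels = [arpege_source(datetime(2023, 1, 1, hour)) for hour in range(8)]
```

Under these assumptions, hours 0 and 6 are analyses and hours 1 to 5 are forecasts with lead times of 1 to 5 hours, which matches the "one valid data point per timestep" rule.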


| File name | Name | Unit | Source NWP Model | Levels |
| :---: | :---: | :---: | :---: | :---: |
| aro_t2m_2m | 2 meter temperature | K | Arome (A) | 2m |
| aro_r2_2m | 2 meter relative humidity | % | Arome (A) | 2m |
| aro_tp_0m | Total precipitation | kg m**-2 | Arome (F) | 0m |
| aro_u10_10m | 10 meter U wind component | m s**-1 | Arome (A) | 10m |
| aro_v10_10m | 10 meter V wind component | m s**-1 | Arome (A) | 10m |
| aro_t_XhPa | Temperature | K | Arome (A) | 250, 500, 700, 850 hPa |
| aro_u_XhPa | U wind component | m s**-1 | Arome (A) | 250, 500, 700, 850 hPa |
| aro_v_XhPa | V wind component | m s**-1 | Arome (A) | 250, 500, 700, 850 hPa |
| aro_z_XhPa | Geopotential | m**2 s**-2 | Arome (A) | 250, 500, 700, 850 hPa |
| aro_r_XhPa | Relative humidity | % | Arome (A) | 250, 500, 700, 850 hPa |
| arp_t_XhPa | Temperature | K | Arpege (A or F) | 250, 500, 700, 850 hPa |
| arp_u_XhPa | U wind component | m s**-1 | Arpege (A or F) | 250, 500, 700, 850 hPa |
| arp_v_XhPa | V wind component | m s**-1 | Arpege (A or F) | 250, 500, 700, 850 hPa |
| arp_z_XhPa | Geopotential | m**2 s**-2 | Arpege (A or F) | 250, 500, 700, 850 hPa |
| arp_r_XhPa | Relative humidity | % | Arpege (A or F) | 250, 500, 700, 850 hPa |
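
The `X` in the file names above stands for a pressure level. As a minimal sketch, the snippet below expands the table rows into concrete file stems, assuming `X` is replaced by the level value in hPa (for example `aro_t_250hPa`). That substitution rule and the helper name `expand_stems` are assumptions and should be checked against the actual files.

```python
LEVELS_HPA = (250, 500, 700, 850)

# Surface rows of the table, used as-is.
SURFACE_STEMS = (
    "aro_t2m_2m",
    "aro_r2_2m",
    "aro_tp_0m",
    "aro_u10_10m",
    "aro_v10_10m",
)

# Pressure-level rows of the table: two models times five parameters.
LEVEL_TEMPLATES = tuple(
    f"{model}_{param}_XhPa" for model in ("aro", "arp") for param in "tuvzr"
)


def expand_stems():
    """Return every file stem implied by the table, one per parameter and level."""
    stems = list(SURFACE_STEMS)
    for template in LEVEL_TEMPLATES:
        # Assumed rule: the placeholder X becomes the level value in hPa.
        stems.extend(template.replace("X", str(level)) for level in LEVELS_HPA)
    return stems


stems = expand_stems()
```

`len(stems)` then gives the number of files this table implies per hourly folder, which can serve as a quick completeness check after download.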



## Download

* Size of **compressed** 5-year archive: ~ XX GB

* Size of **uncompressed** 5-year archive: ~ XX GB

For now, 3 days of data are stored on [Hugging Face](https://huggingface.co/datasets/meteofrance/titan).

To download:
@@ -44,36 +103,6 @@ git clone https://huggingface.co/datasets/meteofrance/titan
```


## Details on available parameters

Data is grouped in folders per hour. Each folder contains XX npy files, one per weather parameter and level.


| File name | Name | Unit | Source NWP Model | Levels |
| :---: | :---: | :---: | :---: | :---: |
| aro_t2m_2m | 2 meter temperature | K | Arome | 2m |
| aro_r2_2m | 2 meter relative humidity | % | Arome | 2m |
| aro_tp_0m | Total precipitation | kg m**-2 | Arome | 0m |
| aro_u10_10m | 10 meter U wind component | m s**-1 | Arome | 10m |
| aro_v10_10m | 10 meter V wind component | m s**-1 | Arome | 10m |
| aro_t_XhPa | Temperature | K | Arome | 250, 500, 700, 850 hPa |
| aro_u_XhPa | U wind component | m s**-1 | Arome | 250, 500, 700, 850 hPa |
| aro_v_XhPa | V wind component | m s**-1 | Arome | 250, 500, 700, 850 hPa |
| aro_z_XhPa | Geopotential | m**2 s**-2 | Arome | 250, 500, 700, 850 hPa |
| aro_r_XhPa | Relative humidity | % | Arome | 250, 500, 700, 850 hPa |
| arp_t_XhPa | Temperature | K | Arpege | 250, 500, 700, 850 hPa |
| arp_u_XhPa | U wind component | m s**-1 | Arpege | 250, 500, 700, 850 hPa |
| arp_v_XhPa | V wind component | m s**-1 | Arpege | 250, 500, 700, 850 hPa |
| arp_z_XhPa | Geopotential | m**2 s**-2 | Arpege | 250, 500, 700, 850 hPa |
| arp_r_XhPa | Relative humidity | % | Arpege | 250, 500, 700, 850 hPa |


## Storage

* Size of **compressed** 5-year archive: ~ XX GB

* Size of **uncompressed** 5-year archive: ~ XX GB

## TODO

* add aro humidity on all levels