🌊 Machine learning model for predicting ocean bathymetry
The versions listed below are what was used in our paper. Newer or older versions may also work. If you encounter any issues with newer versions, please open an issue.
- Python 3.11.7
- cartopy 0.22.0
- cmocean 3.0.3
- geocube 0.4.2
- geopandas 0.14.1
- matplotlib 3.8.2
- netcdf4 1.6.5
- numpy 1.26.2
- pandas 2.1.4
- scikit-learn 1.3.2
- scipy 1.11.4
- shapely 2.0.2
- xarray 2023.12.0
- macOS 14.1.2
- Ubuntu 22.04.3
The code should run on any CPU and with any amount of RAM, including on a laptop.
First, clone this project:
> git clone https://github.com/adamjstewart/bathymetry.git
> cd bathymetry
Then, install the Python dependencies:
> pip install -r requirements.txt
Installation should only take a few seconds.
All data should be stored in the same root directory. The default is data, but a different directory can be specified with --data-dir.
This model is trained on the CRUST1.0 dataset. In order to reproduce this work, you will need to download both the basic model and the add-on that includes the crustal type file. Then, extract the tarballs in a crust1.0 directory within the data directory.
Seafloor age data can be found at EarthByte. For this model, we downsample all seafloor age data to 1-degree resolution. We test with several different seafloor age datasets:
Each of these files should be placed in its respective directory within the data directory.
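As a rough illustration of what downsampling to 1-degree resolution can look like, the sketch below averages non-overlapping blocks of a 2D grid while ignoring missing values. It is not taken from this repository: the function name block_mean is hypothetical, and it assumes a grid whose dimensions divide evenly by the block size (for example, a 6 arc-minute grid reduced by a factor of 10). Real seafloor age grids may include an extra row or column at the edges and would need trimming first.

```python
import numpy as np


def block_mean(grid: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks, ignoring NaNs.

    Hypothetical helper for illustration only; the repository may
    downsample differently.
    """
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be divisible by factor")

    # Split each axis into (number of blocks, block size).
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)

    # Sum and count only the valid (non-NaN) cells in each block.
    valid = ~np.isnan(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))

    # Blocks with no valid cells (e.g. entirely over land) stay NaN.
    out = np.full(total.shape, np.nan)
    np.divide(total, count, out=out, where=count > 0)
    return out
```

For a global 6 arc-minute grid of shape (1800, 3600), block_mean(grid, 10) would return an array of shape (180, 360).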
The plate boundary shapefiles can be downloaded from the World tectonic plates and boundaries repository. Download a zip file of the entire repository and extract it within the data directory.
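Before training, it can help to confirm that the data directory is laid out as expected. The sketch below is not part of this repository: the helper name missing_inputs is hypothetical, and the file list is copied from the paths printed in the example training log, so it covers only the age2020 seafloor age dataset.

```python
from pathlib import Path

# Paths relative to the data directory, as printed in the example training log.
# Other seafloor age datasets would live in their own directories.
EXPECTED_INPUTS = (
    "age2020/age.2020.1.GTS2012.6m.nc",
    "crust1.0/crust1.bnds",
    "crust1.0/crust1.vp",
    "crust1.0/crust1.vs",
    "crust1.0/crust1.rho",
    "crust1.0/CNtype1-1.txt",
    "tectonicplates-master/PB2002_plates.shp",
)


def missing_inputs(data_dir="data"):
    """Return the expected input files that are absent from data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected input files found.")
```

Pass a different path to missing_inputs if you store the data somewhere other than the default data directory.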
To train a ridge regression model, run the following command:
> python3 train.py ridge
Reading datasets...
Reading data/age2020/age.2020.1.GTS2012.6m.nc...
Reading data/crust1.0/crust1.bnds...
Reading data/crust1.0/crust1.vp...
Reading data/crust1.0/crust1.vs...
Reading data/crust1.0/crust1.rho...
Reading data/crust1.0/CNtype1-1.txt...
Reading data/tectonicplates-master/PB2002_plates.shp...
Preprocessing...
Cross-validation...
Group 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Evaluating...
RMSE: 0.591818050597389
R^2: 0.7508725821083216
Saving predictions...
Writing checkpoints/checkpoint-ridge-100-True-False-None-False-1-auto-0.0001.pickle...
Writing checkpoints/truth.nc...
Writing checkpoints/ridge.nc...
This should only take a few seconds to run. Replace "ridge" with other models to compare performance metrics. Note that MLP will take much longer (around an hour on a laptop).
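For readers unfamiliar with the two metrics in the log, the sketch below shows the standard definitions of RMSE and R^2 using NumPy. The function names are hypothetical, and this is not necessarily how train.py computes them; it only illustrates what the reported numbers mean.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

A lower RMSE and an R^2 closer to 1 both indicate predictions that track the ground truth more closely, which is how the different models can be compared.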
To reproduce all experimental results from our paper, see the scripts in the jobs directory. Specifically:
- ridge*.sh, svr*.sh, mlp*.sh: find optimal hyperparameters for all models
- train.sh: reproduce results with optimal hyperparameters
- ablation.sh: feature and layer ablation study
- plot.sh: generate some basic maps of the results
These jobs were submitted using the Slurm Workload Manager on TACC and ICCP. The scripts should work on any system, but may be slow unless you use a cluster. If you use a different cluster, you may need to change the job configuration parameters.