
Tutorial: Upload a dataset to VALERIA


The University provides access to cutting-edge data storage and transfer solutions through the VALERIA service. Most notably, professors have access to 6 TB of S3 storage through VALERIA. This storage is ideal for managing datasets. In this tutorial, we cover how to get access to this storage, upload data, and eventually share datasets with the community.

This guide was written by Dominic Baril, 2022.

Updated by Maxime Vaidis, September 2022.

Getting access to the storage

  • You will first need to create a VALERIA account and ask your supervisor to give you access to the platform.

  • Once your account is created, you need to ask for permission to access the storage. For example, François has a /norlab bucket.

  • You then need to configure your access to the storage. If you are on Ubuntu, the easiest client to configure is rclone.

  • Install rclone using sudo apt install rclone (to ensure the version is stable).

  • Follow the steps shown on this page starting at step 2.1 to configure it on your local computer. You will need to enter your access keys, which can be found on your VALERIA dashboard. Once rclone is configured with access to VALERIA's S3 storage, you should see the following output when running the rclone config command:

Current remotes:

Name                 Type
====                 ====
VALERIAS3            s3
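
To verify that the remote works, you can list the buckets you have access to (a quick sanity check; this assumes the remote is named VALERIAS3 as shown above):

rclone lsd VALERIAS3: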

Uploading / Downloading datasets

You can then use rclone commands to interact with the S3 storage. Refer to the rclone documentation for details on all possible commands. Here are some simple useful ones:
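The commands below are a sketch: the VALERIAS3 remote name comes from the configuration above, norlab is the example bucket, and the local paths are placeholders to adapt to your own dataset.

# List the files stored in the bucket
rclone ls VALERIAS3:norlab

# Upload a local dataset folder to the bucket (copies only new or changed files)
rclone copy ~/datasets/my_dataset VALERIAS3:norlab/my_dataset --progress

# Download a dataset from the bucket to a local folder
rclone copy VALERIAS3:norlab/my_dataset ~/datasets/my_dataset --progress

# Mirror a local folder to the bucket, deleting remote files that no longer exist locally
rclone sync ~/datasets/my_dataset VALERIAS3:norlab/my_dataset --progress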

Creating a link for anyone to download your dataset

  • Once your files are uploaded to the S3 storage, you might want to create a link for quick remote access to share your dataset with the community. To do so, you will need to use the s3cmd client instead. You can access it through a terminal via the JupyterHub server on VALERIA.

  • Once on JupyterHub, you will first need to configure the s3cmd command-line tool. Open a terminal and create a .s3cfg file in your home directory:

touch .s3cfg
  • Then, using your favorite command-line text editor, paste the following lines in the .s3cfg file and adjust the values of the access_key and secret_key parameters:
[default]
# YOUR IDUL
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY

access_token =
add_encoding_exts =
add_headers =
bucket_location = US
ca_certs_file =
cache_file =
check_ssl_certificate = True
check_ssl_hostname = True
cloudfront_host = cloudfront.amazonaws.com
content_disposition =
content_type =
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
expiry_date =
expiry_days =
expiry_prefix =
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
host_base = s3.valeria.science
host_bucket = %(bucket)s.s3.valeria.science
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
kms_key =
limit = -1
limitrate = 0
list_md5 = False
log_target_prefix =
long_listing = False
max_delete = -1
mime_type =
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
preserve_attrs = True
progress_meter = True
proxy_host =
proxy_port = 0
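
Once the file is saved, you can run a quick sanity check by listing the contents of your bucket (the norlab bucket here is just the example from earlier):

s3cmd ls s3://norlab/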
  • Finally, generate the URL to the dataset:
s3cmd signurl s3://norlab/fr2021_dataset/winter_dataset.zip/winter.zip $(echo "`date +%s` + 3600 * 24 * 7 * 1000" | bc)

This command returns a URL to download the dataset, valid for 1000 weeks: the second argument is a Unix timestamp computed as the current time (date +%s) plus 3600 × 24 × 7 × 1000 seconds. It is up to you to define the expiry date of your URL, but beware that an expired URL will prevent other people in the scientific community from accessing your dataset.
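
If you prefer a shorter-lived link, s3cmd also accepts a relative expiry of the form +<seconds>. The command below is a sketch reusing the example path above and producing a URL valid for 7 days:

s3cmd signurl s3://norlab/fr2021_dataset/winter_dataset.zip/winter.zip +604800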
