BioSamples

BioSamples https://www.ebi.ac.uk/biosamples/ an ELIXIR Core Deposition Database, stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or experimental samples, and have been used in an assay that has generated publicly available data in the European Nucleotide Archive (ENA) or ArrayExpress .

BioSamples supports links between sample records and any sample-derived datasets, including sequence-based datasets such as those held in the European Nucleotide Archive (ENA) or ArrayExpress, -omics-based datasets such as those in PRIDE or MetaboLights, and any other assay-based databases. It provides links to assays and specific samples and accepts direct submissions of sample information.

BioSamples also synchronizes data with the NCBI BioSample database and imports data from ENA.

This document provides information about the local installation, development environment setup instructions of BioSamples database.

Softwares

Git 2.17.1
Java 17
JDK 17
Maven 3.6.3
Docker 18.6

Setup

Run this in terminal to install the dependent softwares.

sudo apt-get update
sudo apt-get install temurin-17-jdk maven git docker (or any JDK 17)

Please make sure the software versions are correct.

docker -v
# Output
# Docker version 18.06.1-ce

java -version
# Output
# openjdk version "17.0.12" 2024-07-16

Install BioSamples on your computer.

This process sets up a local compiled version of all biosamples tools. It requires a large download of Spring dependencies and uses up to two threads per core of your machine. The installation might take several minutes.
```
git clone https://github.com/EBIBioSamples/biosamples-v4.git
cd biosamples-v4
./mvnw -T 2C package
```
Start Biosamples on your own machine
```
docker-compose up
```
If it returns ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default. Please run sudo docker-compose up instead.
Now you can access the public interface at http://localhost:8081/biosamples/. So far, there is no data in the local sample.
Create an AAP or an ENA WEBIN account for API authentication and data upload

An AAP account or a WEBIN is required to upload data through API. The API account can be registered at https://explore.aai.ebi.ac.uk/registerUser. A detailed instruction about user account and authentication can be found on https://www.ebi.ac.uk/biosamples/docs/guides/authentication.

Above guide shows AAP account details for production usage. Please replace all URL references as below to create and authenticate against development server.

Original	Replace with
https://aai.ebi.ac.uk	https://explore.aai.ebi.ac.uk
https://api.aai.ebi.ac.uk	https://explore.api.aai.ebi.ac.uk

Alternatively an ENA WEBIN account can also be used for authentication and data upload. A production WEBIN account can be created here → https://www.ebi.ac.uk/ena/submit/webin/auth/swagger-ui/index.html?configUrl=/ena/submit/webin/auth/v3/api-docs/swagger-config#/AdministrationAPI/createSubmissionAccount (Note: replace www with wwwdev for a test WEBIN account)

Upload first test data (using AAP account)

TOKEN=$(curl -u Username https://explore.api.aai.ebi.ac.uk/auth)

curl 'http://localhost:8081/biosamples/samples' -i -X POST -H "Content-Type: application/json;charset=UTF-8" -H "Accept: application/hal+json" -H "Authorization: Bearer $TOKEN" -d '{
 "name" : "FakeSample",
 "update" : "2019-07-16T09:47:20.003Z",
 "release" : "2019-07-16T09:47:20.003Z",
 "domain" : "self.ExampleDomain"
}'

Data import

An example of the JSON format that can be sent by POST to http://localhost:8081/biosamples/beta/samples is at https://github.com/EBIBioSamples/biosamples-v4/blob/master/models/core/src/test/resources/TEST1.json

NCBI

Download the XML dump (~400Mb) to the current directory:

wget http://ftp.ncbi.nih.gov/biosample/biosample_set.xml.gz

Run the pipeline to send the data to the submission API via REST

docker-compose up biosamples-pipelines-ncbi

Note: You will need to mount the location that the XML dump was downloaded to within the docker container. A docker-compose.override.yml file is the easiest way to do that.

MongoDB notes

Cross-platform easy to use mongodb management tool http://www.mongoclient.com

Developing

Docker can be run from within a virtual machine e.g VirtualBox. This is useful if it causes any problems for your machine or if you have an OS that is not supported.

You might want to mount the virtual machines directory with the host, so you can work in a standard IDE outside of the VM. VirtualBox supports this.

If you ware using a virtual machine, you might also want to configure docker-compose to start by default.

As you make changes to the code, you can recompile it via Maven with:

./mvnw -T 2C package

And to get the new packages into the docker containers you will need to rebuild containers with:

docker-compose build

If needed, you can rebuild just a single container by specifying its name e.g.

docker-compose build biosamples-pipelines

To start a service, using docker compose will also start and dependent services it requires e.g.

docker-compose up biosamples-webapp-api

will also start solr, neo4j, mongo, and rabbitmq

To run an executable file in a docker container, and start its dependencies first use something like:

docker-compose run --service-ports biosamples-pipelines

If you want to add command line arguments note that these will entirely replace the executable in the docker-compose.yml file. So you need to do something like:

docker-compose run --service-ports biosamples-pipelines java -jar pipelines-4.0.0-SNAPSHOT.jar --debug

If you want to connect debugging tools to the java applications running inside docker containers, see instructions at http://www.jamasoftware.com/blog/monitoring-java-applications/

Note that you can bring maven and docker together into a single commandline like:

./mvnw -T 2C package && docker-compose build && docker-compose run --service-ports biosamples-pipelines

Beware, Docker tar’s and copies all the files on the filesystem from the location of docker-compose down. If you have data files there (e.g. downloads from ncbi, docker volumes, logs) then that process can take so long that it makes using Docker impractical.

As docker-compose creates new volumes each time, you may fill the disk docker is working on. To delete all docker volumes use:

docker volume ls -q | xargs -r docker volume rm

To delete all docker images use:

docker images -q | xargs -r docker rmi

Note	this will remove everything not just things for this project

Client usage

There is a spring client, and a spring-boot starter module, for use with BioSamples. To use these in a maven project, add the following to the appropriate sections:

<dependencies>
    <dependency>
        <groupId>uk.ac.ebi.biosamples</groupId>
        <artifactId>biosamples-spring-boot-starter</artifactId>
        <version>5.3.7</version>
    </dependency>
</dependencies>

** 5.3.7 is an example, latest version is available in the release notes here -> https://www.ebi.ac.uk/biosamples/docs/releasenotes

maven {
  url 'https://gitlab.ebi.ac.uk/api/v4/projects/2669/packages/maven'
}

This can then be configured by several spring application.properties including biosamples.client.uri to specify the base URI of the BioSamples instance to use.

Issues and troubleshooting

Problems with spring-data-rest

This was originally using spring-data-rest to expose rest API for the repositories. But there are a number of problems with this (see below) and that was scrapped in favor of implementing custom HATEOAS compliant endpoints.

Content type negotiation is not possible as it can’t overlap with the URLs for the Thymeleaf controllers and it can’t serve XML even with the appropriate converters supplied.

When repeatedly sending JSON because it is a list of things with optional components, the optional parts can become mixed if the list ordering changes. Maybe this can be remedied by using map of attribute types instead?

Known issues

Solr has a limit on the field size (technically the term vector). Therefore the attribute values over 255 characters are not indexed in solr.

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 4,481 Commits
.mvn/wrapper		.mvn/wrapper
agents		agents
client		client
commons		commons
integration		integration
messaging		messaging
models		models
pipelines		pipelines
properties		properties
scripts		scripts
utils		utils
webapps		webapps
.adr-dir		.adr-dir
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
Dockerfile		Dockerfile
Dockerfile_Building		Dockerfile_Building
LICENSE		LICENSE
README.adoc		README.adoc
RELEASE.adoc		RELEASE.adoc
ci_settings.xml		ci_settings.xml
citation.cff		citation.cff
docker-agents.sh		docker-agents.sh
docker-compose.yml		docker-compose.yml
docker-debug.sh		docker-debug.sh
docker-integration.big.sh		docker-integration.big.sh
docker-integration.sh		docker-integration.sh
docker-logging.sh		docker-logging.sh
docker-test.sh		docker-test.sh
docker-webapp.sh		docker-webapp.sh
http-status-check		http-status-check
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml
update-versions.sh		update-versions.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BioSamples

Table of contents

Softwares

Setup

Data import

NCBI

MongoDB notes

Developing

Client usage

Issues and troubleshooting

Problems with spring-data-rest

Known issues

License

About

Releases 45

Packages

Contributors 15

Languages

License

EBIBioSamples/biosamples-v4

Folders and files

Latest commit

History

Repository files navigation

BioSamples

Table of contents

Softwares

Setup

Data import

NCBI

MongoDB notes

Developing

Client usage

Issues and troubleshooting

Problems with spring-data-rest

Known issues

License

About

Resources

License

Stars

Watchers

Forks

Releases 45

Packages 0

Contributors 15

Languages

Packages