The best way to build is to set up an AWS EC2 instance and build the Docker containers from there. Building on your local machine is definitely possible, but you will probably hear heavy CPU fan noise (and possibly see some smoke).
Prerequisites:
- Miniconda
- Docker
- bash, curl, wget, tar, git
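Before creating the conda environment, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, assuming Miniconda provides a `conda` command and Docker a `docker` command; `missing_tools` is a hypothetical helper name and not part of scing.

```shell
#!/usr/bin/env bash
# Print the name of every given command that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of scing.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the prerequisites listed above (conda is assumed to come from Miniconda).
missing="$(missing_tools conda docker bash curl wget tar git)"
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
else
    echo "All prerequisites found."
fi
```

Anything reported as missing should be installed before you continue.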
conda create -n scing python=3.8 pip
conda activate scing
conda install -c cyclus java-jre
git clone https://github.com/hisplan/scing.git
cd scing
pip install .
Set registry in config.yaml to your container registry.
The following example uses hisplan on Docker Hub as the container registry. Replace hisplan with your own Docker Hub namespace.
version: 1.0
containers:
  registry: docker.io/hisplan
If you want to use Red Hat Quay.io:
version: 1.0
containers:
  registry: quay.io/dpeerlab
where dpeerlab should be replaced with your own Quay.io namespace.
If you want to use Amazon ECR (Elastic Container Registry):
version: 1.0
containers:
  registry: 583643567512.dkr.ecr.us-east-1.amazonaws.com
where 583643567512.dkr.ecr.us-east-1.amazonaws.com should be replaced with your own AWS ECR registry address.
Log in to the container registry you configured. For Docker Hub:
docker login
For Red Hat Quay.io:
docker login quay.io
In addition to the login, you must set an OAuth access token so that the build script can create public repositories in Red Hat Quay.io:
export QUAY_AUTH_TOKEN="xyz-123-abc"
If you don't have one, you can create one by following this instruction.
For Amazon ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 583643567512.dkr.ecr.us-east-1.amazonaws.com
where 583643567512.dkr.ecr.us-east-1.amazonaws.com is your registry address.
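The three login variants above differ only by registry host, so the choice can be derived from the registry value in config.yaml. The sketch below only prints the matching command rather than running it; `login_command` is a hypothetical helper, and reading the AWS region out of the ECR host name is an assumption based on the address format shown above.

```shell
#!/usr/bin/env bash
# Print (without running) the login command that matches a registry value.
# login_command is a hypothetical helper, not part of scing.
login_command() {
    local registry="$1"
    local host="${registry%%/*}"   # drop the namespace: docker.io/hisplan -> docker.io
    local region
    case "$host" in
        docker.io)
            echo "docker login" ;;
        quay.io)
            echo "docker login quay.io" ;;
        *.dkr.ecr.*.amazonaws.com)
            # Assumed host layout: <account>.dkr.ecr.<region>.amazonaws.com
            region="${host#*.dkr.ecr.}"
            region="${region%.amazonaws.com}"
            echo "aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $host" ;;
        *)
            echo "unknown registry: $registry" >&2
            return 1 ;;
    esac
}

login_command docker.io/hisplan
login_command 583643567512.dkr.ecr.us-east-1.amazonaws.com
```

Printing the command instead of executing it keeps the sketch safe to run on a machine where Docker or the AWS CLI is not yet configured.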
If some of the GitHub repositories are private, you must set up a GitHub auth token to access them. To use a single GitHub auth token for all the repositories specified in the config file, set the environment variable as shown below:
export GIT_AUTH_TOKEN="abc-123-xyz"
If only some of the repositories are private, or if you need different GitHub auth tokens for different repositories, you can set the auth token on a per-image basis as shown below:
- name: cellranger-atac
  version: 2.0.0
  project_url: https://github.com/hisplan/docker-cellranger-atac
  download_url: https://github.com/hisplan/docker-cellranger-atac/archive/refs/tags/v2.0.0.tar.gz
  git_auth_token: abc-123-xyz
If everything is publicly available, you don't need a GitHub auth token.
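Because the build script reads these tokens from the environment, a token that was set but never exported will not be visible to it. The following sketch reports which variables are missing from the exported environment; `missing_env` is a hypothetical helper, and whether you need either token depends on your registry and repositories.

```shell
#!/usr/bin/env bash
# Print the name of every given variable that is not exported or is empty.
# missing_env is a hypothetical helper, not part of scing.
missing_env() {
    local name
    for name in "$@"; do
        # printenv only sees exported variables, which is what child processes get.
        [ -n "$(printenv "$name")" ] || echo "$name"
    done
}

# Warn about tokens mentioned above that are not currently exported.
for name in $(missing_env GIT_AUTH_TOKEN QUAY_AUTH_TOKEN); do
    echo "warning: $name is not exported" >&2
done
```

A warning here is only a problem if you actually use Quay.io or private GitHub repositories.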
During the build process, 10x Genomics software (e.g. Cell Ranger) will be dockerized. This requires that you first sign the 10x Genomics End User Software License Agreement (EULA). To keep the build process automated (e.g. CI/CD), sign the EULA ahead of time:
scing download \
    --site-url="https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/6.0/" \
    --agree-eula
Run the build script:
scing build --config=config.yaml
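For unattended builds, the EULA step and the build step can be chained so that a failure in the first stops the second. This is a minimal sketch that uses only the two scing commands shown above; the `run` and `build_all` functions and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN is set.
# run, build_all and DRY_RUN are hypothetical names, not part of scing.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sign the EULA, then build; stop at the first step that fails.
build_all() {
    local config="${1:-config.yaml}"
    run scing download \
        --site-url="https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/6.0/" \
        --agree-eula || return 1
    run scing build --config="$config" || return 1
}

# Preview the commands without executing them.
DRY_RUN=1 build_all config.yaml
```

Calling `build_all config.yaml` without `DRY_RUN` runs the real commands.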
When you create a job file (e.g. a job file for scATAC-seq), make sure you set the dockerRegistry parameter to your own Docker registry:
{
    "CellRangerATAC.sampleName": "test_sample1",
    "CellRangerATAC.fastqNames": "test_sample1",
    "CellRangerATAC.fastqFiles": [
        "s3://.../test_sample1_L001_I1_001.fastq.gz",
        "s3://.../test_sample1_L001_R1_001.fastq.gz",
        "s3://.../test_sample1_L001_R2_001.fastq.gz",
        "s3://.../test_sample1_L001_R3_001.fastq.gz"
    ],
    "CellRangerATAC.referenceGenome": {
        "name": "GRCh38-2020-A-2.0.0",
        "location": "https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz"
    },
    "CellRangerATAC.dockerRegistry": "${YOUR_DOCKER_REGISTRY}"
}
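If you keep the job file as a template with the `${YOUR_DOCKER_REGISTRY}` placeholder, the registry can be filled in at submission time. The sketch below does this with sed; `fill_registry` is a hypothetical helper, and it assumes the registry value contains no `|`, `&` or backslash characters, which sed would treat specially.

```shell
#!/usr/bin/env bash
# Print a job file template with ${YOUR_DOCKER_REGISTRY} replaced by a registry.
# fill_registry is a hypothetical helper, not part of scing.
fill_registry() {
    local registry="$1" template="$2"
    # \$ matches a literal dollar sign; '|' is the delimiter because registries contain '/'.
    sed 's|\${YOUR_DOCKER_REGISTRY}|'"$registry"'|g' "$template"
}

# Demonstrate on a one-line template written to a temporary file.
template="$(mktemp)"
printf '%s\n' '{ "CellRangerATAC.dockerRegistry": "${YOUR_DOCKER_REGISTRY}" }' > "$template"
fill_registry docker.io/hisplan "$template"
rm -f "$template"
```

Redirect the output to a new file to obtain the job file you submit, leaving the template unchanged.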