-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Deploying to gh-pages from @ 78ce346 🚀
- Loading branch information
0 parents
commit 7e49d0f
Showing
167 changed files
with
23,485 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: f2af386d6e99bb22829f120be7f0f3d7 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+2.55 KB
.doctrees/_autosummary/precellar.utils.strip_barcode_from_fastq.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,275 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Preprocess 10X Genomics Gene Expression + ATAC Data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import precellar" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We first create a SeqSpec object from the 10X multiome seqspec template file." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:31Z \u001b[32mINFO \u001b[0m cached_path::cache\u001b[90m]\u001b[0m Cached version of https://raw.githubusercontent.com/regulatory-genomics/precellar/refs/heads/main/seqspec_templates/10x_rna_atac.yaml is up-to-date\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"assay = precellar.SeqSpec(\"https://raw.githubusercontent.com/regulatory-genomics/precellar/refs/heads/main/seqspec_templates/10x_rna_atac.yaml\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The SeqSpec object can be visualized as a tree structure." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"\n", | ||
"├── rna(153-1150)\n", | ||
"│ ├── rna-illumina_p5(29)\n", | ||
"│ ├── rna-truseq_read1(29)\n", | ||
"│ ├── rna-cell_barcode(16)\n", | ||
"│ ├── rna-umi(12)\n", | ||
"│ ├── rna-cDNA(1-1000)\n", | ||
"│ ├── rna-truseq_read2(34)\n", | ||
"│ ├── rna-index7(8)\n", | ||
"│ └── rna-illumina_p7(24)\n", | ||
"└── atac(153-1150)\n", | ||
" ├── atac-illumina_p5(29)\n", | ||
" ├── atac-cell_barcode(16)\n", | ||
" ├── atac-linker(8)\n", | ||
" ├── atac-nextera_read1(33)\n", | ||
" ├── atac-gDNA(1-1000)\n", | ||
" ├── atac-nextera_read2(34)\n", | ||
" ├── atac-index7(8)\n", | ||
" └── atac-illumina_p7(24)" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"assay" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We then add ATAC data to the SeqSpec object. Specifically, we add R1, R2 and index fastq files. Note that the index read, ususally named \"*_R2_*\" in the 10X multiome data, can be sequenced in either direction, depending on the sequencing platform. Older platforms such as NovaSeq 6000 with v1.0 reagent kits, MiniSeq with Rapid Reagent kits, MiSeq, HiSeq 2500, and HiSeq 2000 use the forward strand workflow. iSeq 100, MiniSeq with Standard reagent kits, NextSeq Systems, NovaSeq 6000 with v1.5 reagent kits, HiSeq X, HiSeq 4000, and HiSeq 3000 use the reverse strand workflow.\n", | ||
"\n", | ||
"We need to pay special attention to the orientation of the index read. In this example, the index read is sequenced in the reverse direction." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Read 1\n", | ||
"assay.add_read(\n", | ||
" read_id=\"atac-R1\",\n", | ||
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R1_001.fq.zst\",\n", | ||
" modality=\"atac\",\n", | ||
" is_reverse=False,\n", | ||
" primer_id=\"atac-nextera_read1\",\n", | ||
")\n", | ||
"\n", | ||
"# Index Read\n", | ||
"assay.add_read(\n", | ||
" read_id=\"atac-I1\",\n", | ||
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R2_001.fq.zst\",\n", | ||
" modality=\"atac\",\n", | ||
" is_reverse=True,\n", | ||
" primer_id=\"atac-nextera_read1\",\n", | ||
")\n", | ||
"\n", | ||
"# Read 2\n", | ||
"assay.add_read(\n", | ||
" read_id=\"atac-R2\",\n", | ||
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R3_001.fq.zst\",\n", | ||
" modality=\"atac\",\n", | ||
" is_reverse=True,\n", | ||
" primer_id=\"atac-nextera_read2\",\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can visualize the SeqSpec object again after adding the ATAC data. Note that read information is added to the tree." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"\n", | ||
"├── rna(153-1150)\n", | ||
"│ ├── rna-illumina_p5(29)\n", | ||
"│ ├── rna-truseq_read1(29)\n", | ||
"│ ├── rna-cell_barcode(16)\n", | ||
"│ ├── rna-umi(12)\n", | ||
"│ ├── rna-cDNA(1-1000)\n", | ||
"│ ├── rna-truseq_read2(34)\n", | ||
"│ ├── rna-index7(8)\n", | ||
"│ └── rna-illumina_p7(24)\n", | ||
"└── atac(153-1150)\n", | ||
" ├── atac-illumina_p5(29)\n", | ||
" ├── atac-cell_barcode(16)\n", | ||
" ├── atac-linker(8)\n", | ||
" ├── atac-nextera_read1(33) [↓atac-R1(150), ↑atac-I1(24)]\n", | ||
" ├── atac-gDNA(1-1000)\n", | ||
" ├── atac-nextera_read2(34) [↑atac-R2(150)]\n", | ||
" ├── atac-index7(8)\n", | ||
" └── atac-illumina_p7(24)" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"assay" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can save the SeqSpec object to a yaml file and load it next time." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"with open(\"10x_rna_atac.yaml\", \"w\") as f:\n", | ||
" f.write(assay.to_yaml())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can now start reads mapping and create the fragment file. If you do not have the genome index, you can create it using the `make_genome_index` function." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Counting barcodes...\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[32mINFO \u001b[0m cached_path::cache\u001b[90m]\u001b[0m Cached version of https://teichlab.github.io/scg_lib_structs/data/10X-Genomics/atac_737K-arc-v1.txt.gz is up-to-date\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R1) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Found 12387063 barcodes. 94.19% of them have an exact match in whitelist\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Aligning reads...\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R1) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n", | ||
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R2) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n", | ||
"100%|██████████| 12387063/12387063 [05:29<00:00, 37567.71it/s]" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"qc = precellar.align(\n", | ||
" assay, \"/data/kzhang/genome/mm10\",\n", | ||
" modality=\"atac\",\n", | ||
" output_fragment=\"atac_fragments.tsv.zst\",\n", | ||
" num_threads=32,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"QC metrics: " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{'frac_q30_bases_barcode': 0.9219054690365263, 'frac_nonnuclear': 7.443249461151526e-05, 'frac_valid_barcode': 0.9815396111249293, 'frac_unmapped': 0.008819422106194463, 'frac_confidently_mapped': 0.9292046064511015, 'frac_fragment_in_nucleosome_free_region': 4.518862917979508e-05, 'frac_duplicates': 0.33183332695206896, 'sequenced_reads': 24774126.0, 'frac_q30_bases_read1': 0.9173089515515771, 'frac_q30_bases_read2': 0.9229901653577338, 'sequenced_read_pairs': 12387063.0, 'frac_fragment_flanking_single_nucleosome': 2.912761037492815e-05}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(qc)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Binary file not shown.
Binary file not shown.
Empty file.
Oops, something went wrong.