ASCETS

Arm-level Somatic Copy-number Events in Targeted Sequencing

Author: Liam F. Spurr, liam.spurr@uchicagomedicine.edu

Contact: Rameen Beroukhim, rameen_beroukhim@dfci.harvard.edu

Please cite: Spurr LF, Touat M, Taylor AM, et al. Quantification of aneuploidy in targeted sequencing data using ASCETS, Bioinformatics (2020), https://doi.org/10.1093/bioinformatics/btaa980.

This repository contains the code to run the ASCETS arm-level copy number events caller for targeted sequencing data. ASCETS produces arm-level copy-number variant calls and arm-level weighted average log2 segment means from segmented copy number data.

How to run ASCETS

System requirements

This code has been tested on Mac and Linux using R version 3.6.1. The user must also install R packages tidyverse and data.table from CRAN prior to running the program.
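The two CRAN packages can be installed from within R. The snippet below is a minimal sketch: the helper name install_missing and the choice of CRAN mirror are assumptions made for illustration and are not part of ASCETS.

```r
# CRAN packages named in the system requirements
required <- c("tidyverse", "data.table")

# Hypothetical helper (not part of ASCETS): install only what is missing
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, repos = repos)
  }
  invisible(missing_pkgs)
}

# Uncomment to install whatever is missing:
# install_missing(required)
```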

Command line

ASCETS can be run directly from the command line to generate arm-level somatic copy-number alteration (aSCNA) calls from user data easily and efficiently.

Input files

Required:

CNV segmentation file (.seg)
- We use .seg files from RobustCNV in our pipeline
- The following columns are required in this order (names can vary): sample, chromosome, segment start, segment end, number of markers, log2ratio
- See sample data for an example file
Chromosome arm genomic coordinates
- We supply an example file in the repository for hg19 (original data from UCSC), but other coordinates, such as cytoband coordinates (also provided for hg19), can be supplied in the same format
Minimum arm breadth of coverage (BOC) to make a call (range 0.0 - 1.0)
- Optional, defaults to 0.5
- Specify 0.0 to allow any BOC
Output file prefix
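Because column names in the segmentation file can vary but the order cannot, it can help to check a file by position before running ASCETS. The following is a minimal sketch using made-up segments; the function name check_seg and the standardized column names are assumptions for illustration, not part of ASCETS.

```r
# Made-up segments in the required column order (values are illustrative only)
seg <- data.frame(
  sample    = c("S1", "S1", "S1"),
  chrom     = c("1", "1", "2"),
  seg_start = c(1000000, 60000000, 500000),
  seg_end   = c(50000000, 120000000, 90000000),
  num_mark  = c(120L, 95L, 210L),
  log2ratio = c(0.02, 0.45, -0.38),
  stringsAsFactors = FALSE
)

# Hypothetical helper: validate the first six columns by position
check_seg <- function(df) {
  stopifnot(ncol(df) >= 6)
  # Names can vary, so assign working names by position
  names(df)[1:6] <- c("sample", "chrom", "seg_start", "seg_end",
                      "num_mark", "log2ratio")
  stopifnot(
    is.numeric(df$seg_start),
    is.numeric(df$seg_end),
    is.numeric(df$log2ratio),
    all(df$seg_start <= df$seg_end)
  )
  df
}

seg_checked <- check_seg(seg)
```

For a file on disk, the same check could be applied to the result of read.delim() on the .seg file.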

Optional:

Arm-level alteration fraction threshold
- Defaults to 0.7
Individual log2 copy ratios (LCRs) that were used to build the copy-number segments
- We use exonic calls from RobustCNV
- The following columns are required in this order (names can vary): sample, genomic interval, gene/exon (can be blank), log2ratio
- See sample data for an example file
- If this is not supplied, a default segment mean threshold of +/- 0.2 will be used
- A user-defined threshold can also be supplied instead
Logical value specifying whether to retain noisy segments (T or F)
- Defaults to F
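To illustrate how the minimum BOC, the segment mean threshold, and the alteration fraction threshold could interact, here is a simplified sketch for a single arm. It is not the ASCETS implementation: it assumes non-overlapping segments, measures the altered fraction over the covered part of the arm, and uses the call labels AMP, DEL, NEUTRAL and NC as assumptions (only LOWCOV appears in this README). The function name call_arm is hypothetical.

```r
# Simplified, hypothetical arm caller (not the ASCETS source code)
call_arm <- function(seg_start, seg_end, log2ratio, arm_start, arm_end,
                     min_boc = 0.5, threshold = 0.2, alt_frac = 0.7) {
  # Clip each segment to the arm and measure the overlap in bases
  overlap <- pmax(pmin(seg_end, arm_end) - pmax(seg_start, arm_start), 0)
  covered <- sum(overlap)

  # Breadth of coverage: fraction of the arm covered by segments
  boc <- covered / (arm_end - arm_start)
  if (covered == 0 || boc < min_boc) return("LOWCOV")

  # Fraction of covered bases above, below, or within the threshold
  frac_amp     <- sum(overlap[log2ratio > threshold]) / covered
  frac_del     <- sum(overlap[log2ratio < -threshold]) / covered
  frac_neutral <- sum(overlap[abs(log2ratio) <= threshold]) / covered

  if (frac_amp >= alt_frac) return("AMP")
  if (frac_del >= alt_frac) return("DEL")
  if (frac_neutral >= alt_frac) return("NEUTRAL")
  "NC"  # no single state reaches the alteration fraction threshold
}

# Toy arm spanning positions 0 to 100
call_amp    <- call_arm(c(0, 80), c(80, 100), c(0.5, 0.0), 0, 100)
call_lowcov <- call_arm(0, 30, 0.5, 0, 100)
call_nc     <- call_arm(c(0, 50), c(50, 100), c(0.5, -0.5), 0, 100)
```

In the first toy case 80% of the covered arm exceeds +0.2, so the arm passes the 0.7 fraction; in the second only 30% of the arm is covered, which is below the default BOC of 0.5.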

Run the script

Note: the command must be run from the directory that contains the run_ascets.R and ascets_resources.R files

Rscript run_ascets.R \
	-i [CNV segment file] \
	-c [genomic chromosome arm coordinates (or other coordinates)] \
	-m [minimum arm BOC] [OPTIONAL] \
	-a [arm alteration fraction threshold] [OPTIONAL] \
	-e [LCRs used to compute noise] [OPTIONAL] \
	-k [logical specifying whether to keep noisy segments] [OPTIONAL] \
	-t [manual log2 ratio threshold to determine amplifications and deletions] [OPTIONAL, overrides the noise estimate] \
	-o [file output prefix]

Sample command

Rscript run_ascets.R -i seg_example.seg -c genomic_arm_coordinates_hg19.txt -m 0.5 -e lcr_example.txt -k F -a 0.7 -o sample_output

Output files

Arm-level calls for each arm in each sample in the input (arm_level_calls.txt)
Weighted average segment mean values for each arm in each sample in the input (weighted_average_segmeans.txt)
Aneuploidy scores (range 0-1, fraction of arms amplified or deleted out of total called arms [call ≠ LOWCOV])
Histogram of modeled noise in the segments in the input cohort (noise_hist.pdf)
Parameters used to make arm-level calls (params.txt)
Histogram of the segment means in the input cohort (segmean_hist.pdf)
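The aneuploidy score definition above (fraction of arms amplified or deleted out of the arms with a call other than LOWCOV) can be sketched as follows. The function name and the AMP, DEL and NEUTRAL labels are assumptions for illustration; only LOWCOV is named in this README.

```r
# Hypothetical helper: aneuploidy score for one sample's arm-level calls
aneuploidy_score <- function(calls) {
  # Arms without sufficient coverage are excluded from the denominator
  called <- calls[calls != "LOWCOV"]
  if (length(called) == 0) return(NA_real_)
  # Fraction of called arms that are amplified or deleted
  mean(called %in% c("AMP", "DEL"))
}

# Toy example: 2 altered arms out of 4 called arms
toy_calls <- c("AMP", "DEL", "NEUTRAL", "NEUTRAL", "LOWCOV")
toy_score <- aneuploidy_score(toy_calls)
```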

R Studio

ASCETS can also be run from within R Studio by calling the ascets() function from another script. All required functions are included in the ascets_resources.R file. Once that file has been sourced, ASCETS can be run using the commands below. Please note that the required data must be supplied with the column order shown in the sample data. Results can be written to file using the write_outputs_to_file() function.

Input files

CNV segmentation file (cna argument supplied as a data frame)
- The following columns are required in this order (names can vary): sample, chromosome, segment start, segment end, number of markers, log2ratio
- See sample data for an example file
Chromosome arm genomic coordinates (cytoband argument supplied as a data frame)
- We supply example files in the repository for hg19 and hg38 (original data from bioMart), but other coordinates, such as cytoband coordinates (also provided for hg19), can be supplied in the same format
Minimum arm breadth of coverage (BOC) to make a call (range 0.0 - 1.0; min_cov argument supplied as a numeric value)
- Optional, defaults to 0.5
- Specify 0.0 to allow any BOC
Output file prefix (name argument supplied as a character string)
Individual log2 copy ratios (LCRs) that were used to build the copy-number segments (noise argument supplied as a data frame)
- The following columns are required in this order (names can vary): sample, genomic interval, gene/exon (can be blank), log2ratio
- See sample data for an example file
- A user-defined threshold can also be supplied instead (threshold argument supplied as a numeric value)
- If this is not supplied, a default segment mean threshold of +/- 0.2 will be used
Logical value specifying whether to retain noisy segments (keep_noisy argument, supplied as a boolean value: T or F)
- Defaults to F
Arm-level alteration fraction threshold (alteration_threshold argument supplied as a numeric value)
- Defaults to 0.7

Output files

A list containing:

calls: data frame containing aSCNA calls
weight_ave: data frame containing arm weighted average segment means
aneu_scores: data frame containing aneuploidy scores for each sample
amp_thresh: numeric value representing amplification threshold that was used to make aSCNA calls
del_thresh: numeric value representing deletion threshold that was used to make aSCNA calls
alteration_thresh: numeric value representing fraction of arm that must be altered to call an aSCNA
name: character string representing name for dataset that was supplied to the function
cna: data frame containing segmented CNA data that was supplied to the function
noise: data frame containing noise data that was supplied to the function
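As an illustration of what an arm-level weighted average segment mean represents, the sketch below weights each segment's log2 ratio by its length. Weighting by segment length is an assumption made here for illustration, and the function name weighted_segmean is hypothetical; consult ascets_resources.R for the weighting that ASCETS actually applies.

```r
# Hypothetical helper: length-weighted mean of segment log2 ratios on one arm
weighted_segmean <- function(seg_start, seg_end, log2ratio) {
  seg_length <- seg_end - seg_start
  sum(seg_length * log2ratio) / sum(seg_length)
}

# Toy arm: a 60-base segment at 0.5 and a 40-base segment at 0.0
toy_mean <- weighted_segmean(c(0, 60), c(60, 100), c(0.5, 0.0))
```

The longer segment dominates the result, so the toy arm averages (60 * 0.5 + 40 * 0.0) / 100.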

Run the script

source("ascets_resources.R")
ascets_output <- ascets(cna, cytoband, min_cov, name, noise, keep_noisy, threshold, alteration_threshold)
write_outputs_to_file(ascets_output, location = "output_folder/")
