Version 0.3.2
The beelinetools script performs different tasks on Illumina's Beeline final
report (long format). Such tasks include conversion to the Plink format and
data extraction.
The script works with both Python 2.7 and Python 3.4 or higher. It has been developed and tested on Linux, but should also work on Mac OS X and Windows.
Running the script with the -h or --help flag will provide usage information.
$ beelinetools.py --help
usage: beelinetools.py [-h] [-v] {convert,sample-split,extract} ...
Performs different tasks on Illumina's beeline report(s) (version 0.3.2).
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Analysis Type:
The type of analysis to be performed on the Beeline (long) report. The
following list of analysis is available.
{convert,sample-split,extract}
convert Converts the Beeline long report to other format.
sample-split Split the report so that there is one file for each
sample.
extract Extract information from the Beeline long report.
For now, three analysis types are available: conversion (from Beeline to Plink), splitting of a report into one file per sample, and extraction of data for user-defined chromosomes.
It is possible to convert Illumina's Beeline report to Plink's pedfile or
binary pedfile format for downstream analysis. For more information, use the
-h or --help option of the convert analysis type.
$ beelinetools.py convert --help
usage: beelinetools.py convert [-h] [-v] -i FILE [FILE ...] -m FILE
[--beeline-id COL] [--beeline-sample COL]
[--beeline-a1 COL] [--beeline-a2 COL]
[--beeline-strand STR] [--map-id COL]
[--map-chr COL] [--map-pos COL]
[--map-allele COL] [--map-illumina-id COL]
[--map-illumina-strand COL]
[--map-ref-strand COL] [--map-delim SEP]
[--nb-snps-kw KEYWORD] [-o DIR]
[--format FORMAT]
Conversion from the Beeline long report format to other, more commonly used,
format (such as Plink) (version 0.3.2).
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Input Files:
-i FILE [FILE ...], --input FILE [FILE ...]
The name of the input file(s). Use '-' only once to
read on the standard input (STDIN).
-m FILE, --map FILE The name of the file containing mapping information.
Beeline Options:
--beeline-id COL The name of the column containing the marker
identification number for beeline. [SNP Name]
--beeline-sample COL The name of the column containing the sample
identification number for beeline. [Sample Name]
--beeline-a1 COL The name of the column containing the first allele for
beeline. [Allele1 - Forward]
--beeline-a2 COL The name of the column containing the second allele
for beeline. [Allele2 - Forward]
--beeline-strand STR The strand for the alleles. This will help in
determining the A/B alleles. If unset, the strand will
be infer by the name of the columns for the two
alleles.
Mapping Options:
--map-id COL The name of the column containing the marker
identification numbers. [Name]
--map-chr COL The name of the column containing the chromosome.
[Chr]
--map-pos COL The name of the column containing the position.
[MapInfo]
--map-allele COL The name of the column containing the alleles. [SNP]
--map-illumina-id COL
The name of the column containing the Illumina ID.
This ID contains information about the marker's
Forward strand and help determining the A/B alleles.
[IlmnID]
--map-illumina-strand COL
The name of the column containing the Illumina strand.
This information helps in determining the A/B alleles
since it contains the marker's orientation for the TOP
alleles. [IlmnStrand]
--map-ref-strand COL The name of the column containing the reference
strand. This information helps in determining the A/B
alleles since it contains the marker's orientation for
the Plus alleles. [RefStrand]
--map-delim SEP The field delimiter. [,]
--nb-snps-kw KEYWORD The keyword that describe the number of used markers
for the report(s) (useful if beeline header format
changes). [Num Used SNPs]
Output Directory:
-o DIR, --output-dir DIR
The output directory (default is working directory).
Output Format:
--format FORMAT The output format (one of 'bed' or 'ped' for binary or
normal pedfile format from Plink). [bed]
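As a rough illustration, the conversion could be driven from Python by assembling the command line described in the help text above. This is only a sketch: the helper names build_convert_command and run_convert, and the file names report.csv and annotation.csv, are hypothetical. Only flags listed in the help text are used, and the script is assumed to be reachable on the PATH.

```python
import subprocess


def build_convert_command(reports, map_file, output_dir=".", fmt="bed"):
    """Assemble the argument list for 'beelinetools.py convert'.

    Hypothetical helper: it only uses flags shown in the help text.
    """
    if fmt not in ("bed", "ped"):
        raise ValueError("format must be 'bed' or 'ped'")
    command = ["beelinetools.py", "convert", "-i"]
    command.extend(reports)  # -i accepts one or more report files
    command.extend(["-m", map_file, "-o", output_dir, "--format", fmt])
    return command


def run_convert(reports, map_file, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(build_convert_command(reports, map_file, **kwargs))


# Placeholder file names, not files shipped with the tool.
command = build_convert_command(["report.csv"], "annotation.csv", fmt="ped")
print(" ".join(command))
```

Building the argument list separately from running it makes it possible to inspect the command before anything is executed.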
Data extraction is possible by using the extract analysis type. Multiple
chromosomes can be extracted at once. Note that, for now, the report created
by the extraction is no longer compatible with beelinetools. For more
information, use the -h or --help option.
$ beelinetools.py extract --help
usage: beelinetools.py extract [-h] [-v] -i FILE [FILE ...] -m FILE
[--beeline-id COL] [--beeline-sample COL]
[--beeline-a1 COL] [--beeline-a2 COL]
[--beeline-strand STR] [--map-id COL]
[--map-chr COL] [--map-pos COL]
[--map-allele COL] [--map-illumina-id COL]
[--map-illumina-strand COL]
[--map-ref-strand COL] [--map-delim SEP]
[--nb-snps-kw KEYWORD] [-o DIR]
[-c CHROM [CHROM ...]] [-k FILE] [-s STR]
[--output-delim SEP]
Extracts information from a Beeline long report (version 0.3.2).
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Input Files:
-i FILE [FILE ...], --input FILE [FILE ...]
The name of the input file(s). Use '-' only once to
read on the standard input (STDIN).
-m FILE, --map FILE The name of the file containing mapping information.
Beeline Options:
--beeline-id COL The name of the column containing the marker
identification number for beeline. [SNP Name]
--beeline-sample COL The name of the column containing the sample
identification number for beeline. [Sample Name]
--beeline-a1 COL The name of the column containing the first allele for
beeline. [Allele1 - Forward]
--beeline-a2 COL The name of the column containing the second allele
for beeline. [Allele2 - Forward]
--beeline-strand STR The strand for the alleles. This will help in
determining the A/B alleles. If unset, the strand will
be infer by the name of the columns for the two
alleles.
Mapping Options:
--map-id COL The name of the column containing the marker
identification numbers. [Name]
--map-chr COL The name of the column containing the chromosome.
[Chr]
--map-pos COL The name of the column containing the position.
[MapInfo]
--map-allele COL The name of the column containing the alleles. [SNP]
--map-illumina-id COL
The name of the column containing the Illumina ID.
This ID contains information about the marker's
Forward strand and help determining the A/B alleles.
[IlmnID]
--map-illumina-strand COL
The name of the column containing the Illumina strand.
This information helps in determining the A/B alleles
since it contains the marker's orientation for the TOP
alleles. [IlmnStrand]
--map-ref-strand COL The name of the column containing the reference
strand. This information helps in determining the A/B
alleles since it contains the marker's orientation for
the Plus alleles. [RefStrand]
--map-delim SEP The field delimiter. [,]
--nb-snps-kw KEYWORD The keyword that describe the number of used markers
for the report(s) (useful if beeline header format
changes). [Num Used SNPs]
Output Directory:
-o DIR, --output-dir DIR
The output directory (default is working directory).
Extraction Options:
-c CHROM [CHROM ...], --chr CHROM [CHROM ...]
The chromosome to extract. [['1', '2', '3', '4', '5',
'6', '7', '8', '9', '10', '11', '12', '13', '14',
'15', '16', '17', '18', '19', '20', '21', '22', '23',
'24', '25', '26']]
-k FILE, --keep FILE A list of samples to extract.
Output Options:
-s STR, --suffix STR The suffix to add to the output file(s). [_extract]
--output-delim SEP The output file field delimiter. [,]
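Conceptually, the chromosome and sample filters can be pictured as follows. This is a simplified sketch and not the script's actual implementation: the function name filter_rows, the in-memory rows and the marker-to-chromosome dictionary are all hypothetical. Only the column names SNP Name and Sample Name come from the defaults listed above.

```python
def filter_rows(rows, marker_chrom, chromosomes, keep_samples=None):
    """Yield report rows whose marker lies on one of the chromosomes.

    rows: iterable of dicts keyed by column name (hypothetical input).
    marker_chrom: dict mapping a marker name to its chromosome (strings).
    keep_samples: optional set of sample names; None keeps every sample.
    """
    chromosomes = set(chromosomes)
    for row in rows:
        # Markers missing from the mapping are skipped (get() returns None).
        if marker_chrom.get(row["SNP Name"]) not in chromosomes:
            continue
        if keep_samples is not None and row["Sample Name"] not in keep_samples:
            continue
        yield row


# Toy data, invented for the example.
marker_chrom = {"rs1": "1", "rs2": "2", "rs3": "23"}
rows = [
    {"SNP Name": "rs1", "Sample Name": "S1"},
    {"SNP Name": "rs2", "Sample Name": "S1"},
    {"SNP Name": "rs3", "Sample Name": "S2"},
]
kept = list(filter_rows(rows, marker_chrom, ["1", "23"], keep_samples={"S1"}))
print(kept)
```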
It is possible to split a Beeline report into one file per sample by using the
sample-split analysis type. For more information, use the -h or --help option.
$ beelinetools.py sample-split --help
usage: beelinetools.py sample-split [-h] [-v] -i FILE [FILE ...] -m FILE
[--beeline-id COL] [--beeline-sample COL]
[--beeline-a1 COL] [--beeline-a2 COL]
[--beeline-strand STR] [--map-id COL]
[--map-chr COL] [--map-pos COL]
[--map-allele COL] [--map-illumina-id COL]
[--map-illumina-strand COL]
[--map-ref-strand COL] [--map-delim SEP]
[--nb-snps-kw KEYWORD] [-o DIR]
[--keep-metadata] [--add-ab]
[--add-mapping] [--output-delim SEP]
The long report will be split so that each sample has its own separate file
(version 0.3.2).
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Input Files:
-i FILE [FILE ...], --input FILE [FILE ...]
The name of the input file(s). Use '-' only once to
read on the standard input (STDIN).
-m FILE, --map FILE The name of the file containing mapping information.
Beeline Options:
--beeline-id COL The name of the column containing the marker
identification number for beeline. [SNP Name]
--beeline-sample COL The name of the column containing the sample
identification number for beeline. [Sample Name]
--beeline-a1 COL The name of the column containing the first allele for
beeline. [Allele1 - Forward]
--beeline-a2 COL The name of the column containing the second allele
for beeline. [Allele2 - Forward]
--beeline-strand STR The strand for the alleles. This will help in
determining the A/B alleles. If unset, the strand will
be infer by the name of the columns for the two
alleles.
Mapping Options:
--map-id COL The name of the column containing the marker
identification numbers. [Name]
--map-chr COL The name of the column containing the chromosome.
[Chr]
--map-pos COL The name of the column containing the position.
[MapInfo]
--map-allele COL The name of the column containing the alleles. [SNP]
--map-illumina-id COL
The name of the column containing the Illumina ID.
This ID contains information about the marker's
Forward strand and help determining the A/B alleles.
[IlmnID]
--map-illumina-strand COL
The name of the column containing the Illumina strand.
This information helps in determining the A/B alleles
since it contains the marker's orientation for the TOP
alleles. [IlmnStrand]
--map-ref-strand COL The name of the column containing the reference
strand. This information helps in determining the A/B
alleles since it contains the marker's orientation for
the Plus alleles. [RefStrand]
--map-delim SEP The field delimiter. [,]
--nb-snps-kw KEYWORD The keyword that describe the number of used markers
for the report(s) (useful if beeline header format
changes). [Num Used SNPs]
Output Directory:
-o DIR, --output-dir DIR
The output directory (default is working directory).
Split Options:
--keep-metadata Keeps the meta data ([Header]) in each of the output
reports.
--add-ab Adds the A/B alleles in the output file (sometimes
required by other software).
--add-mapping Adds mapping information (chromosome/Position).
--output-delim SEP The output file field delimiter. [,]
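The idea of splitting a long report by sample can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the script's own code: the function name split_by_sample is hypothetical, the toy report is invented, and the layout (a [Header] section followed by a [Data] line and a row of column names) is assumed for the example.

```python
import csv
import io


def split_by_sample(report, sample_col="Sample Name", delim=","):
    """Group the data rows of a long report by sample.

    Assumes (for this sketch) that the data rows follow a '[Data]' line
    and that the first line after it holds the column names.
    """
    # Skip the metadata until the start of the data section.
    for line in report:
        if line.strip() == "[Data]":
            break
    groups = {}
    for row in csv.DictReader(report, delimiter=delim):
        groups.setdefault(row[sample_col], []).append(row)
    return groups


# Toy report, invented for the example.
report = io.StringIO(
    "[Header]\n"
    "Num Used SNPs,2\n"
    "[Data]\n"
    "SNP Name,Sample Name,Allele1 - Forward,Allele2 - Forward\n"
    "rs1,S1,A,G\n"
    "rs1,S2,A,A\n"
    "rs2,S1,C,C\n"
)
groups = split_by_sample(report)
for sample, sample_rows in groups.items():
    print(sample, [row["SNP Name"] for row in sample_rows])
```

Each group could then be written to its own file, for instance with csv.DictWriter and the chosen output delimiter.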
To test the script, run the following command:
$ python test_beelinetools.py
..................................................................
----------------------------------------------------------------------
Ran 66 tests in 0.116s
OK