forked from The-Sequence-Ontology/GAL
NAME
GAL::Annotation - Genome Annotation Library
VERSION
This document describes GAL::Annotation version 0.01
SYNOPSIS
    use GAL::Annotation;

    # Assuming defaults (GFF3 parser and SQLite storage)
    my $annot = GAL::Annotation->new(qw(file.gff file.fasta));
    my $features = $annot->features;

    # Otherwise be explicit about everything.
    my %feat_store_args = (class    => 'SQLite',
                           database => '/path/to/file.gff'
                          );
    my $feat_store = GAL::Annotation->new(storage => \%feat_store_args,
                                          fasta   => '/path/to/file.fa');
    $feat_store->load_files($feature_file);
    my $features = $feat_store->schema->resultset('Feature');

    # Either way, once you have features - get to work.
    my $mrnas = $features->search({type => 'mRNA'});
    while (my $mrna = $mrnas->next) {
        print $mrna->feature_id . "\n";
        my $CDSs = $mrna->CDSs;
        while (my $CDS = $CDSs->next) {
            print join "\n", ($CDS->start,
                              $CDS->end,
                              $CDS->seq,
                             );
        }
    }
DESCRIPTION
The Genome Annotation Library (GAL) is a collection of modules that
strive to make working with genome annotations simple, intuitive and
fast. Users of GAL first create an annotation object which in turn will
contain Parser, Storage and Schema objects. The parser allows features
to be loaded into GAL's storage from a variety of formats. The storage
object specifies how the features should be stored, and the schema
object provides flexible query and iterator functions over the features.
In addition, Index objects (not yet implemented) provide additional
key/value mapped look up tables, and List objects provide aggregation
and analysis functionality for lists of feature attributes.
A wide variety of parsers are available to convert sequence features
from various formats, and new parsers are easy to write. See GAL::Parser
for more details. Currently SQLite and MySQL storage options are
available (a fast RAM storage engine is on the TODO list). Schema
objects are provided by DBIx::Class and a familiarity with that package
is necessary to fully understand how to query and iterate over feature
objects.
CONSTRUCTOR
New Annotation objects are created by the class method new. Arguments
should be passed to the constructor as a list (or reference) of key
value pairs. All attributes of the Annotation object can be set in the
call to new, but reasonable defaults will be used wherever possible to
keep object creation simple. A simple example of object creation would
look like this:
    my $feat_store = GAL::Annotation->new($gff_file);
    my $feat_store = GAL::Annotation->new($gff_file, $fasta_file);
The resulting object would use a GFF3 parser and SQLite storage by
default. The first example would not have access to feature sequence;
the second one would.
A more complex object creation might look like this:
    my $feat_store = GAL::Annotation->new(parser  => {class => 'gff3'},
                                          storage => {class    => 'mysql',
                                                      dsn      => 'dbi:mysql:database',
                                                      user     => 'me',
                                                      password => 'secret'},
                                          fasta   => '/path/to/fasta/files/'
                                         );
The constructor recognizes the following parameters which will set the
appropriate attributes:
* `parser => parser_subclass [gff3]'
This optional parameter defines which parser subclass to
instantiate. This parameter will default to gff3 if not provided.
See GAL::Parser for a complete list of available parser classes.
* `storage => storage_subclass [SQLite]'
This optional parameter defines which storage subclass to
instantiate. Currently available storage classes are SQLite (the
default) and mysql.
* `fasta => '/path/to/fasta/files/''
This optional parameter defines a path to a fasta file or a
collection of fasta files that correspond to the annotated features.
The IDs (first contiguous non-whitespace characters) of the fasta
headers must correspond to the sequence IDs (seqids) in the
annotated features. If the fasta attribute is not set, the features
will not have access to their sequence. Access to the sequence is
provided by Bio::DB::Fasta.
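The ID rule for the fasta parameter can be illustrated with a small
sketch. This is not GAL or Bio::DB::Fasta code; the helpers fasta_id and
gff3_seqid are hypothetical names introduced here, and the header and
feature line are made-up sample data. It only shows how the first run of
non-whitespace characters in a fasta header lines up with the seqid in
the first column of a GFF3 line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GAL): the ID of a fasta header is the
# first run of non-whitespace characters after the leading '>'.
sub fasta_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S+)/;
    return $id;
}

# Hypothetical helper: GFF3 lines are tab-delimited and the seqid is the
# first column.
sub gff3_seqid {
    my ($line) = @_;
    my ($seqid) = split /\t/, $line;
    return $seqid;
}

# Made-up sample data for illustration only.
my $header   = '>chr1 example description text';
my $gff_line = join "\t", 'chr1', 'example', 'gene', 1000, 2000,
                          '.', '+', '.', 'ID=gene1';

my $status = fasta_id($header) eq gff3_seqid($gff_line) ? 'match' : 'mismatch';
print "$status\n";
```

If the two values differ (for example 'chr1' in the annotation but '1'
in the fasta header), the features would presumably not find their
sequence, so checking a few IDs this way before loading can save time.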
new
Title : new
Usage : GAL::Annotation->new();
Function: Creates a GAL::Annotation object;
Returns : A GAL::Annotation object
Args : A list of key value pairs for the attributes specified above.
ATTRIBUTES
All attributes can be supplied as parameters to the GAL::Annotation
constructor as a list (or reference) of key value pairs.
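As a rough sketch of what "a list (or reference) of key value pairs"
means in practice, the following shows one common Perl idiom for
accepting either form. It is not GAL's actual implementation;
normalize_args is a hypothetical helper, and it covers only the
key/value forms, not the positional file arguments shown in the
CONSTRUCTOR section.

```perl
use strict;
use warnings;

# Hypothetical helper (not GAL's actual code): accept arguments either as
# a list of key/value pairs or as a single hash reference, and return a
# fresh hash reference in both cases.
sub normalize_args {
    my @args = @_;
    return { %{ $args[0] } } if @args == 1 && ref $args[0] eq 'HASH';
    die "Odd number of arguments; expected key/value pairs\n" if @args % 2;
    return { @args };
}

my $from_list = normalize_args(fasta => '/path/to/file.fa');
my $from_ref  = normalize_args({ fasta => '/path/to/file.fa' });

print "$from_list->{fasta}\n";
print "$from_ref->{fasta}\n";
```

Both calls yield the same attribute hash, which is why the two calling
styles can be used interchangeably.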
parser
Title : parser
Usage : $parser = $self->parser();
Function: Create or return a parser object.
Returns : A GAL::Parser::subclass object.
Args : (class => gal_parser_subclass)
See GAL::Parser and its subclasses for more arguments.
Notes : The parser object is created as a singleton, but it
can be changed by passing new arguments to a call to
parser.
storage
Title : storage
Usage : $storage = $self->storage();
Function: Create or return a storage object.
Returns : A GAL::Storage::subclass object.
Args : (class => gal_storage_subclass)
See GAL::Storage and its subclasses for more arguments.
Notes : The storage object is created as a singleton and cannot be
destroyed or recreated after being created.
fasta
The fasta attribute is provided by GAL::Base, see that module for
more details.
METHODS
features
Title : features
Usage : $self->features();
Function: Return a GAL::Schema::Result::Feature object (a
DBIx::Class::ResultSet for all features).
Returns : A GAL::Schema::Result::Feature object
Args : N/A
schema
Title : schema
Usage : $self->schema();
Function: Create and/or return the DBIx::Class::Schema object
Returns : DBIx::Class::Schema object.
Args : N/A - Arguments are provided by the GAL::Storage object.
load_files
Title : load_files
Usage : $a = $self->load_files();
Function: Parse and store all of the features in a file. If a single
file is given as an argument and if there are gff[3] and
sqlite versions of that file's base name then time stamps
are compared and the database is only (re)loaded if the
GFF3 file is newer.
Returns : N/A
Args : A list of files.
Notes : Default
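The time stamp comparison described for load_files can be sketched as
follows. This is an illustration of the idea only, not GAL's code:
needs_reload is a hypothetical helper, and treating a missing database
file as "needs loading" is an assumption made for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not GAL's actual code): decide whether a database
# file should be (re)loaded from a feature file by comparing the
# modification times of the two files.
sub needs_reload {
    my ($gff_file, $db_file) = @_;

    # Assumption: with no database file yet, a load is always needed.
    return 1 unless -e $db_file;

    my $gff_mtime = (stat $gff_file)[9];    # element 9 of stat is mtime
    my $db_mtime  = (stat $db_file)[9];

    # Reload only when the feature file is newer than the database.
    return $gff_mtime > $db_mtime ? 1 : 0;
}
```

A caller could use it as
`load($gff) if needs_reload('file.gff', 'file.sqlite')`, where load
stands in for whatever parsing and storing step applies.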
DIAGNOSTICS
GAL::Annotation currently does not throw any warnings or errors,
but most other modules in the library do, and details of those
errors can be found in those modules.
CONFIGURATION AND ENVIRONMENT
GAL::Annotation requires no configuration files or environment
variables.
DEPENDENCIES
Modules in GAL/lib use the following modules:
Bio::DB::Fasta Carp DBD::SQLite DBI List::Util Scalar::Util
Set::IntSpan::Fast Statistics::Descriptive Text::RecordParser
Some scripts in GAL/bin and/or GAL/lib/GAL/t use the following
modules:
Data::Dumper FileHandle Getopt::Long IO::Prompt List::MoreUtils
TAP::Harness Test::More Test::Pod::Coverage URI::Escape
XML::LibXML::Reader
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
I'm sure there are plenty of bugs right now - please let me know if
you find one.
Please report any bugs or feature requests to:
barry.moore@genetics.utah.edu
AUTHOR
Barry Moore <barry.moore@genetics.utah.edu>
LICENCE AND COPYRIGHT
Copyright (c) 2010, Barry Moore <barry.moore@genetics.utah.edu>. All
rights reserved.
This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE
LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS
AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE
DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR,
OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE,
BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR
THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER
SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF
THE POSSIBILITY OF SUCH DAMAGES.