Understanding Filtering: Learn to harness FAT's DSL to Power Filtering Schemes
The Filtering Analysis Tool (FAT) uses its own domain-specific language (DSL) to define, process, and apply the filtering scheme specified in the configuration YAML to the input tab-delimited (TSV) file.
The DSL FAT uses to apply filtering encompasses several built-in operators.
The following arithmetic operators are supported by the DSL:
- + : Addition
- - : Subtraction
- / : Division
- | : Null (do nothing)
A multiplication operator is slated to be added in the next release.
The following relational operators are supported by the DSL:
- >= : Greater than or equal to
- <= : Less than or equal to
- == : Equal to
- /= : Not equal to
- =~ : Regular expression match
Strict less than and greater than operators are slated to be added in the next release.
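To make the operator list above concrete, here is a minimal Python sketch of how the five relational operators could be dispatched. It is not FAT's implementation: the names `RELATIONAL_OPS` and `compare` are hypothetical, and it assumes that `=~` means "the pattern matches anywhere in the value".

```python
import operator
import re

# Hypothetical mapping from the DSL's relational operators to Python callables.
# Assumption: "=~" succeeds when the regex matches anywhere in the value.
RELATIONAL_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "/=": operator.ne,
    "=~": lambda value, pattern: re.search(pattern, value) is not None,
}


def compare(value, op, target):
    """Apply one relational operator; unsupported operators raise KeyError."""
    return RELATIONAL_OPS[op](value, target)
```

With this sketch, `compare(5.0, ">=", 5.0)` is true, while an operator that is not yet supported, such as `<`, raises a `KeyError` instead of being silently ignored.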
Data in the following structures are supported by the DSL:
- x,y : This structure should be used for two-piece data (e.g. 32,54)
- y,x : This structure should be used for two-piece data, but will treat the second element first (e.g. 54,32)
- x,_ : This structure should be used for two-piece data, but will ignore the second element (e.g. 32,54 -> 32)
- _,y : This structure should be used for two-piece data, but will ignore the first element (e.g. 32,54 -> 54)
- x : This structure should be used for one-piece data (e.g. 2.241 or missense)
Other structures can be added and supported; please create an issue to request one.
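The column structures and arithmetic operators described above could be combined roughly as follows. This is a sketch under stated assumptions, not FAT's actual code: `parse_cell` and `apply_operator` are hypothetical names, the two pieces are assumed to be separated by a single comma, `y,x` is assumed to swap the pieces, and `|` is assumed to hand the parsed value back untouched.

```python
def parse_cell(cell, column_type):
    """Split a raw TSV cell according to its column structure.

    Returns a tuple of strings in the order they should be processed.
    """
    if column_type == "x":
        return (cell,)
    first, second = cell.split(",")  # assumption: exactly two comma-separated pieces
    if column_type == "x,y":
        return (first, second)
    if column_type == "y,x":
        return (second, first)       # assumption: the second element is treated first
    if column_type == "x,_":
        return (first,)              # second element ignored
    if column_type == "_,y":
        return (second,)             # first element ignored
    raise ValueError(f"unsupported column structure: {column_type}")


def apply_operator(parts, op):
    """Combine parsed parts with one of the DSL's arithmetic operators."""
    if op == "|":
        # Assumption: the null operator leaves the parsed value unchanged.
        return parts[0] if len(parts) == 1 else parts
    left, right = (float(p) for p in parts)
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "/":
        return left / right
    raise ValueError(f"unsupported arithmetic operator: {op}")
```

Under these assumptions, a cell `32,54` read as `y,x` with the `-` operator would evaluate `54 - 32`, and a cell `32,54` read as `x,_` with `|` would simply yield `32`.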
The configuration YAML provides a modular, extensible way to define the filtering schemes and to customize the resulting XLSX file.
The filtering schemes are defined using the filters associative array.
For example,
filters:
  - filtering_type: 'BINARY'
    filtering_column: 'tumor_exome_day0_var_count'
    filtering_column_type: 'x'
    filtering_operator: '|'
    filtering_string:
      bfs_numeric_operator: '>='
      bfs_numeric_number: '5'
  - filtering_type: 'BINARY'
    filtering_column: 'normal_exome_day0_VAF'
    filtering_column_type: 'x'
    filtering_operator: '|'
    filtering_string:
      bfs_numeric_operator: '<='
      bfs_numeric_number: '4.99'
  - filtering_type: 'BINARY'
    filtering_column: 'Capture_Val_Status'
    filtering_column_type: 'x'
    filtering_operator: '|'
    filtering_string:
      bfs_string_operator: '=='
      bfs_string_literal:
        - 'PASS'
  - ...
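To illustrate how the three example filters might be evaluated against a single row, here is a small Python sketch. It is an illustration only: the filters are written directly as the nested structure a YAML parser would produce, the `passes` function and the sample row are hypothetical, and it assumes that a string filter using `==` passes when the cell equals any literal in the list.

```python
import operator

# The three example filters, written as the structure a YAML parser would produce.
filters = [
    {"filtering_type": "BINARY", "filtering_column": "tumor_exome_day0_var_count",
     "filtering_column_type": "x", "filtering_operator": "|",
     "filtering_string": {"bfs_numeric_operator": ">=", "bfs_numeric_number": "5"}},
    {"filtering_type": "BINARY", "filtering_column": "normal_exome_day0_VAF",
     "filtering_column_type": "x", "filtering_operator": "|",
     "filtering_string": {"bfs_numeric_operator": "<=", "bfs_numeric_number": "4.99"}},
    {"filtering_type": "BINARY", "filtering_column": "Capture_Val_Status",
     "filtering_column_type": "x", "filtering_operator": "|",
     "filtering_string": {"bfs_string_operator": "==", "bfs_string_literal": ["PASS"]}},
]

NUMERIC_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def passes(filter_entry, row):
    """Return True if the row's cell passes one BINARY filter (type 'x', operator '|')."""
    cell = row[filter_entry["filtering_column"]]
    fs = filter_entry["filtering_string"]
    if "bfs_numeric_operator" in fs:
        compare = NUMERIC_OPS[fs["bfs_numeric_operator"]]
        return compare(float(cell), float(fs["bfs_numeric_number"]))
    # Assumption: with '==', a cell passes if it equals any listed literal.
    if fs["bfs_string_operator"] == "==":
        return cell in fs["bfs_string_literal"]
    raise NotImplementedError("only '==' string filters are sketched here")


# Hypothetical row from the input TSV, keyed by column name.
row = {"tumor_exome_day0_var_count": "7",
       "normal_exome_day0_VAF": "1.2",
       "Capture_Val_Status": "PASS"}
results = {f["filtering_column"]: passes(f, row) for f in filters}
```

Here `results` maps each filtered column to a pass/fail flag for the sample row, which is the per-cell information a BINARY filter is described as producing.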
Each filters entry is made up of the following fields:
- filtering_type : BINARY or TRINARY
  - BINARY filters are used to calculate the overall filtering status of each row; TRINARY filters are not.
  - BINARY filters are pass/fail; TRINARY filters are split into three categories.
  - By default, BINARY filters are encoded so that passing values are output as cells with a green background and failing values as cells with a red background (the colors can be changed).
- filtering_column : ColumnName
  - ColumnName must be the name of a column/field that exists within the input tab-delimited file.
- filtering_column_type : ColumnType
  - ColumnType must be one of the following:
    - x
    - x,y
- filtering_operator : FilteringOperator
  - FilteringOperator must be one of the following:
    - +
    - -
    - /
    - |
- filtering_string : FilteringString
  - FilteringString will have one of the following sets of associative key/value pairs:
    - bfs_string : This type of filtering string should be used for string-based comparisons:
      - ==
      - /=
      - =~
    - bfs_numeric : This type of filtering string should be used for arithmetic comparisons:
      - ==
      - >=
      - <=
    - tfs_string : This type of filtering string should be used for string-based comparisons:
      - ==
      - /=
      - =~
    - tfs_numeric : This type of filtering string should be used for arithmetic comparisons:
      - ==
      - >=
      - <=
The only difference between a bfs and a tfs filtering_string is that bfs denotes a binary filtering string and tfs denotes a trinary filtering string.
The filtering string must correspond to the filtering_type.
Example:
  - filtering_type: 'BINARY'
    ...
    ...
    ...
    filtering_string:
      bfs_numeric_operator: '>='
      bfs_numeric_number: '5'
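The correspondence rule between filtering_type and the filtering string could be checked before running FAT with a small Python sketch like the one below. The helper name `filtering_string_matches_type` is hypothetical, and the check assumes that every key under filtering_string carries the `bfs_` or `tfs_` prefix, as the keys in the examples do.

```python
# Assumption: every key under filtering_string starts with the prefix for its type.
PREFIX_FOR_TYPE = {"BINARY": "bfs_", "TRINARY": "tfs_"}


def filtering_string_matches_type(filter_entry):
    """Return True if all filtering_string keys use the prefix of filtering_type."""
    prefix = PREFIX_FOR_TYPE[filter_entry["filtering_type"]]
    return all(key.startswith(prefix) for key in filter_entry["filtering_string"])


consistent = {
    "filtering_type": "BINARY",
    "filtering_string": {"bfs_numeric_operator": ">=", "bfs_numeric_number": "5"},
}
# Same keys, but declared TRINARY: the prefix no longer corresponds to the type.
mismatched = dict(consistent, filtering_type="TRINARY")
```

In this sketch `filtering_string_matches_type(consistent)` holds, while the `mismatched` entry is rejected because its `bfs_` keys do not correspond to a TRINARY filter.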
Keep in mind that a column/field in the tab-delimited file can only be subject to a single filter.
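One way to catch a violation of this single-filter rule early is to scan the parsed filters list for repeated column names. The following Python sketch assumes the filters have already been loaded into a list of dictionaries; the function name `duplicated_columns` is hypothetical.

```python
from collections import Counter


def duplicated_columns(filters):
    """Return the sorted column names that appear in more than one filter entry."""
    counts = Counter(entry["filtering_column"] for entry in filters)
    return sorted(column for column, n in counts.items() if n > 1)


example_filters = [
    {"filtering_column": "tumor_exome_day0_var_count"},
    {"filtering_column": "normal_exome_day0_VAF"},
    {"filtering_column": "tumor_exome_day0_var_count"},  # second filter on the same column
]
duplicates = duplicated_columns(example_filters)
```

An empty result means each column is filtered at most once; any name in the returned list points to a column with more than one filter defined.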