CoreGx Design Documentation

Christopher Eeles edited this page Feb 16, 2022 · 17 revisions

TreatmentResponseExperiment

Object Design

TODO::

Object Dimensions

A TreatmentResponseExperiment (TRE) has list-like and table-like behaviors. For table-like behaviors, rows are defined by one or more key columns which uniquely identify each row of the data.table in the rowData slot. These columns are referred to as the rowIDs and are concatenated together with the ':' character to make pseudo-rownames. The same is true for the colData table, with associated colIDs and pseudo-colnames.
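
The construction of pseudo-dimnames can be sketched outside of R. The snippet below is plain Python for illustration only, not the CoreGx API: the `row_data` records, the column names, and the `pseudo_dimnames` helper are all hypothetical. It joins the key columns of each record with ':' and checks that the result identifies every row uniquely.

```python
# Illustrative only: plain Python, not the CoreGx API.
# Hypothetical rowData records; the column names are made up for this example.
row_data = [
    {"drug": "drugA", "dose": 0.1, "n_replicates": 3},
    {"drug": "drugA", "dose": 1.0, "n_replicates": 3},
    {"drug": "drugB", "dose": 0.1, "n_replicates": 2},
]
row_ids = ["drug", "dose"]  # assumed key columns (the "rowIDs")


def pseudo_dimnames(records, id_columns, sep=":"):
    """Join the key columns of each record into one pseudo-dimname."""
    names = [sep.join(str(rec[col]) for col in id_columns) for rec in records]
    # The key columns must identify each row uniquely.
    if len(set(names)) != len(names):
        raise ValueError("id columns do not uniquely identify every row")
    return names


rownames = pseudo_dimnames(row_data, row_ids)
print(rownames)  # ['drugA:0.1', 'drugA:1.0', 'drugB:0.1']
```

Using only `drug` as the key in this example would raise an error, because two rows would share the pseudo-rowname `drugA`.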

Use of such pseudo-dimnames allows a TRE to be subset analogously to a base data.frame by specifying the dimension names of the "rows" or "columns" of the object. As a result, the [ method exploits the table-like behaviors of the object. In addition to data.frame-like subsets, two additional mechanisms for subsetting have been implemented. First, pseudo-dimnames can be specified using glob or regex patterns, which are matched against the pseudo-dimnames before returning the subset. Second, the [ method allows data.table-style subsets using expressions, with the caveat that any expression subset query needs to be wrapped in the .() function to protect the call from early evaluation during S4 method dispatch. These protected expressions are then passed through to the i argument of the rowData or colData data.tables.
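
As a rough illustration of pattern-based subsetting, the sketch below matches glob and regex patterns against a list of pseudo-rownames. It is plain Python using the standard `fnmatch` and `re` modules, not the CoreGx implementation. The names in `rownames` and the helpers `match_glob` and `match_regex` are hypothetical, and the exact pattern semantics used by the `[` method may differ.

```python
import fnmatch
import re

# Hypothetical pseudo-rownames of the form "<drug>:<dose>".
rownames = ["drugA:0.1", "drugA:1.0", "drugB:0.1", "drugC:10.0"]


def match_glob(names, pattern):
    """Return names matching a shell-style glob such as 'drugA:*'."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def match_regex(names, pattern):
    """Return names in which the regular expression matches anywhere."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.search(n)]


# All doses of drugA.
glob_hits = match_glob(rownames, "drugA:*")
# Dose 0.1 of drugA or drugB only.
regex_hits = match_regex(rownames, r"^drug[AB]:0\.1$")
```

The matched names would then be used to select rows exactly as if they had been supplied literally.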

The assays slot of a TRE contains the measurements of interest in the object and possesses list-like behaviors. You can access and assign an assay via the $ and [[ methods. Table-like subsets on the object via [ or subset also do the necessary internal work to subset each item in the assays list.

Assay Index

The assay index table was introduced to allow aggregation operations over rowKey and colKey values to be stored inside a TreatmentResponseExperiment. Previously, assays were keyed directly by the values of rowKey and colKey, so no assay could store a summary over the rowID or colID columns. This effectively made it impossible to store interesting aggregations, for example summaries over dose or replicates, inside a TreatmentResponseExperiment object.

To resolve this issue, two additional pieces of structural metadata have been added to the .intern slot. The assayIndex is a table which maps from rowKey and colKey combinations to an integer key for each assay table. The assayKeys are a list of the rowIDs and colIDs which are required to uniquely identify a measurement in an assay. The assayKeys are used to define an integer assay key column in each assay data.table. This prevents unnecessary repetition of character metadata columns inside the assays of a TRE and acts as a form of compression versus storing the data in a single, long-format data.table. Initial tests indicate about a 50% reduction in object size versus the long-format data.table. This saving is expected to grow with the number of rowData and colData columns, but to shrink slightly with the number of assays in a TRE.
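
The idea of the index can be sketched with a toy example. The code below is plain Python, not the CoreGx data structures: the records, column names, and helper functions are hypothetical. It splits a long-format table into integer-keyed row and column tables, an index of (rowKey, colKey, assay key) triples, and an assay that holds only the assay key and the measurement, then rebuilds the long format to show that nothing is lost.

```python
# Hypothetical long-format records: every measurement repeats its metadata.
long_table = [
    {"drug": "drugA", "dose": 0.1, "sample": "cell1", "viability": 0.95},
    {"drug": "drugA", "dose": 1.0, "sample": "cell1", "viability": 0.60},
    {"drug": "drugA", "dose": 0.1, "sample": "cell2", "viability": 0.90},
    {"drug": "drugB", "dose": 0.1, "sample": "cell2", "viability": 0.85},
]
row_ids, col_ids = ["drug", "dose"], ["sample"]  # assumed key columns


def build_key_table(records, id_columns):
    """Assign an integer key to each distinct combination of id columns."""
    keys = {}
    for rec in records:
        combo = tuple(rec[col] for col in id_columns)
        keys.setdefault(combo, len(keys) + 1)
    return keys


row_keys = build_key_table(long_table, row_ids)  # rowData analogue
col_keys = build_key_table(long_table, col_ids)  # colData analogue

assay_index = []  # (rowKey, colKey, assay key) triples
assay = []        # (assay key, measurement); no character metadata
for assay_key, rec in enumerate(long_table, start=1):
    rk = row_keys[tuple(rec[col] for col in row_ids)]
    ck = col_keys[tuple(rec[col] for col in col_ids)]
    assay_index.append((rk, ck, assay_key))
    assay.append((assay_key, rec["viability"]))


def expand(assay, assay_index, row_keys, col_keys):
    """Rebuild the long-format records by joining through the index."""
    row_lookup = {key: combo for combo, key in row_keys.items()}
    col_lookup = {key: combo for combo, key in col_keys.items()}
    index_lookup = {a: (r, c) for r, c, a in assay_index}
    records = []
    for assay_key, value in assay:
        rk, ck = index_lookup[assay_key]
        rec = dict(zip(row_ids, row_lookup[rk]))
        rec.update(zip(col_ids, col_lookup[ck]))
        rec["viability"] = value
        records.append(rec)
    return records


roundtrip = expand(assay, assay_index, row_keys, col_keys)
```

In this sketch the metadata for each distinct row and column is stored once, and every additional assay only needs its own integer key column plus its measurements.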

Summaries inside a specific assay can be stored by repeating the value of the associated assayKey in the corresponding column of the assayIndex. This ensures that the data which has been aggregated over can still be retrieved, while also allowing storage of summaries over some subset of rowKey and colKey values. For now, the assayIndex will contain a column for each assay in the TRE, even if an assay is "parallel" to other assays (i.e., keyed by the same columns). While this slightly increases the size of the object due to storing repeated information, it greatly simplifies the logic required for subsets, as well as for assigning new assays or computing summaries over an existing assay. The cost of this is on the order of tens of MB per additional assay (assuming ~3 million rows per assay).
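
A minimal sketch of how a summary can share the index with the raw data follows. It is plain Python with hypothetical names (`row_data`, `viability`, `mean_viability`) and does not reproduce the CoreGx internals. A mean over dose is stored as a second assay, and its key is repeated on every index row that contributed to it, so the raw measurements remain reachable through their own column.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical structures. row_data maps rowKey -> (drug, dose).
row_data = {1: ("drugA", 0.1), 2: ("drugA", 1.0), 3: ("drugB", 0.1)}
# Index rows: rowKey, colKey, and the key into the raw "viability" assay.
assay_index = [
    {"rowKey": 1, "colKey": 1, "viability": 1},
    {"rowKey": 2, "colKey": 1, "viability": 2},
    {"rowKey": 1, "colKey": 2, "viability": 3},
    {"rowKey": 3, "colKey": 2, "viability": 4},
]
viability = {1: 0.95, 2: 0.60, 3: 0.90, 4: 0.85}  # assay key -> measurement

# Summarize over dose: group by (drug, colKey), dropping dose from the key.
groups = defaultdict(list)
for entry in assay_index:
    drug, _dose = row_data[entry["rowKey"]]
    groups[(drug, entry["colKey"])].append(entry)

mean_viability = {}  # summary assay: assay key -> aggregated value
for summary_key, entries in enumerate(groups.values(), start=1):
    mean_viability[summary_key] = mean(viability[e["viability"]] for e in entries)
    for e in entries:
        # Repeat the summary key on every index row that was aggregated over.
        e["mean_viability"] = summary_key

summary_column = [e["mean_viability"] for e in assay_index]
print(summary_column)  # [1, 1, 2, 3]
```

The first two index rows share summary key 1 because both doses of drugA in the first column were averaged together, while their distinct `viability` keys still point at the original measurements.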
