nested cross product help
jmchilton committed Aug 14, 2024
1 parent 1f0984e commit 0c3e51e
Showing 9 changed files with 81 additions and 3 deletions.
84 changes: 81 additions & 3 deletions lib/galaxy/tools/cross_product_nested.xml
Synopsis
========
This tool performs something like a Cartesian product (or cross join) of two collections
and is mainly used to set up an all-vs-all style analysis across two collections of datasets.
A standalone description of what it does can seem technical and math heavy, but how it
works within an ad hoc analysis or workflow is quite straightforward and hopefully easier
to understand. For this reason, the next section describes how to use this tool in context;
the technical details follow after that but aren't necessary for using this tool in simple
ways.

===========
Description
===========

====================
How to use this tool
====================
This tool can be used both in and outside of workflows, but let's use a workflow to illustrate
the ordering of tools. Imagine you have a tool that compares two individual datasets and you run
this tool with two lists of datasets. This simple case is shown below:

.. image:: ${static_path}/images/tools/collection_ops/dot_product.png
   :alt: The Dot Product of Two Collections
   :width: 500

In this configuration, the datasets of the two lists are matched up and compared element-wise.
The first dataset of "Input List 1" is compared to the first dataset of "Input List 2", and the
resulting dataset becomes the first dataset of the output list; the second elements are handled
the same way, and so on. For this to work, the lists need to have the same number of elements
and, ideally, matching element identifiers.

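The element-wise ("dot product") matching described above can be sketched in Python. This is a hypothetical model, not Galaxy's API: a list collection is a plain ``dict`` from element identifier to dataset (datasets are just strings, and dict insertion order stands in for list order), ``compare`` stands in for any two-input tool, and having the output reuse the first list's identifiers is an assumption of the sketch.

```python
from typing import Callable, Dict

# Hypothetical model of a Galaxy list: element identifier -> dataset.
Collection = Dict[str, str]


def map_over_pair(
    list_1: Collection, list_2: Collection, tool: Callable[[str, str], str]
) -> Collection:
    """Run ``tool`` element-wise over two lists (the "dot product" mapping)."""
    if len(list_1) != len(list_2):
        raise ValueError("both lists need the same number of elements")
    output: Collection = {}
    # Pair elements by position: first with first, second with second, ...
    for (ident_1, ds_1), (_ident_2, ds_2) in zip(list_1.items(), list_2.items()):
        # Assumption of this sketch: the output reuses the first list's identifier.
        output[ident_1] = tool(ds_1, ds_2)
    return output


def compare(x: str, y: str) -> str:
    """Stand-in for a tool that compares two datasets."""
    return f"compare({x}, {y})"


if __name__ == "__main__":
    list_1 = {"s1": "a1.bam", "s2": "a2.bam"}
    list_2 = {"s1": "b1.bam", "s2": "b2.bam"}
    print(map_over_pair(list_1, list_2, compare))
    # {'s1': 'compare(a1.bam, b1.bam)', 's2': 'compare(a2.bam, b2.bam)'}
```

Because the lists are paired by position, two lists of different lengths have no sensible element-wise pairing, which is why the sketch raises an error in that case.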
This is a very natural way to "map" an operation (in Galaxy parlance, a tool) over two lists.
Sometimes, however, the goal is to compare each element of the first list to each element of
the second list. This tool enables that. Running the input lists through this tool produces new
list structures (described in detail below) that, when fed to a tool using the same natural
"map over" semantics described above, produce every combination of the elements of the two
lists compared against each other.

Running a tool with these two outputs instead of the initial two inputs produces a nested list
structure where the jth element of the inner list of the ith element of the outer list is a
comparison of the ith element of the first list to the jth element of the second list.
Put more simply, the result is a nested list where the identifiers of an element describe which
inputs were matched to produce the comparison output found at that element.

.. image:: ${static_path}/images/tools/collection_ops/nested_crossproduct_output.png
   :alt: The Cartesian Product of Two Collections
   :width: 500

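The end result described above, an all-vs-all comparison keyed by element identifiers, can be sketched directly with nested dict comprehensions. As a hypothetical model rather than Galaxy's API, a list is a ``dict`` from element identifier to dataset and a nested list (``list:list``) is a ``dict`` of such dicts; ``all_vs_all`` and ``compare`` are illustrative names. Following the diagrams, the sketch assumes the outer identifiers come from the first list and the inner identifiers from the second.

```python
from typing import Callable, Dict

# Hypothetical models of a Galaxy list and a nested list (list:list).
Collection = Dict[str, str]
NestedCollection = Dict[str, Dict[str, str]]


def all_vs_all(
    list_1: Collection, list_2: Collection, tool: Callable[[str, str], str]
) -> NestedCollection:
    """Compare every element of list_1 to every element of list_2."""
    return {
        id_1: {id_2: tool(ds_1, ds_2) for id_2, ds_2 in list_2.items()}
        for id_1, ds_1 in list_1.items()
    }


def compare(x: str, y: str) -> str:
    """Stand-in for a tool that compares two datasets."""
    return f"compare({x}, {y})"


if __name__ == "__main__":
    list_1 = {"a1": "A1", "a2": "A2"}
    list_2 = {"b1": "B1", "b2": "B2", "b3": "B3"}
    result = all_vs_all(list_1, list_2, compare)
    # The outer/inner identifiers say which inputs were compared.
    print(result["a2"]["b3"])  # compare(A2, B3)
```

Unlike the element-wise case, the two lists may differ in length: a 2-element and a 3-element list yield a 2 x 3 nested result.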
============================================
What this tool does (the technical details)
============================================
This tool consumes two lists, which we will call ``input_a`` and ``input_b``. If ``input_a``
has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an``, and ``input_b``
has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
produces a pair of output nested lists (``list:list``) in which the outer list has length ``n``
and each inner list has length ``m`` (an ``n x m`` nested list). In the first output, the jth
element inside the outer list's ith element is a pseudo copy of the ith dataset of ``input_a``.
One way to think about the output nested lists is as matrices. Here is a diagram of the first
output showing the element identifiers of the outer and inner lists along with which dataset
is being "copied" into each position of this new collection.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_1.png
   :alt: Nested Cross Product First Output
   :width: 500

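Under a hypothetical dict model (not Galaxy's API), where a list is a ``dict`` from element identifier to dataset and a ``list:list`` is a ``dict`` of such dicts, the first output can be sketched as follows. ``first_output`` is an illustrative name, and a pseudo copy is represented simply by reusing the same value.

```python
from typing import Dict

# Hypothetical models of a Galaxy list and a nested list (list:list).
Collection = Dict[str, str]
NestedCollection = Dict[str, Dict[str, str]]


def first_output(input_a: Collection, input_b: Collection) -> NestedCollection:
    """n x m nested list whose [i][j] element is the ith dataset of input_a."""
    return {
        # Row i repeats dataset a_i once per element of input_b.
        id_a: {id_b: dataset_a for id_b in input_b}
        for id_a, dataset_a in input_a.items()
    }


if __name__ == "__main__":
    input_a = {"a1": "A1", "a2": "A2"}
    input_b = {"b1": "B1", "b2": "B2", "b3": "B3"}
    for outer, row in first_output(input_a, input_b).items():
        print(outer, row)
    # a1 {'b1': 'A1', 'b2': 'A1', 'b3': 'A1'}
    # a2 {'b1': 'A2', 'b2': 'A2', 'b3': 'A2'}
```

Viewed as a matrix, each row is constant: every position in the row for ``a1`` holds the ``a1`` dataset.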
The second output is a nested list of pseudo copies of the elements of ``input_b`` instead of
``input_a``. The outer list is again of length ``n`` and each inner list is again of length
``m``, but this time the jth element inside the outer list's ith element is a pseudo copy of
the jth dataset of ``input_b``. Here is the matrix of these outputs.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_2.png
   :alt: Nested Cross Product Second Output
   :width: 500

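With the same hypothetical dict model (a list as a ``dict`` from element identifier to dataset, not Galaxy's API), the second output turns out to be very simple: every inner list is a copy of ``input_b``, repeated once per element of ``input_a``. ``second_output`` is an illustrative name.

```python
from typing import Dict

# Hypothetical models of a Galaxy list and a nested list (list:list).
Collection = Dict[str, str]
NestedCollection = Dict[str, Dict[str, str]]


def second_output(input_a: Collection, input_b: Collection) -> NestedCollection:
    """n x m nested list whose [i][j] element is the jth dataset of input_b."""
    # One fresh copy of input_b per element of input_a.
    return {id_a: dict(input_b) for id_a in input_a}


if __name__ == "__main__":
    input_a = {"a1": "A1", "a2": "A2"}
    input_b = {"b1": "B1", "b2": "B2", "b3": "B3"}
    for outer, row in second_output(input_a, input_b).items():
        print(outer, row)
    # a1 {'b1': 'B1', 'b2': 'B2', 'b3': 'B3'}
    # a2 {'b1': 'B1', 'b2': 'B2', 'b3': 'B3'}
```

As a matrix, each column is constant, the mirror image of the first output. Using ``dict(input_b)`` gives each inner list its own object, loosely mirroring the separate pseudo copies.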
These nested list structures might appear a little odd, but they have a very useful property:
if you match up the corresponding elements of the two nested lists, every combination of an
element of ``input_a`` with an element of ``input_b`` is matched up exactly once. The following
diagram shows these matching datasets.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_matching.png
   :alt: Matching Inputs
   :width: 500

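The matching property can be checked with a small Python sketch. It again uses a hypothetical dict model rather than Galaxy's API, and ``nested_outputs`` and ``matched_pairs`` are illustrative names: the sketch builds both outputs, pairs the elements found at the same (outer, inner) position, and compares the result with ``itertools.product``, Python's built-in Cartesian product.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical models of a Galaxy list and a nested list (list:list).
Collection = Dict[str, str]
NestedCollection = Dict[str, Dict[str, str]]


def nested_outputs(
    input_a: Collection, input_b: Collection
) -> Tuple[NestedCollection, NestedCollection]:
    """Both outputs: [i][j] holds a_i in the first and b_j in the second."""
    first = {id_a: {id_b: ds_a for id_b in input_b} for id_a, ds_a in input_a.items()}
    second = {id_a: dict(input_b) for id_a in input_a}
    return first, second


def matched_pairs(first: NestedCollection, second: NestedCollection) -> List[Tuple[str, str]]:
    """Pair the elements found at the same (outer, inner) position of both outputs."""
    return [
        (first[outer][inner], second[outer][inner])
        for outer in first
        for inner in first[outer]
    ]


if __name__ == "__main__":
    input_a = {"a1": "A1", "a2": "A2"}
    input_b = {"b1": "B1", "b2": "B2", "b3": "B3"}
    pairs = matched_pairs(*nested_outputs(input_a, input_b))
    # Every (a, b) combination appears exactly once: 2 x 3 = 6 pairs.
    assert sorted(pairs) == sorted(product(input_a.values(), input_b.values()))
    print(len(pairs))  # 6
```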
Running a tool that compares two datasets with these two nested lists produces a new nested list,
as described above. The following diagram shows the structure of this output and how the element
identifiers are preserved and indicate which comparison was performed.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_output.png
   :alt: Nested Cross Product Output
   :width: 500

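Mapping a comparison tool over the two nested lists can be sketched in the same hypothetical dict model (not Galaxy's API). ``map_over_nested`` and ``compare`` are illustrative names, and the sketch assumes both nested lists carry the same identifiers at every position, which holds by construction for this tool's two outputs.

```python
from typing import Callable, Dict

# Hypothetical model of a nested list (list:list).
NestedCollection = Dict[str, Dict[str, str]]


def map_over_nested(
    first: NestedCollection, second: NestedCollection, tool: Callable[[str, str], str]
) -> NestedCollection:
    """Apply ``tool`` to corresponding elements, keeping the outer/inner identifiers."""
    return {
        outer: {inner: tool(ds_1, second[outer][inner]) for inner, ds_1 in row.items()}
        for outer, row in first.items()
    }


def compare(x: str, y: str) -> str:
    """Stand-in for a tool that compares two datasets."""
    return f"compare({x}, {y})"


if __name__ == "__main__":
    # The two outputs of this tool for input_a = {a1, a2} and input_b = {b1, b2}.
    first = {"a1": {"b1": "A1", "b2": "A1"}, "a2": {"b1": "A2", "b2": "A2"}}
    second = {"a1": {"b1": "B1", "b2": "B2"}, "a2": {"b1": "B1", "b2": "B2"}}
    result = map_over_nested(first, second, compare)
    # The identifiers a2 / b1 tell you which inputs produced this element.
    print(result["a2"]["b1"])  # compare(A2, B1)
```

The result keeps the ``n x m`` shape and identifiers of the inputs, so the element at outer identifier ``a2`` and inner identifier ``b1`` holds the comparison of those two inputs.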
----