Tool documentation for cross product tools.
jmchilton committed Aug 14, 2024
1 parent 1f0984e commit 2873f70
Showing 14 changed files with 149 additions and 6 deletions.
60 changes: 57 additions & 3 deletions lib/galaxy/tools/cross_product_flat.xml
@@ -72,12 +72,66 @@
Synopsis
========
@CROSS_PRODUCT_INTRO@
====================
How to use this tool
====================
===========
Description
===========
@GALAXY_DOT_PRODUCT_SEMANTICS@
Running input lists through this tool produces new dataset lists (described in detail below). When these
outputs are processed with the same natural element-wise matching ("map over") semantics described above, every
combination of the elements of the two lists is compared. Running a tool with these two outputs instead of the initial
two inputs produces a list containing the comparison of each pair of elements drawn from the respective inputs.
.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_output.png
   :alt: The Flat Cartesian Product of Two Collections
   :width: 500
The result of running a subsequent tool with the outputs produced by this tool will be a much larger list
whose element identifiers are the concatenation of the element identifiers of the combined elements from the
two input lists.
.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_separator.png
   :alt: Flat Cross Product Identifier Separator
   :width: 500
============================================
What this tool does (technical details)
============================================
This tool consumes two lists - we will call them ``input_a`` and ``input_b``. If ``input_a``
has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
produces a pair of larger lists - each of size ``n*m``.
Both output lists will be the same length and contain the same set of element identifiers in the
same order. If the zero-based position ``k`` of an output element is written as ``k = (i-1)*m + (j-1)``, where
``1 <= i <= n`` and ``1 <= j <= m``, then the element identifier for this kth element is the concatenation of
the element identifier for the ith item of ``input_a`` and the jth item of ``input_b``.
In the first output list, this kth element will be the ith element of ``input_a``. In the second
output list, the kth element will be the jth element of ``input_b``.
.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_outputs.png
   :alt: Flat Cross Product Outputs
   :width: 500
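The construction of the two flat outputs can be sketched in Python. This is only an illustration, not Galaxy's implementation: it assumes ``input_a`` varies slowest and ``input_b`` fastest, models each element as an ``(identifier, dataset)`` tuple, and uses a hypothetical ``separator`` argument (defaulting to ``"_"``) to join identifiers.

```python
from typing import List, Tuple

# Each element is modelled as (element identifier, dataset placeholder).
Element = Tuple[str, str]


def flat_cross_product(
    input_a: List[Element], input_b: List[Element], separator: str = "_"
) -> Tuple[List[Element], List[Element]]:
    """Return two aligned flat lists, each of length len(input_a) * len(input_b)."""
    output_a: List[Element] = []
    output_b: List[Element] = []
    for id_a, dataset_a in input_a:  # assumed outer loop: input_a varies slowest
        for id_b, dataset_b in input_b:  # assumed inner loop: input_b varies fastest
            identifier = f"{id_a}{separator}{id_b}"
            output_a.append((identifier, dataset_a))  # "copy" of the input_a element
            output_b.append((identifier, dataset_b))  # "copy" of the input_b element
    return output_a, output_b


input_a = [("a1", "A1.txt"), ("a2", "A2.txt")]
input_b = [("b1", "B1.txt"), ("b2", "B2.txt"), ("b3", "B3.txt")]
output_a, output_b = flat_cross_product(input_a, input_b)
```

With ``n = 2`` and ``m = 3``, both outputs have six elements and share the same identifiers position by position; only the dataset stored at each position differs.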
These list structures might appear to be a little odd, but they have the very useful property
that if you match up corresponding elements of the lists, each combination of
elements in ``input_a`` and ``input_b`` is matched up exactly once.
.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_matched.png
   :alt: Flat Cross Product Matching Datasets
   :width: 500
Running a downstream comparison tool that compares two datasets with these two lists produces a
new list with every combination of comparisons.
.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_downstream.png
   :alt: Flat Cross Product All-vs-All Result
   :width: 500
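To see why element-wise mapping over the two outputs gives an all-vs-all result, the outputs can be approximated with ``itertools.product``. This is a rough sketch under simplifying assumptions: datasets are plain strings, and ``compare`` is a hypothetical stand-in for any tool that takes two datasets.

```python
from itertools import product

list_a = ["a1", "a2"]
list_b = ["b1", "b2", "b3"]

# Stand-ins for the two outputs of this tool: aligned lists of length n * m.
pairs = list(product(list_a, list_b))
output_a = [a for a, _ in pairs]
output_b = [b for _, b in pairs]


def compare(a: str, b: str) -> str:
    # Hypothetical comparison tool operating on two individual datasets.
    return f"{a} vs {b}"


# Ordinary element-wise mapping over the two aligned lists.
results = [compare(a, b) for a, b in zip(output_a, output_b)]
```

Because the two lists are aligned, the plain ``zip`` visits every combination exactly once without the comparison tool knowing anything about cross products.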
----
63 changes: 60 additions & 3 deletions lib/galaxy/tools/cross_product_nested.xml
@@ -76,12 +76,69 @@
Synopsis
========
@CROSS_PRODUCT_INTRO@
====================
How to use this tool
====================
===========
Description
===========
@GALAXY_DOT_PRODUCT_SEMANTICS@
Running input lists through this tool produces new list structures (described in detail below). When these
outputs are processed with the same natural element-wise matching ("map over") semantics described above, every
combination of the elements of the two lists is compared. Running a tool with these two outputs instead of the initial
two inputs produces a nested list structure where the jth element of the inner list of the ith element of the outer
list is a comparison of the ith element of the first list to the jth element of the second list.
Put more simply, the result is a nested list where the identifiers of an element describe which inputs were
matched to produce the comparison output found at that element.
.. image:: ${static_path}/images/tools/collection_ops/nested_crossproduct_output.png
   :alt: The Cartesian Product of Two Collections
   :width: 500
============================================
What this tool does (technical details)
============================================
This tool consumes two lists - we will call them ``input_a`` and ``input_b``. If ``input_a``
has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
produces a pair of output nested lists (specifically of the ``list:list`` collection type) where
the outer list is of length ``n`` and each inner list has a length of ``m`` (an ``n x m`` nested list). In the first
output, the jth element inside the outer list's ith element is a pseudo copy of the ith dataset of ``input_a``. One
way to think about the output nested lists is as matrices. Here is a diagram of the first output
showing the element identifiers of the outer and inner lists along with which dataset is being
"copied" into this new collection.
.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_1.png
   :alt: Nested Cross Product First Output
   :width: 500
The second output is a nested list of pseudo copies of the elements of ``input_b`` instead of
``input_a``. In particular, the outer list is again of length ``n`` and each inner list is again
of length ``m``, but this time the jth element inside the outer list's ith element is a pseudo copy
of the jth dataset of ``input_b``. Here is the matrix of these outputs.
.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_2.png
   :alt: Nested Cross Product Second Output
   :width: 500
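The two nested outputs can be sketched as dictionaries of dictionaries. This is an illustration rather than Galaxy's implementation: it assumes the outer identifiers come from ``input_a`` and the inner identifiers from ``input_b``, and it models each input list as a hypothetical mapping from element identifier to dataset.

```python
from typing import Dict, Tuple

Nested = Dict[str, Dict[str, str]]


def nested_cross_product(
    input_a: Dict[str, str], input_b: Dict[str, str]
) -> Tuple[Nested, Nested]:
    """Return two n x m nested structures keyed by (outer, inner) identifiers."""
    # First output: every cell in row i holds a "copy" of the ith input_a dataset.
    output_a = {id_a: {id_b: ds_a for id_b in input_b} for id_a, ds_a in input_a.items()}
    # Second output: every cell in column j holds a "copy" of the jth input_b dataset.
    output_b = {id_a: {id_b: ds_b for id_b, ds_b in input_b.items()} for id_a in input_a}
    return output_a, output_b


input_a = {"a1": "A1.txt", "a2": "A2.txt"}
input_b = {"b1": "B1.txt", "b2": "B2.txt", "b3": "B3.txt"}
output_a, output_b = nested_cross_product(input_a, input_b)
```

Viewed as matrices, the first output repeats each ``input_a`` dataset along a row and the second repeats each ``input_b`` dataset down a column.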
These nested list structures might appear to be a little odd, but they have the very useful property
that if you match up corresponding elements of the nested lists, each combination of
elements in ``input_a`` and ``input_b`` is matched up exactly once. The following diagram describes these matching
datasets.
.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_matching.png
   :alt: Matching Inputs
   :width: 500
Running a tool that compares two datasets with these two nested lists produces a new nested list
as described above. The following diagram shows the structure of this output and how the element
identifiers are preserved and indicate what comparison was performed.
.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_output.png
   :alt: Matching Inputs
   :width: 500
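The way identifiers survive a downstream comparison can be sketched with small hand-written nested structures. The dictionaries below are hypothetical stand-ins for the two outputs of this tool, and ``compare`` stands in for any tool that takes two datasets.

```python
# Hypothetical 2 x 2 stand-ins for the two nested outputs of this tool.
output_a = {"a1": {"b1": "A1", "b2": "A1"}, "a2": {"b1": "A2", "b2": "A2"}}
output_b = {"a1": {"b1": "B1", "b2": "B2"}, "a2": {"b1": "B1", "b2": "B2"}}


def compare(a: str, b: str) -> str:
    # Hypothetical comparison tool operating on two individual datasets.
    return f"{a} vs {b}"


# Match cells with the same (outer, inner) identifiers and keep that structure.
results = {
    outer: {
        inner: compare(output_a[outer][inner], output_b[outer][inner])
        for inner in output_a[outer]
    }
    for outer in output_a
}
```

Each result can then be looked up by the pair of identifiers that produced it, for example ``results["a2"]["b1"]``.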
----
32 changes: 32 additions & 0 deletions lib/galaxy/tools/model_operation_macros.xml
@@ -4,6 +4,38 @@
class="ModelOperationToolAction"/>
</xml>
<token name="@QUOTA_USAGE_NOTE@">This tool will create new history datasets copied from your input collections but your quota usage will not increase.</token>
<token name="@CROSS_PRODUCT_INTRO@"><![CDATA[
This tool organizes two dataset lists so that Galaxy's normal collection processing produces
an all-vs-all style analysis of the initial inputs when applied to the outputs of this tool.
While a standalone description of what it does is technical and math heavy, how
it works within an ad-hoc analysis or workflow can be quite straightforward and is hopefully easier
to understand. For this reason, the next section describes how to use this tool in context and
the technical details follow after that. Hopefully, the "how it works" details aren't necessary to
understand the "how to use it" details of this tool - at least for simple things.
]]>
</token>
<token name="@GALAXY_DOT_PRODUCT_SEMANTICS@"><![CDATA[
This tool can be used in and out of workflows, but workflows will be used to illustrate the ordering of
tools and connections between them. Imagine a tool that compares two individual datasets and how
that might be connected to list inputs in a workflow. This simple case is shown below:
.. image:: ${static_path}/images/tools/collection_ops/dot_product.png
   :alt: The Dot Product of Two Collections
   :width: 500
In this configuration, the two lists will be matched and compared element-wise. So the first dataset
of "Input List 1" will be compared to the first dataset in "Input List 2" and the resulting
dataset will be the first dataset in the output list generated using this comparison tool. In this configuration
the lists need to have the same number of elements and ideally matching element identifiers.
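The element-wise matching described above behaves much like zipping two equal-length lists. The following is a minimal Python sketch of that idea, not Galaxy's implementation; ``map_over`` and ``compare`` are hypothetical names and the datasets are plain strings.

```python
from typing import Callable, List


def map_over(
    tool: Callable[[str, str], str], list_1: List[str], list_2: List[str]
) -> List[str]:
    """Apply a two-input tool element-wise across two lists."""
    if len(list_1) != len(list_2):
        # Element-wise matching needs lists with the same number of elements.
        raise ValueError("lists must have the same number of elements")
    return [tool(x, y) for x, y in zip(list_1, list_2)]


def compare(x: str, y: str) -> str:
    # Hypothetical stand-in for a tool that compares two datasets.
    return f"{x} vs {y}"


result = map_over(compare, ["s1", "s2", "s3"], ["t1", "t2", "t3"])
```

Only three comparisons are produced here; combinations such as ``s1`` against ``t2`` never occur under element-wise matching.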
This matching up of elements is a very natural way to "map" an operation (or in Galaxy parlance, a tool)
over two lists. However, sometimes the desire is to compare each element of the first list to each element of the
second list. This tool enables that.
]]></token>

<xml name="annotate_as_aggregation_operation">
<edam_operations>
<edam_operation>operation_3436</edam_operation> <!-- DataHandling -> Aggregation -->