Tool documentation for cross product tools.

jmchilton · Aug 14, 2024 · 2873f70 · 2873f70
1 parent 1f0984e
commit 2873f70

Showing 14 changed files with 149 additions and 6 deletions.
diff --git a/lib/galaxy/tools/cross_product_flat.xml b/lib/galaxy/tools/cross_product_flat.xml
@@ -72,12 +72,66 @@
 Synopsis
 ========
 
+@CROSS_PRODUCT_INTRO@
 
+====================
+How to use this tool
+====================
 
-===========
-Description
-===========
+@GALAXY_DOT_PRODUCT_SEMANTICS@
 
+Running input lists through this tool produces two new dataset lists (described in detail below). When these
+outputs are mapped over with the natural element-wise matching semantics described above, every element of the
+first input list is paired with every element of the second. Running a comparison tool on these two outputs instead
+of the two initial inputs therefore produces a flat list containing one comparison for each such pair.
+
+.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_output.png
+  :alt: The Flat Cartesian Product of Two Collections
+  :width: 500
+
+The result of running a subsequent tool with the outputs produced by this tool will be a much larger list
+whose element identifiers are the concatenation of the element identifiers of the paired elements from the
+two input lists.
+
+.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_separator.png
+  :alt: Flat Cross Product Identifier Separator
+  :width: 500
+
+============================================
+What this tool does (technical details)
+============================================
+
+This tool consumes two lists - we will call them ``input_a`` and ``input_b``. If ``input_a``
+has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
+has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
+produces a pair of larger lists - each of size ``n*m``.
+
+Both output lists will be the same length and contain the same set of element identifiers in the
+same order. If the zero-based position k of an output element is written as ``k = (i-1)*m + (j-1)`` where ``1 <= i <= n`` and ``1 <= j <= m``,
+then the element identifier for this kth element is the concatenation of the element identifier for
+the ith item of ``input_a`` and the jth item of ``input_b``.
+
+In the first output list, this kth element will be the ith element of ``input_a``. In the second
+output list, the kth element will be the jth element of ``input_b``.
+
+.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_outputs.png
+  :alt: Flat Cross Product Outputs
+  :width: 500
+
+These list structures might appear to be a little odd, but they have the very useful property
+that if you match up corresponding elements of the two lists, each combination of
+elements in ``input_a`` and ``input_b`` is matched up exactly once.
+
+.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_matched.png
+  :alt: Flat Cross Product Matching Datasets
+  :width: 500
+
+Running a downstream comparison tool that compares two datasets with these two lists produces a
+new list with every combination of comparisons.
+
+.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_downstream.png
+  :alt: Flat Cross Product All-vs-All Result
+  :width: 500
 
 ----
 

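As a rough illustration of the flat cross product described in the help text above, the following Python sketch models each list as ``(identifier, dataset)`` pairs. The function name ``flat_cross_product``, the default ``separator`` of ``_``, and the ordering (all of ``input_b`` for each element of ``input_a``) are assumptions made for this example and are not taken from Galaxy's implementation.

```python
from itertools import product


def flat_cross_product(input_a, input_b, separator="_"):
    """Return two flat lists of (identifier, dataset) pairs, each of length n*m.

    Hypothetical model only: lists are plain Python lists of
    (identifier, dataset) tuples, not Galaxy collections.
    """
    output_a, output_b = [], []
    # product() varies input_b fastest, so the element built from the ith item
    # of input_a and the jth item of input_b lands at zero-based position
    # (i-1)*m + (j-1), where m = len(input_b).
    for (id_a, data_a), (id_b, data_b) in product(input_a, input_b):
        identifier = f"{id_a}{separator}{id_b}"
        output_a.append((identifier, data_a))  # element taken from input_a
        output_b.append((identifier, data_b))  # element taken from input_b
    return output_a, output_b


input_a = [("a1", "A1"), ("a2", "A2")]
input_b = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]
output_a, output_b = flat_cross_product(input_a, input_b)
print([identifier for identifier, _ in output_a])
```

Both outputs share the same identifiers in the same order, so matching them element-wise pairs every element of ``input_a`` with every element of ``input_b`` exactly once.
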
diff --git a/lib/galaxy/tools/cross_product_nested.xml b/lib/galaxy/tools/cross_product_nested.xml
@@ -76,12 +76,69 @@
 Synopsis
 ========
 
+@CROSS_PRODUCT_INTRO@
 
+====================
+How to use this tool
+====================
 
-===========
-Description
-===========
+@GALAXY_DOT_PRODUCT_SEMANTICS@
 
+Running input lists through this tool produces two new list structures (described in detail below). When these
+outputs are mapped over with the natural element-wise matching semantics described above, every element of the
+first input list is paired with every element of the second. Running a comparison tool on these two outputs instead
+of the two initial inputs produces a nested list structure where the jth element of the inner list of the ith element
+of the outer list is a comparison of the ith element of the first list to the jth element of the second list.
+Put more simply, the result is a nested list where the identifiers of an element describe which inputs were
+matched to produce the comparison output found at that element.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_crossproduct_output.png
+  :alt: The Cartesian Product of Two Collections
+  :width: 500
+
+============================================
+What this tool does (technical details)
+============================================
+
+This tool consumes two lists - we will call them ``input_a`` and ``input_b``. If ``input_a``
+has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
+has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
+produces a pair of output nested lists (specifically of the ``list:list`` collection type) where
+the outer list is of length ``n`` and each inner list has a length of ``m`` (an ``n X m`` nested list). The jth element
+inside the outer list's ith element is a pseudo copy of the ith dataset of ``input_a``. One
+way to think about the output nested lists is as matrices. Here is a diagram of the first output
+showing the element identifiers of the outer and inner lists along with which dataset is being
+"copied" into this new collection.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_1.png
+  :alt: Nested Cross Product First Output
+  :width: 500
+
+The second output is a nested list of pseudo copies of the elements of ``input_b`` instead of
+``input_a``. In particular the outer list is again of length ``n`` and each inner list is again
+of length ``m`` but this time the jth element inside the outer list's ith element is a pseudo copy
+of the jth dataset of ``input_b``. Here is the matrix of these outputs.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_2.png
+  :alt: Nested Cross Product Second Output
+  :width: 500
+
+These nested list structures might appear to be a little odd, but they have the very useful property
+that if you match up corresponding elements of the two nested lists, each combination of
+elements in ``input_a`` and ``input_b`` is matched up exactly once. The following diagram describes these matching
+datasets.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_matching.png
+  :alt: Matching Inputs
+  :width: 500
+
+Running a tool that compares two datasets with these two nested lists produces a new nested list
+as described above. The following diagram shows the structure of this output and how the element
+identifiers are preserved and indicate what comparison was performed.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_output.png
+  :alt: Nested Cross Product Output
+  :width: 500
 
 ----
 

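The nested variant described in the help text above can be sketched in the same spirit. Here a ``list:list`` is modelled as a list of ``(outer identifier, inner list)`` pairs. The function name ``nested_cross_product`` and the choice to reuse the identifiers of ``input_a`` for the outer list and those of ``input_b`` for the inner lists are assumptions for this illustration, not a statement about Galaxy's code.

```python
def nested_cross_product(input_a, input_b):
    """Return two n x m nested lists built from (identifier, dataset) pairs.

    Hypothetical model only: a nested list is a list of
    (outer_identifier, [(inner_identifier, dataset), ...]) tuples.
    """
    # First output: every cell in row i holds the ith dataset of input_a.
    output_a = [
        (id_a, [(id_b, data_a) for id_b, _ in input_b])
        for id_a, data_a in input_a
    ]
    # Second output: every cell in column j holds the jth dataset of input_b.
    output_b = [
        (id_a, [(id_b, data_b) for id_b, data_b in input_b])
        for id_a, _ in input_a
    ]
    return output_a, output_b


input_a = [("a1", "A1"), ("a2", "A2")]
input_b = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]
output_a, output_b = nested_cross_product(input_a, input_b)
for (outer_id, row_a), (_, row_b) in zip(output_a, output_b):
    for (inner_id, data_a), (_, data_b) in zip(row_a, row_b):
        print(outer_id, inner_id, data_a, data_b)
```

Reading the two outputs cell by cell, as the loop at the end does, visits each ``(input_a, input_b)`` combination once, which is the matrix view the help text describes.
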
diff --git a/lib/galaxy/tools/model_operation_macros.xml b/lib/galaxy/tools/model_operation_macros.xml
@@ -4,6 +4,38 @@
             class="ModelOperationToolAction"/>
     </xml>
     <token name="@QUOTA_USAGE_NOTE@">This tool will create new history datasets copied from your input collections but your quota usage will not increase.</token>
+    <token name="@CROSS_PRODUCT_INTRO@"><![CDATA[
+This tool organizes two dataset lists so that Galaxy's normal collection processing produces
+an all-vs-all style analysis of the initial inputs when applied to the outputs of this tool.
+
+While a standalone description of what it does is technical and math heavy, how
+it works within an ad-hoc analysis or workflow can be quite straightforward and hopefully is easier
+to understand. For this reason, the next section describes how to use this tool in context and
+the technical details follow after that. Hopefully, the "how it works" details aren't necessary to
+understand the "how to use it" details of this tool - at least for simple things.
+]]>
+</token>
+    <token name="@GALAXY_DOT_PRODUCT_SEMANTICS@"><![CDATA[
+
+This tool can be used in and out of workflows, but workflows will be used to illustrate the ordering of
+tools and connections between them. Imagine a tool that compares two individual datasets and how
+that might be connected to list inputs in a workflow. This simple case is shown below:
+
+.. image:: ${static_path}/images/tools/collection_ops/dot_product.png
+  :alt: The Dot Product of Two Collections
+  :width: 500
+
+In this configuration, the datasets of the two lists are matched and compared element-wise. So the first dataset
+of "Input List 1" will be compared to the first dataset in "Input List 2" and the resulting
+dataset will be the first dataset in the output list generated using this comparison tool. In this configuration
+the lists need to have the same number of elements and ideally matching element identifiers.
+
+This matching up of elements is a very natural way to "map" an operation (or in Galaxy parlance, a tool)
+over two lists. However, sometimes the desire is to compare each element of the first list to each element of the
+second list. This tool enables that.
+
+]]></token>
+
     <xml name="annotate_as_aggregation_operation">
         <edam_operations>
             <edam_operation>operation_3436</edam_operation> <!-- DataHandling -> Aggregation -->

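The element-wise "map over" behaviour that the ``@GALAXY_DOT_PRODUCT_SEMANTICS@`` token describes can be pictured with a small Python sketch. The names ``map_over`` and ``compare`` are hypothetical stand-ins for Galaxy's collection mapping and for a two-input comparison tool. Lists are again modelled as ``(identifier, dataset)`` pairs, and keeping the identifiers of the first list in the output is an assumption of this example.

```python
def map_over(tool, list_1, list_2):
    """Apply a two-input tool to corresponding elements of two lists."""
    # Element-wise matching only makes sense for lists of equal length.
    if len(list_1) != len(list_2):
        raise ValueError("lists must have the same number of elements")
    return [
        (id_1, tool(data_1, data_2))  # keep the identifier from the first list
        for (id_1, data_1), (_, data_2) in zip(list_1, list_2)
    ]


def compare(data_1, data_2):
    """Toy comparison tool: records which two datasets were compared."""
    return f"{data_1} vs {data_2}"


list_1 = [("s1", "A1"), ("s2", "A2")]
list_2 = [("s1", "B1"), ("s2", "B2")]
print(map_over(compare, list_1, list_2))
```

With two plain input lists this yields only two comparisons. The cross product tools exist to prepare inputs so that the same element-wise matching covers every combination instead.
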
diff --git a/static/images/tools/collection_ops/dot_product.png b/static/images/tools/collection_ops/dot_product.png
diff --git a/static/images/tools/collection_ops/flat_cross_product_downstream.png b/static/images/tools/collection_ops/flat_cross_product_downstream.png
diff --git a/static/images/tools/collection_ops/flat_cross_product_matched.png b/static/images/tools/collection_ops/flat_cross_product_matched.png
diff --git a/static/images/tools/collection_ops/flat_cross_product_outputs.png b/static/images/tools/collection_ops/flat_cross_product_outputs.png
diff --git a/static/images/tools/collection_ops/flat_crossproduct_output.png b/static/images/tools/collection_ops/flat_crossproduct_output.png
diff --git a/static/images/tools/collection_ops/flat_crossproduct_separator.png b/static/images/tools/collection_ops/flat_crossproduct_separator.png
diff --git a/static/images/tools/collection_ops/nested_cross_product_matching.png b/static/images/tools/collection_ops/nested_cross_product_matching.png
diff --git a/static/images/tools/collection_ops/nested_cross_product_out_1.png b/static/images/tools/collection_ops/nested_cross_product_out_1.png
diff --git a/static/images/tools/collection_ops/nested_cross_product_out_2.png b/static/images/tools/collection_ops/nested_cross_product_out_2.png
diff --git a/static/images/tools/collection_ops/nested_cross_product_output.png b/static/images/tools/collection_ops/nested_cross_product_output.png
diff --git a/static/images/tools/collection_ops/nested_crossproduct_output.png b/static/images/tools/collection_ops/nested_crossproduct_output.png