diff --git a/lib/galaxy/tools/cross_product_nested.xml b/lib/galaxy/tools/cross_product_nested.xml
index b4ba4d596de5..2b29decd1606 100644
--- a/lib/galaxy/tools/cross_product_nested.xml
+++ b/lib/galaxy/tools/cross_product_nested.xml
@@ -76,12 +76,90 @@ Synopsis
 ========
 
+This tool performs something like a Cartesian product or cross join of two collections
+and is mainly used to set up an all-vs-all style analysis across two collections of datasets.
+While a standalone description of what it does may seem a bit technical and math-heavy, how
+it works within an ad hoc analysis or workflow is fairly straightforward and hopefully easier
+to understand. For this reason, the next section describes how to use this tool in context; the
+technical details follow after that but aren't necessary to understand how to use this tool
+in simple ways.
 
-===========
-Description
-===========
+====================
+How to use this tool
+====================
 
+This tool can be used both inside and outside of workflows, but let's use a workflow to illustrate
+the ordering of tools. Imagine you have a tool that compares two individual datasets and you run
+that comparison tool with two lists of datasets. This simple case is shown below:
+
+.. image:: ${static_path}/images/tools/collection_ops/dot_product.png
+   :alt: The Dot Product of Two Collections
+   :width: 500
+
+In this configuration, the datasets of the two lists are matched and compared element-wise. So the
+first dataset of "Input List 1" is compared to the first dataset of "Input List 2", and the
+resulting dataset is the first dataset in an output list. In this configuration, the lists need
+to have the same number of elements and ideally matching element identifiers.
+
+This is a very natural way to "map" an operation (or in Galaxy parlance, a tool) over two lists.
+Sometimes, however, the goal is to compare each element of the first list to each element of the
+second list. This tool enables that.
+Running the input lists through this tool produces new list structures (described in detail below)
+that, when mapped over using the same natural semantics described above, produce every combination
+of the elements of the two lists compared against each other. Running the comparison tool with these
+two outputs instead of the initial two inputs produces a nested list structure where the jth element
+of the inner list of the ith element of the outer list is a comparison of the ith element of the
+first list to the jth element of the second list. Put more simply, the result is a nested list where
+the identifiers of an element describe which inputs were matched to produce the comparison output
+found at that element.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_crossproduct_output.png
+   :alt: The Cartesian Product of Two Collections
+   :width: 500
+
+============================================
+What this tool does (the technical details)
+============================================
+
+This tool consumes two lists; we will call them ``input_a`` and ``input_b``. If ``input_a``
+has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an``, and ``input_b``
+has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
+produces a pair of output nested lists (``list:list``) where the outer list is of length ``n``
+and each inner list has a length of ``m`` (an ``n x m`` nested list). In the first output, the jth
+element inside the outer list's ith element is a pseudo copy of the ith dataset of ``input_a``. One
+way to think about the output nested lists is as matrices. Here is a diagram of the first output
+showing the element identifiers of the outer and inner lists along with which dataset is being
+"copied" into this new collection.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_1.png
+   :alt: Nested Cross Product First Output
+   :width: 500
+
+The second output is a nested list of pseudo copies of the elements of ``input_b`` instead of
+``input_a``. In particular, the outer list is again of length ``n`` and each inner list is again
+of length ``m``, but this time the jth element inside the outer list's ith element is a pseudo copy
+of the jth dataset of ``input_b``. Here is the matrix for this output.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_2.png
+   :alt: Nested Cross Product Second Output
+   :width: 500
+
+These nested list structures might appear a little odd, but they have the very useful property
+that if you match up corresponding elements of the two nested lists, each combination of an element
+of ``input_a`` with an element of ``input_b`` is matched up exactly once. The following diagram
+shows these matching datasets.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_matching.png
+   :alt: Matching Inputs
+   :width: 500
+
+Running a tool that compares two datasets with these two nested lists produces a new nested list
+as described above. The following diagram shows the structure of this output and how the element
+identifiers are preserved and indicate which comparison was performed.
+
+.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_output.png
+   :alt: Nested Cross Product Output
+   :width: 500
 
 ----
diff --git a/static/images/tools/collection_ops/dot_product.png b/static/images/tools/collection_ops/dot_product.png
new file mode 100644
index 000000000000..d5af4cc4df1b
Binary files /dev/null and b/static/images/tools/collection_ops/dot_product.png differ
diff --git a/static/images/tools/collection_ops/flat_crossproduct_output.png b/static/images/tools/collection_ops/flat_crossproduct_output.png
new file mode 100644
index 000000000000..ba32e2c5a9d8
Binary files /dev/null and b/static/images/tools/collection_ops/flat_crossproduct_output.png differ
diff --git a/static/images/tools/collection_ops/flat_crossproduct_separator.png b/static/images/tools/collection_ops/flat_crossproduct_separator.png
new file mode 100644
index 000000000000..d843acebe571
Binary files /dev/null and b/static/images/tools/collection_ops/flat_crossproduct_separator.png differ
diff --git a/static/images/tools/collection_ops/nested_cross_product_matching.png b/static/images/tools/collection_ops/nested_cross_product_matching.png
new file mode 100644
index 000000000000..dd58e1a484eb
Binary files /dev/null and b/static/images/tools/collection_ops/nested_cross_product_matching.png differ
diff --git a/static/images/tools/collection_ops/nested_cross_product_out_1.png b/static/images/tools/collection_ops/nested_cross_product_out_1.png
new file mode 100644
index 000000000000..787378bfb3d8
Binary files /dev/null and b/static/images/tools/collection_ops/nested_cross_product_out_1.png differ
diff --git a/static/images/tools/collection_ops/nested_cross_product_out_2.png b/static/images/tools/collection_ops/nested_cross_product_out_2.png
new file mode 100644
index 000000000000..b355ab6704f3
Binary files /dev/null and b/static/images/tools/collection_ops/nested_cross_product_out_2.png differ
diff --git a/static/images/tools/collection_ops/nested_cross_product_output.png b/static/images/tools/collection_ops/nested_cross_product_output.png
new file mode 100644
index 000000000000..9dc492ee9a93
Binary files /dev/null and b/static/images/tools/collection_ops/nested_cross_product_output.png differ
diff --git a/static/images/tools/collection_ops/nested_crossproduct_output.png b/static/images/tools/collection_ops/nested_crossproduct_output.png
new file mode 100644
index 000000000000..8ce2a14128c8
Binary files /dev/null and b/static/images/tools/collection_ops/nested_crossproduct_output.png differ
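As an informal illustration of the help text's technical details, the following Python sketch models the two ``n x m`` nested outputs and the element-wise "map over" step. It is not Galaxy's implementation. Collections are represented as plain lists of ``(identifier, dataset)`` pairs, datasets are stand-in strings, and ``nested_cross_product`` and ``map_over`` are hypothetical helper names. It also assumes, as the identifier description in the help text suggests, that outer identifiers come from ``input_a`` and inner identifiers from ``input_b``.

```python
from typing import Callable, List, Tuple

# Hypothetical representation (not Galaxy's data model): a flat "list" collection
# is an ordered list of (element_identifier, dataset) pairs, and a "list:list"
# collection is an ordered list of (outer_identifier, inner_flat_list) pairs.
# Datasets are stand-in strings.
FlatList = List[Tuple[str, str]]
NestedList = List[Tuple[str, FlatList]]


def nested_cross_product(input_a: FlatList, input_b: FlatList) -> Tuple[NestedList, NestedList]:
    """Build the two n x m nested lists: outer index i over input_a, inner index j over input_b."""
    output_1: NestedList = []
    output_2: NestedList = []
    for id_a, dataset_a in input_a:
        inner_1: FlatList = []
        inner_2: FlatList = []
        for id_b, dataset_b in input_b:
            # Element [i][j] of the first output is a "pseudo copy" of the ith dataset of input_a.
            inner_1.append((id_b, dataset_a))
            # Element [i][j] of the second output is a "pseudo copy" of the jth dataset of input_b.
            inner_2.append((id_b, dataset_b))
        # Assumption: outer identifiers come from input_a, inner identifiers from input_b.
        output_1.append((id_a, inner_1))
        output_2.append((id_a, inner_2))
    return output_1, output_2


def map_over(tool: Callable[[str, str], str], nested_1: NestedList, nested_2: NestedList) -> NestedList:
    """Map a two-input tool element-wise over two nested lists with matching structure."""
    if len(nested_1) != len(nested_2):
        raise ValueError("outer lists have different lengths")
    result: NestedList = []
    for (outer_id, inner_1), (outer_id_2, inner_2) in zip(nested_1, nested_2):
        if outer_id != outer_id_2 or len(inner_1) != len(inner_2):
            raise ValueError("nested lists do not have matching structure")
        outputs: FlatList = []
        for (inner_id, d1), (inner_id_2, d2) in zip(inner_1, inner_2):
            if inner_id != inner_id_2:
                raise ValueError("inner element identifiers do not match")
            # The output keeps both identifiers, so [a_i][b_j] records which pair was compared.
            outputs.append((inner_id, tool(d1, d2)))
        result.append((outer_id, outputs))
    return result


if __name__ == "__main__":
    input_a = [("a1", "A1"), ("a2", "A2")]
    input_b = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]
    out_1, out_2 = nested_cross_product(input_a, input_b)
    # The lambda is a stand-in for a tool that compares two datasets.
    for outer_id, inner in map_over(lambda x, y: f"{x} vs {y}", out_1, out_2):
        print(outer_id, inner)
```

The inner loop of ``map_over`` is the same element-wise matching shown in the dot product diagram; the all-vs-all result comes entirely from the shape of the two nested inputs produced by ``nested_cross_product``.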