diff --git a/docs/analysis-tools/abba.md b/docs/analysis-tools/abba.md new file mode 100644 index 0000000..a690794 --- /dev/null +++ b/docs/analysis-tools/abba.md @@ -0,0 +1,20 @@ + +## ABBA + +Given a set of interesting genes, do other genes have similar relationships to known sets of genes? For example, given a set of genes known to be related to drug abuse, what other genes share similar expression patterns in drug abuse gene sets? By answering this question, it becomes possible to elucidate under-studied or obfuscated genes that may play a role in complex phenotypes. + +We have developed a new GeneWeaver tool to address this question, which we call __Anchored Biclique of Biomolecular Associations (ABBA)__. This tool takes advantage of the large number of collected data and cross-species integration to find new genes for investigation. + +The search begins with a user-provided list of genes of interest, such as highly-studied genes with known pathways and relationships. The database then finds any gene sets that contain at least N of the genes in the provided list. From the resulting list of gene sets, ABBA then isolates any genes that occur in at least M GeneSets but not in the initial list. These resulting genes share similar gene set overlap with the original input set, but may not have been previously considered in relation to the gene set of interest. + +!["ABBA applied to a set of 4 genes of interest"](../assets/images/abba.png) + +In the above figure, the lighter nodes indicate less overlap. Using N=2 produces a collection of 37 GeneSets as of 7 July 2010. For brevity, only the top 5 results are shown above. With M=15, the following table lists genes in the result having similar relationships to the input set. + + +![](../assets/images/abba_2.png) + + +Without reasonable thresholds, the results quickly become overwhelming. As of this writing, a simple set of 4 genes of interest results in 555 GeneSets and over 38,000 genes in the candidate list. 
Increasing the input set to 7 genes of interest results in 983 GeneSets and almost 40,000 genes. Simply requiring gene sets to contain at least 3 genes significantly reduces the search space to 11 and 37 GeneSets, respectively. + +![](../assets/images/abba_3.png) diff --git a/docs/analysis-tools/boolean-algebra.md b/docs/analysis-tools/boolean-algebra.md new file mode 100644 index 0000000..cad1584 --- /dev/null +++ b/docs/analysis-tools/boolean-algebra.md @@ -0,0 +1,91 @@ +## Boolean Algebra + +The Boolean Algebra Tool performs basic set operations on at least two Gene Sets. +Results are displayed as lists of genes belonging to one of the three different types of +set operations: Union, Intersect, and Symmetric Difference. Furthermore, results allow +users to quickly determine new relationships between Gene Sets and create a new Gene Set +based on set-derived findings. + +### Using the Boolean Algebra Tool + +Access the Boolean Algebra Tool through +the [Analyze Genesets](index.md#analyze-gene-sets-tab) tab, located in the left-hand +column and distinguished by the Venn diagram icon. + +![](../assets/images/boolean_algebra_options.png) + +To generate Boolean Algebra results, select either a Project of two or more Gene Sets or +at least two individual Gene Sets from a project. Next, select the appropriate Boolean +Algebra function. These functions are based on basic _Set Algebra_: **Union**, +**Intersection**, **Symmetric Difference**. + +* **Union**: This tool generates a set of all genes located in all sets. It removes + duplicates by default. The results will display what homology mapping was used to + generate a gene entry. + +This result shows the union of three Gene Sets, two mouse and one human. + +![](../assets/images/boolean_algebra_union.png) + +* **Intersection**: This option will cause the Boolean tool to return all genes in + common with the selected Gene Set inputs. 
It has an additional option (_"Genes must + intersect in at least X"_) that specifies the minimal amount of overlaps required to + return a result. If a minimal overlap is set to _3_, for example, only Gene Sets that + intersect with 3 or more genes will be evaluated, and only the intersecting genes will + be returned. In addition, results are divided into separate groups based on the number + of genes in their intersections. + +These three Gene Sets have 4 genes in common. All of them are homologs between mouse and +human. + +![](../assets/images/boolean_algebra_intersect.png) + +Changing the overlap to 2 created two sets of results, those in all 3 Gene Sets and +those in only 2 of the Gene Sets. + +![](../assets/images/boolean_algebra_intersect3.png) + +* **Symmetric Difference**: This tool will create a set of genes that are unique to the + Gene Sets selected as input. It effectively finds the Union of all Gene Sets minus the + intersection of those Gene Sets. + +In this example, there is a result set of unique genes for each input Gene Set. + +![](../assets/images/boolean_algebra_except.png) + +### Managing Results + +A table located just below the circle overlap diagram and above the results is intended +to display a broad survey of genes included in the input Gene Sets, categorized by +species. It lists: _Genes Specific to Species_, _Genes In Common with at Least One Other +Species_, and _Total Number of Genes_. These values are based on the total number of +genes in the input sets, and may not specifically represent results. The table is +intended to help aid in the selection of which species to map the results in cases where +new Gene Sets are created. + +![](../assets/images/boolean_algebra_table.png) + +Genes returned by the Boolean Algebra tool can be added to new Gene Sets. To do this, +click on the **Create New Gene Set From Results** button for the group you want. 
+ +Since results can contain genes from a mixed set of species, a species must be selected +for mapping the genes in the new Gene Set. + +![](../assets/images/boolean_algebra_select_species.png) + +The standard Upload GeneSet page will open. The genes will be listed in the gene +information section. If no species is selected, no genes will be listed. You can now +edit any of the fields to change the Gene Set name, description, etc. Follow +the [Upload GeneSet](#uploading-gene-sets) procedure. Note that +very large gene lists may take a few moments to load, during which time the user may +see a dimmed 'Loading' screen. + +### Circle Overlap Diagram + +If the user selects 10 or fewer Gene Sets, a gene overlap diagram will appear near the +top of the results page. The **Circle Overlap** representation is an approximation of +Euler fractional overlaps. It represents how the input Gene Sets relate to each other. It +uses the same homology mapping as the Boolean Algebra tool to render the approximate +fractional overlap of the genes shared between sets. + +![](../assets/images/bool_image.png) diff --git a/docs/analysis-tools/clustering.md b/docs/analysis-tools/clustering.md new file mode 100644 index 0000000..39e0390 --- /dev/null +++ b/docs/analysis-tools/clustering.md @@ -0,0 +1,239 @@ +**Clustering** +============== + + +Why Use the Clustering Tool +---------- + +Clustering is one of the most powerful tools in bioinformatics. Where strict classifications are too rigid to separate the data, clustering gives the user a more graded evaluation of how gene sets relate to one another. + + +### Using the Clustering Tool + +1. Select the gene sets from your list of projects that you would like + to analyze. + - You need a minimum of 3 gene sets in total to run the tool. + +2. Select whether homology is to be included or excluded. + - Homology is included by default. + +3. Select the method of clustering. + - Average is the default method of clustering. 
+ - There are five methods of clustering. They are listed in the + methods section. + +### Understanding your Results + +#### Visualization Types + +There are two methods for visualizing your clustering results. + +**Force Directed Graph** + +![](../assets/images/Forced-directed-graph.png "fig:Forced-directed-graph.png") + +- Tree representation of each cluster. +- Clear depiction of hierarchy. +- The most opaque node of a tree represents the clusters root. + +- Each node is classified as one of the following: + - **Cluster** - Grouping of gene sets + - The opacity of the nodes is based on the Jaccard Similarity of its children. The more similar the gene sets, the darker the cluster. + - On Hover: Reveals Jaccard Similarity of its child nodes. Reveals set notation of the containing hierarchy. + + ![](../assets/images/Cluster-onHover.png "fig:Cluster-onHover.png") + +* On Click: Collapses (absorbs its children). + + + ![](../assets/images/Cluster-onClick.png "fig:Cluster-onClick.png") + + - **Gene Set** - A set of genes + - Colored based on the species contained in the gene set study. + - Sized based on the relative size of the gene set. + - On Hover: Reveals abbreviated gene set information. + - On Click: Reveals and cycles through genes in groups of ten. + - On Double Click: Opens a new page containing extensive gene set information. + + - **Gene** + - On Hover: Reveals the name of the gene. + + - **Edges** + - Connects nodes to its children. + - The opacity of edges leading from cluster nodes is based on the cluster nodes Jaccard Similarity, following the same scale as above. + +**Partitioned Sunburst** + +![](../assets/images/Partitioned-sunburst.png "fig:Partitioned-sunburst.png") + +- Top-down view of each tree. +- Center represents the root. +- Partitioned sub-circles represent clusters, gene set or gene. 
+ +- **Partition** + - Partitions are equivalent to nodes in a tree + - Each partition is classified as one of the following: + - **Cluster** - Grouping of gene sets + - On Hover: Reveals Jaccard Similarity of its child + partition and highlights all nodes within the cluster. + - On Right Click: Opens a new "View GeneSet Overlap" page + using all gene sets in the cluster as input. + - **Gene Set** - A set of genes + - Colored based on the species contained in the gene + set study. + - Drawn arc sizes are based on the relative size of the + gene set. + + - On Right Click: Opens a new "View GeneSet Details" page for the + given gene set. +- **Rings** + - Each Ring represents a level in the tree. + - The outermost levels are gene sets. + - The levels leading up to a gene set represent the hierarchy of + the cluster. + + +Clustering Methods +------------------ + +Listed below are the six different methods that the user can choose from +while running the tool. The first five are different clustering methods +that will run on the selected genesets and display a force directed tree +and a partitioned sunburst based on the clustered genesets. + +All five of the given clustering methods are agglomerative hierarchical +clustering methods that start with each geneset belonging to its own +cluster. They then combine the clusters at each iteration based on a +linkage method that determines how the distance between two +clusters is defined. The clusters are combined until there are no more +clusters that are similar to each other (the distance between them is +too large). + +### McQuitty + +The McQuitty clustering method uses a linkage method where distance +depends on the combination of clusters instead of the individual +genesets within each cluster. When two clusters are joined together, the +distance of the new cluster to any other cluster is calculated as the +average distance between the two clusters that are being joined and the +other cluster. 
For example, if clusters 2 and 4 have the greatest +similarity and we are going to combine them into a new cluster called +2+4, then the distance from 2+4 to 1 is the average of the distances +from 2 to 1 and 4 to 1. + +- **Algorithm** + - Each gene set is initialized as its own cluster. + - The initial similarity between each cluster is the Jaccard + Similarity of the two genesets. + - While we still have similar clusters: + - Clusters with highest similarity are clustered together. + - Calculates the similarity between the new cluster and all + the rest based on the McQuitty linkage method +- **Time Complexity** + - O(n^2^ log n) + - This method is the most time efficient. + +### Ward + +The Ward clustering method uses a linkage method where the distance +between two clusters is based off of the Jaccard Similarity score +between them. When two clusters are joined together, the new cluster +will take the union of the genesets in the two clusters that are being +joined and set that as its geneset. It will then calculate the new +geneset's similarity score against all the other cluster's genesets and +that will be set as the distance between the new cluster and all the +other clusters. + +- **Algorithm** + - Each gene set is initialized as its own cluster + - The initial distance between clusters is the Jaccard Similarity + score between each of the cluster's genesets + - While we have clusters that are similar to each other: + - Clusters with highest similarity are clustered together. + - The new cluster contains a geneset which is the union of its + children's genesets + - Recalculates the Jaccard Similarity score between the new + cluster and all the other clusters +- **Time Complexity** + - O(n^3^) + +### Complete + +The Complete clustering method uses a linkage method where the distance +between two clusters is the lowest similarity score between any of the +genesets in one cluster compared to any of the genesets in the other +cluster. 
When two clusters are combined, the genesets within each of the +clusters are put into a new cluster. No new calculations are needed at +each iteration because we are simply reusing the similarity scores of +all the genesets compared to each other. + +- **Algorithm** + - Each gene set is initialized as its own cluster. + - The similarity scores of all the genesets compared to each + other are saved in a matrix + - While we still have clusters that are similar: + - Determine which two clusters to join: + - The distance between two clusters is the lowest + similarity score between a geneset in one cluster and a + geneset in the other cluster + - The highest of these distances determines which two + clusters will be joined + - Combines the two clusters to create a new cluster that has + all the genesets that were present in the two children + clusters +- **Time Complexity** + - O(n^3^) + +### Average + +The Average clustering method uses a linkage method where the distance +between two clusters is the average similarity score between all of the +genesets in one cluster compared to all of the genesets in the other +cluster. When two clusters are combined, the genesets within each of the +clusters are put into a new cluster. No new calculations are needed at +each iteration because we are simply reusing the similarity scores of +all the genesets compared to each other. + +- **Algorithm** + - Each gene set is initialized as its own cluster. 
+ - The similarity scores of all the genesets compared to each + other are saved in a matrix + - While we still have clusters that are similar: + - Determine which two clusters to join: + - The distance between two clusters is the average + similarity score between every geneset in one cluster + and every geneset in the other cluster + - The highest of these distances determines which two + clusters will be joined + - Combines the two clusters to create a new cluster that has + all the genesets that were present in the two children + clusters +- **Time Complexity** + - O(n^3^) + +### Single + +The Single clustering method uses a linkage method where the distance +between two clusters is the highest similarity score between any of the +genesets in one cluster compared to any of the genesets in the other +cluster. When two clusters are combined, the genesets within each of the +clusters are put into a new cluster. No new calculations are needed at +each iteration because we are simply reusing the similarity scores of +all the genesets compared to each other. + +- **Algorithm** + - Each gene set is initialized as its own cluster. + - The similarity scores of all the genesets compared to each + other are saved in a matrix + - While we still have clusters that are similar: + - Determine which two clusters to join: + - The distance between two clusters is the highest + similarity score between any geneset in one cluster and + any geneset in the other cluster + - The highest of these distances determines which two + clusters will be joined + - Combines the two clusters to create a new cluster that has + all the genesets that were present in the two children + clusters +- **Time Complexity** + - O(n^3^) diff --git a/docs/analysis-tools/dbscan.md b/docs/analysis-tools/dbscan.md new file mode 100644 index 0000000..ed8b3db --- /dev/null +++ b/docs/analysis-tools/dbscan.md @@ -0,0 +1,307 @@ +**DBSCAN Gene Clustering** +========================== + +What is DBSCAN? 
+--------------- + +DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups genes into clusters based on how closely related the genes are. + +### Why Use the DBSCAN Tool? + +In general, clustering is used to find patterns or outliers within data sets. In this implementation of DBSCAN, genes in the same cluster are considered similar, while genes in different clusters are less similar. An explanation of DBSCAN can be found [here](https://en.wikipedia.org/wiki/DBSCAN). Within GeneWeaver, this tool can be used to infer relationships between genes. For example, if clusters with similar genes continue to appear in tests across multiple data sets, one could say that these genes are closely related. + +DBSCAN Parameters +----------------- + +DBSCAN takes two parameters: epsilon and minPoints. + +### The Epsilon Parameter + +Epsilon determines how close the genes need to be in order to be +considered in the same cluster. For example, an epsilon of 1 means that +genes need to share at least 1 gene set. Another way of describing +epsilon would be the "radius of the neighborhood". A larger epsilon will +have a farther reach when finding clusters. + +### The minPoints Parameter + +The minPoints parameter determines the minimum number of points required +to form a cluster. A cluster can have more than the minPoints number of +genes, but cannot have fewer. If a group has fewer than +minPoints genes, its genes are considered noise. + +The DBSCAN Algorithm +-------------------- + +Before the DBSCAN algorithm executes, it must determine how closely +related each gene is to the other genes. A bipartite graph is used to +show how the genes connect to each gene set. First, the shortest paths +between all genes are found. Following that, the DBSCAN algorithm is run. +You can find an example of DBSCAN [here](#dbscan-example). 
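
As a rough illustration of the two steps described above (and not the GeneWeaver implementation), the following Python sketch links genes that share a gene set, measures distance as the number of hops in that gene-to-gene graph, and then runs a textbook DBSCAN. The function names (`gene_graph`, `region_query`, `dbscan`) and the dictionary layout are assumptions made for this example; the gene sets are the ones used in the DBSCAN Example section.

```python
from collections import deque

# Gene sets taken from the DBSCAN Example section of this page.
GENE_SETS = {
    "Test Set 0": ["Prp31", "Arr1", "baz", "car"],
    "Test Set 1": ["Arr1", "veli"],
    "Test Set 2": ["veli", "Arr2"],
    "Test Set 3": ["Arr2", "CalX"],
    "Test Set 4": ["CalX", "CdsA", "Cerk"],
}


def gene_graph(gene_sets):
    """Connect every pair of genes that share at least one gene set."""
    adjacency = {}
    for genes in gene_sets.values():
        for gene in genes:
            adjacency.setdefault(gene, set()).update(g for g in genes if g != gene)
    return adjacency


def region_query(adjacency, gene, epsilon):
    """Return all genes within `epsilon` hops of `gene`, including `gene` itself."""
    distance = {gene: 0}
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        if distance[current] == epsilon:
            continue  # do not look beyond the neighborhood radius
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return set(distance)


def dbscan(gene_sets, epsilon, min_points):
    """Return (clusters, noise) for the genes found in `gene_sets`."""
    adjacency = gene_graph(gene_sets)
    visited, assigned, clusters = set(), {}, []
    for gene in adjacency:
        if gene in visited:
            continue
        visited.add(gene)
        neighbors = region_query(adjacency, gene, epsilon)
        if len(neighbors) < min_points:
            continue  # provisionally noise; a later cluster may still absorb it
        cluster = [gene]
        assigned[gene] = len(clusters)
        clusters.append(cluster)
        queue = deque(neighbors)
        while queue:  # expand the cluster until its edge is reached
            other = queue.popleft()
            if other not in visited:
                visited.add(other)
                reach = region_query(adjacency, other, epsilon)
                if len(reach) >= min_points:
                    queue.extend(reach)
            if other not in assigned:
                assigned[other] = assigned[gene]
                cluster.append(other)
    noise = [g for g in adjacency if g not in assigned]
    return clusters, noise


clusters, noise = dbscan(GENE_SETS, epsilon=1, min_points=4)
print(clusters, noise)
```

With epsilon set to 1 and minPoints set to 4, this sketch groups the genes into the same two clusters that the worked example arrives at. The order of genes inside each cluster may differ between runs because neighbors are stored in sets.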
+ +### Run Times of DBSCAN + +The worst-case time complexity of DBSCAN is O(n^2^). +However, due to the sheer variability of data sets and epsilon and +minPoints combinations, it is difficult to accurately predict the run +time of this implementation. Several factors will typically +increase the run time. These include: + +- Number of Genes: If more genes are tested, the run time is longer +- Epsilon Value: A larger epsilon will typically give a longer run + time +- The size of gene sets: Gene sets with more genes in them will take + longer to explore +- The density of genes: If the data set is denser (more connections), + the run time is longer + +>Note: Even if no clusters are found, the algorithm may still take time +to execute. + +Below is a graph that shows the run times of the algorithm. The red line +shows the run time if all genes are in the same gene set. The blue line +shows the genes divided into 10 gene sets, with no overlap. The green +line is similar to the blue line, but here each gene set shares one gene +with one other gene set. This results in one giant cluster +with all of the genes. + +>Note: Since the blue line and green line overlap, you may not be + able to see the blue line. + +![](../assets/images/Run_Times_Graph.jpg "Run_Times_Graph.jpg") + +Below is a table that estimates the run time of the red, blue, and green +cases based on number of genes. Note that run times will change based on +density of the gene sets and epsilon. 
+ + Number of Genes|1 Gene Set|10 Gene Sets, No Overlap|10 Gene Sets, Overlap + :-------------:|:--------:|:----------------------:|:--------------------: + 100|3|3|3 + 200|3|3|3 + 500|5|3|3 + 1,000|10|3|3 + 1,500|12|3|3 + 2,000|15|3|3 + 2,500|28|5|5 + 3,000|63|8|8 + 3,500|110|12|12 + 4,000|160|17|18 + 4,500|230|24|25 + 5,000|306|32|33 + 6,000|487|50|51 + 7,000|708|72|75 + 8,000|969|98|100 + 9,000|1270|129|131 + 10,000|1612|163|165 + + >Approximate DBSCAN Run Times with Epsilon = 1 and Min Points = 1 (in + seconds) + +Visualization +------------- + +Once DBSCAN is completed, results can be visualized in two ways. +However, there is a possibility that visualization may not occur. If a +data set is too large, the results will not be visualized and a message +will be displayed. + +>Note: Due to the rendering of the Cluster / Gene Table, run times may +appear longer than estimated in [here](#run-times-of-dbscan). + +### Circles + +The default visualization on the tool is circle packing. This represents +the clusters and the genes within them. The outermost circle is the +entire data set. The darker blue circles within represent the different +clusters. The circles within the clusters represent the genes that +belong to the cluster. The color of each gene denotes the species. + +To see more information about the cluster, you can click on the cluster. +This will zoom in on the cluster and display gene IDs. Clicking on a +gene ID will redirect to a search for that gene within the GeneWeaver +database. + +Below is an example of the circle packing visualization with zoom +functionality. + +![](../assets/images/Circle_Visualization.png "Circle Visualization.png") + +### Wires + +The other visualization is a wire representation. This shows the +connections between all genes in the same gene set. The color of each +gene shows which cluster the gene is in. If a gene is grey, it is +considered noise. Mousing over a circle will highlight it and show the +gene ID. 
By clicking and holding a gene, you can drag the gene +around the screen. + +>Note: This visualization will only be drawn with small data sets due to +the complexity of drawing all lines between genes. + +Below is an example of the wires visualization. + +![](../assets/images/Screen_Shot_2016-12-01_at_8.30.43_PM.png) + +### Cluster / Gene Table + +Below the visualizations is a table. This table is split up by +cluster, and each section contains all the genes within that specific cluster. +Information about each gene can be seen here as well. This table is +similar to the one on the **GeneSet Details** page. + +![](../assets/images/Table_Visualization.png) + +If the data set becomes sufficiently large, a minimized table will be +shown on screen. An example of the minimized table is below. + +![](../assets/images/Large_Table_Visualization.png) + +DBSCAN Example +-------------- + +Below is an example of the DBSCAN algorithm. For this example, epsilon +is set to 1 and minPoints is set to 4. Figure 1 shows the gene-to-gene +set bipartite graph. + +![](../assets/images/DBSCAN_Gene_To_Gene_Set.png "fig:DBSCAN_Gene_To_Gene_Set.png") + +_Figure 1_: The gene-to-gene set bipartite graph + +### Finding Shortest Paths Between Genes + +Starting at "Test Set 0", Prp31, Arr1, baz, and car are all in the same +gene set. This means that when building the gene-to-gene graph, all of +those genes will be connected to each other. "Test Set 1" shows that +Arr1 and veli are connected. "Test Set 2" has veli and Arr2 connected. +"Test Set 3" has Arr2 connected to CalX. Finally, "Test Set 4" has CalX, +CdsA, and Cerk connected. Now that the connections between genes are +determined, a map can be drawn showing these connections (Figure 2). + +![](../assets/images/DBSCAN_Gene_To_Gene.png "fig:DBSCAN_Gene_To_Gene.png") + +_Figure 2_: The gene-to-gene graph denoting shortest paths + +Using this graph, the shortest path from a gene to any other gene can be +determined. 
For example, the distance between Arr1 and baz is 1. The +distance between Prp31 and CalX is 4. This is important when applying +epsilon to the algorithm. + +### Running the DBSCAN Algorithm + +This is the pseudocode for the algorithm. + +![DBSCAN +Pseudocode](../assets/images/DBSCAN_Algorithm_Pseudocode.png "DBSCAN Pseudocode") + +Starting in the DBSCAN function, the cluster is first initialized to 0. +Next, each point is visited only once. For this example, baz will be the +first gene visited. baz will first be marked as visited, then the +neighbors of baz will be found by regionQuery. The regionQuery function +will return all points within radius epsilon, including the point +itself. Calling regionQuery on baz with epsilon will return all genes +that are at most one step away from baz. In this example baz, car, Prp31, and Arr1 +are returned and listed as baz's neighbors. + +![](../assets/images/Screen_Shot_2016-11-29_at_6.17.30_PM.png) + +The list \[baz, car, Prp31, Arr1\] is returned. Now the number of +items in the list is compared against the minPoints parameter. If it is +greater than or equal to minPoints, a cluster is formed. Otherwise, the +point is labelled as noise. In this example, baz has 4 neighbors, which +is equal to minPoints. The "C = next cluster" statement means +that a new cluster C is started. Next, the expandCluster function is called. + +The expandCluster function will continue to expand the cluster until the edge of +the cluster is reached. The edge of a cluster is reached when a point +has fewer than minPoints neighbors. When +entering the expandCluster function, the point P will be added to the +cluster. The cluster is currently \[baz\]. Next, the algorithm runs +through all of the neighbors to see if the cluster can be expanded. The +list of neighbor points is now \[baz, car, Prp31, Arr1\]. First baz is +checked, but because it has already been visited, it is not going to +be checked again. Next, car is checked. 
A regionQuery on car returns a list of +all its neighbors, which are \[car, baz, Prp31, Arr1\]. Then the length of that list +is checked against minPoints. Since it is greater than or +equal to minPoints, that list is added to the original list of +neighbors. The original neighbors list of \[baz, car, Prp31, Arr1\] +and the new neighbors list of \[car, baz, Prp31, Arr1\] are added +together. However, the algorithm does not add duplicate genes to the +list. Therefore, nothing is added to the list and the neighbors list is +\[baz, car, Prp31, Arr1\]. Then, the gene is added to the current +cluster if it is not already part of a cluster. car is not a part of any +other cluster so it is added to the current cluster. Now the cluster +contains \[baz, car\]. + +Next, Prp31 is checked. Its neighbors are \[baz, car, Prp31, Arr1\]. +The length of this list is equal to minPoints, but once again, all of Prp31's +neighbors are already in the list of baz's neighbors. Nothing is +added to the neighbors list, and since Prp31 is not a part of any other +cluster, it is added to the current cluster, which is now \[baz, car, +Prp31\]. + +Now, Arr1 is checked. Its neighbors are \[Arr1, baz, car, Prp31, +veli\]. Notice that a new gene appeared in Arr1's neighbors (veli). This +gene is now added to the list of baz's neighbors. Arr1 is added to the +current cluster, so the cluster now holds \[baz, car, Prp31, Arr1\]. Now +there is still one gene left to check in baz's neighbors, which is veli. + +![](../assets/images/Screen_Shot_2016-11-30_at_11.18.18_PM.png "Screen_Shot_2016-11-30_at_11.18.18_PM.png") + +veli is checked and its neighbors are \[veli, Arr1, Arr2\]. The list has +fewer than minPoints entries, which means the cluster cannot be +expanded past veli. + +![](../assets/images/Screen_Shot_2016-11-30_at_11.21.40_PM.png) + +However, veli is still part of the current cluster. The current cluster +is now \[baz, car, Prp31, Arr1, veli\]. 
Since all of baz's +neighbors have been checked, the cluster is finished. + +![](../assets/images/NewSlide24.jpg "NewSlide24.jpg") + +Now that baz has been checked, it is time to check other genes. Next, +car is checked. However, it was already visited when handling baz's +neighbors, so nothing needs to be checked. The same applies for Prp31, +Arr1, and veli. The next gene to check is Arr2. Arr2's neighbors are +\[veli, Arr2, CalX\]. This is fewer than minPoints, so Arr2 is marked as +noise. + +![](../assets/images/NewSlide25.jpg "NewSlide25.jpg") + +However, a gene that is marked as noise is not guaranteed to remain +noise when the algorithm is finished. Later in the algorithm, it can +still be added to a cluster. + +![](../assets/images/NewSlide26.jpg "NewSlide26.jpg") + +Next, CalX is checked. Its neighbors are \[CalX, Arr2, CdsA, Cerk\]. +The length of this list is equal to minPoints, so a new cluster is started and expanded. + +![](../assets/images/NewSlide28.jpg "NewSlide28.jpg") + +CalX is checked, but it is already visited, and it is not a part of any +cluster, so it is added to the 2^nd^ cluster. The 2^nd^ cluster +currently holds \[CalX\]. Next, Arr2 is checked, but it was already +visited and marked as noise. However, it is not in any cluster, so it is +added to the 2^nd^ cluster. The 2^nd^ cluster now contains \[CalX, +Arr2\]. Next, CdsA is checked. Its neighbors are \[CdsA, Cerk, CalX\]. +This list is shorter than minPoints so nothing is added to CalX's neighbors. CdsA is +added to the 2^nd^ cluster because it is not part of any other cluster. +The 2^nd^ cluster is now \[CalX, Arr2, CdsA\]. Finally, Cerk is checked. +Its neighbors are \[Cerk, CdsA, CalX\]. The list is shorter than minPoints, so +they are not added to CalX's neighbors. Cerk is not a part of any +cluster, so it is added to the 2^nd^ cluster. The 2^nd^ cluster is now +complete. It contains \[CalX, Arr2, CdsA, Cerk\]. + +Now that CalX is checked, CdsA is checked. 
It was already visited in the +expandCluster function so nothing needs to be done. The same applies for +Cerk. The algorithm is now complete. + +Two clusters were produced: \[baz, car, Prp31, Arr1, veli\] and \[Arr2, +CalX, CdsA, Cerk\]. + +Figure 3 shows the gene-to-gene map visualized in clusters. + +![](../assets/images/NewSlide37.jpg "NewSlide37.jpg") + +_Figure 3_: The result of the DBSCAN clustering diff --git a/docs/analysis-tools/find-variants.md b/docs/analysis-tools/find-variants.md new file mode 100644 index 0000000..eb2c1dc --- /dev/null +++ b/docs/analysis-tools/find-variants.md @@ -0,0 +1,41 @@ +**Find Variants** +====================== + +Why Use the Find Variants Tool +----------------------------------- +The Find Variants tool traverses a graph database representing the relationships of human or mouse variants and genes. The tool starts with a set of human or mouse genes and outputs a list of variants in the other species using orthologous relationships between genes. + +The graph database is built from a fixed and reproducible set of data sourced mostly from Ensembl-104 as well as some other data from JAX, AGR, and GTEx. The data is gathered, processed, and imported into a Neo4j graph database. + +Understanding the Find Variants Tool +----------------------------------------- +If the input to the tool is a list of human genes, the tool finds orthologous genes in mice and variants of those mouse genes either through eQTL relationships or transcript relationships. The tool can also work from mouse genes to human variants. + +![](../assets/images/FindVariants_graph.png) + +_Figure 1_: Excerpt of the graph database that maps the relationships from a set of human genes to mouse variants. + +To obtain information about genes and their orthologous relationships, the tool queries the graph database, which is built using information from AGR and Ensembl. 
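
The traversal described above can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the real tool queries a graph database, whereas here plain dictionaries stand in for the graph, and the function name `find_variants`, the dictionary layout, and every identifier in the data are made-up placeholders rather than real genes, variants, or GeneWeaver APIs.

```python
# Placeholder data (hypothetical): input gene -> orthologous genes in the other species.
ORTHOLOGS = {"HUMAN_GENE_A": ["MOUSE_GENE_A"]}
# Placeholder data (hypothetical): gene -> (variant, tissue) pairs linked by an eQTL relationship.
EQTL = {"MOUSE_GENE_A": [("variant_1", "tissue_x")]}
# Placeholder data (hypothetical): gene -> (variant, transcript ID) pairs linked through a transcript.
TRANSCRIPTS = {"MOUSE_GENE_A": [("variant_2", "transcript_1"), ("variant_3", "transcript_1")]}


def find_variants(genes, orthologs, eqtl, transcripts, paths=("eQTL", "Transcript")):
    """Follow gene -> ortholog -> variant edges and return one row per variant found."""
    rows = []
    for gene in genes:
        for ortholog in orthologs.get(gene, []):
            if "eQTL" in paths:
                for variant, tissue in eqtl.get(ortholog, []):
                    rows.append({"gene": gene, "ortholog": ortholog, "variant": variant,
                                 "path": "eQTL", "tissue": tissue})
            if "Transcript" in paths:
                for variant, transcript_id in transcripts.get(ortholog, []):
                    rows.append({"gene": gene, "ortholog": ortholog, "variant": variant,
                                 "path": "Transcript", "transcript": transcript_id})
    return rows


for row in find_variants(["HUMAN_GENE_A"], ORTHOLOGS, EQTL, TRANSCRIPTS):
    print(row)
```

Restricting `paths` to a single entry mirrors choosing only one relationship type: eQTL rows carry a tissue name, while transcript rows carry a transcript ID instead.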
+ +Using the Find Variants Tool +--------------------------------- +Access the Find Variants Tool through the [Analyze Genesets](index.md#analyze-gene-sets-tab) tab. + +To generate variants, you must first select GeneSets from a project. Projects may be created and updated by uploading GeneSets, searching the GeneWeaver database, or through the use of other tools in the GeneWeaver system. See the documentation for [uploading GeneSets](#uploading-gene-sets), [Search](#searching-geneweaver), or [Manage GeneSets](#gene-set-utilities) to learn more about these functions. To select an entire project or multiple projects for analysis, check the box next to the project name. To select individual GeneSets within a project, click on the **+** beside the project name and check individual GeneSets using the check boxes. + +You must select at least one GeneSet to be analyzed. All GeneSets selected must be from the same species, either mouse or human. + +Once you have selected GeneSets from a project, select the **Find Variants** icon from the Analysis Tools box, to the left of your GeneSets. + +After the tool has finished, a results table is displayed containing information about the variants found. The results can be searched for keywords using the search bar, and all the results can be downloaded using the 'Download' button, which creates a CSV file titled 'FindVariants_*.csv' where the * is a unique string of numbers. + +Options +------- + +### Species +Choosing "Human to Mouse" will look for relationships from human genes to mouse variants. Choosing "Mouse to Human" will look for relationships from mouse genes to human variants. + +### Path +Choosing "eQTL" will only find variants that are related to genes through the eQTL relationship and will also return the tissue name.
Choosing "Transcript" will find variants related to genes through the Transcript relationship and will return a Transcript ID but no information about the tissue name. Selecting both will return variants for both options. + +The "Transcript" option will naturally return many more variants than the "eQTL" option and as a result can take much longer to run. diff --git a/docs/analysis-tools/geneset-graph.md b/docs/analysis-tools/geneset-graph.md new file mode 100644 index 0000000..40d5dcf --- /dev/null +++ b/docs/analysis-tools/geneset-graph.md @@ -0,0 +1,52 @@ +**GeneSet Graph** +================= + +Why Use the GeneSet Graph Tool +------------------------------ + +The GeneSet Graph is designed for the user in need of a partitioned display to illustrate just how tied genes are to one another. For example: a user in need of a GeneSet Graph would look for visual references more than chemical references or references by utility. A GeneSet Graph can also help pick apart the most valuable or most occurring genes depending on the user's preference. + +Understanding the GeneSet Graph Tool +------------------------------------ + +The GeneSet Graph Tool presents a partitioned display of genes and GeneSets. Genes are represented by elliptical nodes, and GeneSets are represented by boxes. The least-connected genes are displayed on the left, followed by the GeneSets, then the more-connected genes in increasing order to the right. Genes and GeneSets are connected by colored lines to show what genes are in which GeneSets. In this way, the GeneSet Graph displays the bipartite graph of the genes and GeneSets, but modifies the display of the gene partition to make it easier to visually interpret. + +![](../assets/images/GeneSet_Graph_1.png "fig:GeneSet_Graph_1.png") +_Figure 1_: Least connected genes to the left, GeneSets in the middle, most connected genes on the right. 
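
The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GeneWeaver's implementation: the `genesets` dictionary, the gene symbols, and the `split` threshold that decides which genes count as "least connected" are all hypothetical.

```python
from collections import defaultdict


def gene_degrees(genesets):
    """Count how many GeneSets each gene belongs to (its degree in the bipartite graph)."""
    degree = defaultdict(int)
    for genes in genesets.values():
        for gene in set(genes):
            degree[gene] += 1
    return dict(degree)


def partition_genes(genesets, split=2):
    """Split genes into a left column (degree < split) and a right column
    (degree >= split); the right column is ordered by increasing degree.
    The `split` threshold is an assumption made for this sketch."""
    degree = gene_degrees(genesets)
    left = sorted(g for g, d in degree.items() if d < split)
    right = sorted(
        (g for g, d in degree.items() if d >= split),
        key=lambda g: (degree[g], g),
    )
    return left, right


# Hypothetical input: three GeneSets keyed by made-up identifiers.
genesets = {
    "GS1": {"Drd2", "Bdnf", "Comt"},
    "GS2": {"Drd2", "Bdnf", "Oprm1"},
    "GS3": {"Drd2", "Gabra2"},
}

left, right = partition_genes(genesets)
```

Here `left` holds the genes found in a single GeneSet and `right` holds the shared genes, ordered so that the gene present in the most GeneSets comes last.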
+ +### Using the GeneSet Graph Tool + +Access the GeneSet Graph Tool through the [Analyze Genesets](index.md#analyze-gene-sets-tab) tab. + +To generate a GeneSet Graph, you must first select gene sets from a project. Projects may be created and updated by uploading GeneSets, searching the GeneWeaver database, or through the use of other tools in the GeneWeaver system. See the documentation for [uploading GeneSets](#uploading-gene-sets), [Search](#searching-geneweaver), or [Manage GeneSets](#gene-set-utilities) to learn more about these functions. To select an entire project or multiple projects for analysis, check the box next to the project name. To select individual GeneSets within a project, click on the **+** beside the project name and check individual GeneSets using the checkboxes. Next, click on the GeneSet Graph icon in the Analysis tools box to the left of the project list. (For users that want to change options, press the green **+** sign before they start the tool). + +![](../assets/images/GeneSet_Graph_2.png "fig:GeneSet_Graph_2.png") +_Figure 2_: GeneSet Graph Selection Icon. + +The GeneSet Graph can be interactively panned and zoomed with the mouse, and more details of each gene or GeneSet can be viewed by clicking on the individual nodes in the display. In addition to these interactive features, there are also a few options available to optimize the display. + +Clicking on a gene node executes a search for other GeneSets containing the gene of interest or its homologues. Clicking on a GeneSet node reveals full publication and annotation information, including the GeneSet description. + +![](../assets/images/GeneSet_Graph_3.png "fig:GeneSet_Graph_3.png") +_Figure 3_: Selecting GeneSets will navigate users to the GeneSet page; selecting the gene will initiate a search of that gene. + +Options +------- + +### Suppress Disconnected + +When enabled, this option will suppress the display of GeneSets which are not connected to any displayed genes. 
This helps remove unnecessary information for users that only want relations. This is only relevant when [MinDegree](#mindegree) is greater than 1. + +### Homology + +Include homology to integrate multi-species data. If excluded, data from multiple species will be segregated into distinctly separate graphs. + +![](../assets/images/GeneSet_Graph_4-5.png "fig:GeneSet_Graph_4&5.png") +_Figure 4_: 2 GeneSets each from mouse and rat. + +### MinDegree + +The minimum number of connections for a displayed gene. A value of 2 means that any displayed genes must be found in at least two of the input gene sets. Increasing this value will basically shift the resulting gene display left. Since lower-order overlaps are generally more likely and more numerous than higher-order intersections, this can quickly reduce the number of genes displayed and make the result more manageable. + +![](../assets/images/GeneSet_Graph_6-7.png "fig:GeneSet_Graph_6&7.png") +_Figure 5_ diff --git a/docs/analysis-tools/hisim-graph.md b/docs/analysis-tools/hisim-graph.md new file mode 100644 index 0000000..f5299e1 --- /dev/null +++ b/docs/analysis-tools/hisim-graph.md @@ -0,0 +1,198 @@ +**HiSim Graph** +=============== + +About the HiSim Graph Tool +-------------------------- +The HiSim Graph, short for Hierarchical Similarity Graph, is a tool for grouping +functional genomic datasets based on the genes they contain. _For example_: The user may +want to determine what a set of experiments on alcohol preference have in common, and +what makes various experiments unique from one another. Alternatively, one may wish to +take a large set of studies of related phenomena and identify their shared or distinct +substrates. In this situation one may want to know whether there is a shared biological +basis for addiction and learning, and if so, what the substrate is. 
The user might also +want to examine studies of a large number of related disorders and determine whether a +more appropriate biologically-based classification can be constructed. + +The HiSim Graph Tool is designed to address these goals; it presents a tree of +hierarchical relationships for a set of input GeneSets. The structure is determined +solely from the gene overlaps of every combination of GeneSets. + +Understanding the Results of the HiSim Graph +-------------------------------------------- + +It's best to use the HiSim Graph Tool with knowledge on what set intersections are: If +GeneSet A contains Gene A, Gene B, and Gene C, and GeneSet B contains Gene A, Gene +B, and Gene D; then the intersection of GeneSet A and GeneSet B will contain Gene A and +Gene B, because an intersection of sets are whatever is contained in all sets +intersected. + +In terms of GeneSets, the smallest intersections (fewest GeneSets, most genes) are +towards the right, and the largest intersections (most GeneSets, fewest genes) are on +the left. When thinking about the genes in all the GeneSets, the roles are reversed ( +smallest number of genes on the left, largest number of genes on the right). + +![](../assets/images/HiSimGraphGeneGenesets.png "fig:HiSimGraphGeneGenesets.png") + +_Figure 1_: Relation of GeneSets to the HiSim Graph + +HiSim Graphs must be interpreted in the context of the input GeneSets. The above example +represents differentially expressed genes in multiple brain regions of alcohol +preferring rats from a single study. The highest intersection represents a gene +differentially expressed in all 5 brain regions. In this case, the highest intersection +represents the highest amount of correspondence between data sets. As you move to the +right, genes become more specific to the brain regions tested. Each solid node has +children and can be collapsed by clicking on it. 
Leaf nodes are empty and colored by +species, which is identified in a legend at the bottom of the screen. + +![](../assets/images/HiSimGraphComplex.png "fig:HiSimGraphComplex.png") +_Figure 2_: A HiSIm Graph for diverse functions + +If one were to start with multiple alcohol preference measures from different studies, +the top of the HiSim Graph represents the +correspondence between the experiments (such as well-characterized alcohol preference +genes), and as you descend the graph the intersections describe more specific features +shared between experiments (such as stress response or tissue source). + +When starting with more loosely related inputs, interpretation becomes more difficult. +If one started with alcohol preference, nicotine dependence, and traumatic brain injury +data (Figure 2), the top of the HiSim Graph would represent more generic processes such +as neural plasticity in this case. + +Using the HiSim Graph Tool +-------------------------- + +Access the HiSim Graph Tool through the [Analyze Genesets](index.md#analyze-gene-sets-tab) tab. + +To generate a HiSim Graph, you must first select gene sets from a project. Projects may +be created and updated by uploading GeneSets, searching the GeneWeaver database, or +through the use of other tools in the GeneWeaver system. See the documentation +for [uploading GeneSets](#uploading-gene-sets), [Search](#searching-geneweaver), +or [Manage GeneSets](#gene-set-utilities) to learn more about these functions. To select +an entire project or multiple projects for analysis, check the box next to the project +name. To select individual GeneSets within a project, click on the **+** beside the +project name and check individual GeneSets using the check boxes. Next, click on the +HiSim Graph icon in the Analysis tools box to the left of the project list. Select the +options you would like for the tool to run on, and click Run. 
+ +![](../assets/images/HiSimGraph_AnalyzeGeneSets.png "fig:HiSimGraph_AnalyzeGeneSets.png") +_Figure 3_: Selecting gene sets and executing an analysis from the +Analyze GeneSets page + +![](../assets/images/HiSimGraphResultsPage.png "fig:HiSimGraphResultsPage.png") +_Figure 4_: The results page for the HiSim Graph. + +Most genes are connected to two of the input GeneSets. One gene is connected to three of +the input sets. (Inset) + +### The GeneSet Intersection page + +GeneSet intersection data can be downloaded as a csv file for subsequent analyses. The +GeneSets giving rise to each node can be stored in a separate project. + +The HiSim Graph opens and the nodes can be selected to expand the graph. More details of +each intersection can be viewed by clicking on the individual nodes in the tree. A link +at the bottom of the frame allows download of the csv. + +![](../assets/images/HiSimGraphStatsAndSliders.png "fig:HiSimGraphStatsAndSliders.png") +_Figure 5_: These options are available for the HiSim Graph, to change the way nodes +interact with each other. The stats of the graph, as well as shortcuts and the legend +identifying each species in the graph, are also displayed. + +![](../assets/images/HiSimGraphSearchFunction.png "fig:HiSimGraphSearchFunction.png") +_Figure 6_. This shows the search function, which highlights paths +between nodes containing the item searched for, whether it be gene, +geneset, or species. + +Options +------- + +There are a number of options available to optimize the HiSim Graph analyses. You may +access the following options on the Analyze GeneSets page by clicking on the HiSim Graph +Tool. + +### DisableBootstrap + +When the resulting HiSim Graph is unimaginably large, a bootstrapping filter is applied +to reduce the output size. This step removes edges that are weakly supported by the +underlying data, for example, those partitions of GeneSet subgroups that are driven by a +single gene difference between the groups. 
If you would like the large, unfiltered graph +instead, set this option to True to disable bootstrapping. Be warned that this may greatly increase +the graph's size. + +![](../assets/images/HiSimGraphBootstrapTrue.png "fig:HiSimGraphBootstrapTrue.png") + +_Figure 6_: A HiSim Graph with DisableBootstrap turned on (True). + +![](../assets/images/HiSimGraphBootstrapFalse.png "fig:HiSimGraphBootstrapFalse.png") +_Figure 7_: A HiSim Graph with DisableBootstrap turned off (False). + +### Homology + +Include homology to integrate multi-species data. This is done by using HomoloGene +mappings to relate identifiers across species. If homology is excluded, data from +multiple species will be segregated into separate trees. + +![](../assets/images/HiSimGraphHomology_Excluded.png "fig:HiSimGraphHomology_Excluded.png") + +_Figure 8_: Homology excluded. A separate map is drawn for mouse; no overlap with human +is allowed. + +![](../assets/images/HiSimGraphHomology_Included.png "fig:HiSimGraphHomology_Included.png") + +_Figure 9_: Homology included. GeneSets from mouse and human are +allowed to be mixed and are intertwined as one. + +### MinGenes + +The minimum number of genes for an intersection. The default of 1 means that all +intersections will be displayed. Increasing the value means that intersections with +fewer genes will not be displayed in the output, decreasing noise and displaying more +robust correspondence between GeneSets. This generally has the effect of removing the +topmost nodes. + +![](../assets/images/HiSImGraphMinGenes.png "fig:HiSImGraphMinGenes.png") _Figure 10_: +As shown above, the left tree uses the default MinGenes = 1, and the +right tree uses MinGenes = 5. + +### Permutations + +The HiSim Graph can ultimately address questions among highly curated data, such as how +much dimension reduction gene overlap provides.
For example, one may take a large +set of gene sets associated with mood disorders and ask whether the data are similar +enough to group together, i.e., of all possible subset intersections, how many are +populated, and is this result better than chance? + +The maximum number of permutations to run is set to 0 by default since it can take a +long time to run for large input sets. The genes contained in each GeneSet are permuted +over the union of all genes in the input sets, controlling for the size of each GeneSet. +The permutation tests measure the likelihood of getting a similar tree structure ( +Parsimony) or of getting a similar aggregation of genes in each intersection (Gene +Aggregation). Note that this is a maximum value since the actual results may be fewer +due to the [time limit](#permutation-time-limit). + +**Parsimony** is a simple measure of the percentage of observed intersections out of all +possible intersections. This is mathematically defined as: + +![](../assets/images/Phenome_Map_13.png "fig:Phenome_Map_13.png") + +_Figure 11_: For those that aren't aware of the mathematical implications of parsimony, +think of it as one of the many measures of accuracy for a map. You want more parsimony, +but you can't always get full parsimony. + +**Gene Aggregation** is a measure of the total node/tree probability. Each node is +scored based on the intersection of genes and gene sets. Then the product of these +scores is used to assign an overall tree aggregation probability: + +![](../assets/images/Phenome_Map_14.png "fig:Phenome_Map_14.png") + +_Figure 12_: Aggregation is another measure of accuracy that balances with parsimony. In +this tool, neither are ever fully accurate alone, but together they are more fine-tuned. + +### Permutation Time Limit + +The maximum amount of time to spend doing permutations. 
For example, +if [Permutations](#permutations) is set to 100,000 and this value is 5 minutes, the +result will either have 100,000 permutations (if they finished within 5 minutes), or +will be truncated to the number of permutations which were able to finish within 5 +minutes. The more time you give to Permutation Time Limit, the more accurate your +results will be. diff --git a/docs/analysis-tools/index.md b/docs/analysis-tools/index.md new file mode 100644 index 0000000..7c8b4cf --- /dev/null +++ b/docs/analysis-tools/index.md @@ -0,0 +1,54 @@ +**Analysis Tools** +================== +GeneWeaver uses a set of analysis tools to operate on genes and gene sets. These tools +evaluate a range of data inputs for the purposes of elucidating hierarchical +relationships among a set of gene sets of interest. They can be used to visualize +bipartite clusters, **[HiSim Graph](hisim-graph.md)** or visualize genes with the more +common intersections, **[GeneSet Graph](geneset-graph.md)**. + +Generation and visualization of a maximal triclique using the intersection of gene sets +with the **Triclique Viewer Tool** can allow users to discover novel relationships +between gene ontology terms. The overlap/similarity of gene sets themselves can be +visualized with **[Jaccard Similarity](jaccard-similarity.md)** plots. These set overlaps +are also available for **[Clustering](clustering.md)**, while component gene intersections +can be found on our **[Gene Intersection Lists](../reference/geneset-utilities.md#gene-intersection-lists)**. +The **[Boolean Algebra](boolean-algebra.md)** tool uses advanced set logic to integrate +multiple genesets. For each tool, GeneWeaver allows users to expand their search beyond +a single species using **[Homology Mapping](../reference/geneset-utilities.md#homology-mapping)**. + +## Analyze Gene Sets Tab + +Use the analyze gene sets tab on the navigation bar to move to the analysis tools. 
+ +![](../assets/images/AnalyzeGeneSetsTab.png) + +A registered user or guest user who has a temporary project will see the Analyze page. +Down the left side are all the tools. Select one or more projects or gene sets and click +on the desired tool. Options will then be displayed below the tool. Select the desired +options and click the Run button. + +![](../assets/images/AnalyzeGeneSetsPage.png) + +A tool can take a long time, depending on the size and complexity of the selected gene +sets. A message will be displayed showing the progress of the tool. You can now navigate +away from this page and later return to the results page. + +![](../assets/images/StartingToolMsg.png) + +## View Results + +The link to the results page is on the analyze gene sets tab. + +![](../assets/images/ResultsManagement.png) + +Your tool has completed once the duration column has a time listed. From this page you +can: + +* Delete a test that you are no longer interested in +* Re-run a test +* View the test results +* Edit the test name +* Use the Search box to display test name matches +* Sort the columns by clicking on the header +* Select up to 100 results to display per page + diff --git a/docs/analysis-tools/jaccard-similarity.md b/docs/analysis-tools/jaccard-similarity.md new file mode 100644 index 0000000..f3dbae8 --- /dev/null +++ b/docs/analysis-tools/jaccard-similarity.md @@ -0,0 +1,87 @@ +**Jaccard Similarity** +====================== + +Why Use the Jaccard Similarity Tool +----------------------------------- +The Jaccard Similarity Tool displays a matrix of Venn diagrams, which can be very useful for quickly finding overlapping GeneSets and evaluating the similarity of results across a collection of experiments. This snapshot may enable you to determine which can be removed or kept for more complex comparison analysis (such as the [HiSim Graph](#hisim-graph)). 
+ +Understanding the Jaccard Similarity Tool +----------------------------------------- +Each Venn Diagram represents the pairwise gene overlap between the two GeneSets depicted for each row and column. Text overlays show the exact gene counts, Jaccard Similarity coefficient and p-value for every pair. The p-value is calculated based on the cumulative probability of obtaining a Jaccard coefficient greater than or equal to the observed value, using formula \[17\] in Real and Vargas, 1996. + +For those less familiar with Jaccard Similarity, it is the number of elements found in both sets divided by the number of elements found in either set (the size of the intersection over the size of the union). If your matrix produces two separate blue and red circles, rather than overlapping circles, the two GeneSets have no genes in common. + +![Jaccard Similarity Equation - +[source](https://en.wikipedia.org/wiki/Jaccard_index)](../assets/images/Jaccard_Similarity_6.png "Jaccard Similarity Equation - source") + +### Background Processes +The Jaccard Similarity Tool now implements the calculation of the p-value for the Jaccard Similarity score based on an empirical sampling distribution. The distribution is approximated for each unique gene set cardinality (gene set size) pair. For each unique pair of cardinalities, gene sets of those sizes are randomly sampled (10,000 samples) from the actual gene list of the GeneWeaver database, and the resulting Jaccard Similarity values are plotted by frequency. The result is a Frequency versus Jaccard Similarity histogram that is used as the distribution for the calculation of the p-value. To calculate the p-value, the tool compares the Jaccard Similarity of the user-selected gene sets against the curve stored in the database. + +If the Jaccard Similarity does not exist in the curve - that is, if the Similarity is too high to occur *randomly* - the $p$-value is simply zero. A Jaccard Similarity of 1 indicates that the two sets are identical.
In this case, we assign a special $p$-value of 1\* since we agree that the probability of a set matching itself (and not some other set which contains other genes) will always occur. + +The implementation of this process is coded and optimized for C++ which runs in the background as your results are loading onto the next page. + +Using the Jaccard Similarity Tool +--------------------------------- +Access the Jaccard Similarity Tool through the [Analyze Genesets](#analyze-gene-sets-tab) tab. + +To generate a Jaccard Similarity Matrix, you must first select gene sets from a project. Projects may be created and updated by uploading Gene Sets, searching the GeneWeaver database, or through the use of other tools in the GeneWeaver system. See the documentation for [uploading GeneSets](#uploading-gene-sets), [Search](#searching-geneweaver), or [Manage GeneSets](#gene-set-utilities) to learn more about these functions. To select an entire project or multiple projects for analysis, check the box next to the project name. To select individual GeneSets within a project, click on the **+** beside the project name and check individual gene sets using the check boxes. Next, click on the Jaccard Similarity icon in the Analysis tools box to the left of the project list. + +![](../assets/images/Jaccard_Similarity_1.png) + +_Figure 1_: Once you have selected GeneSets from a project, select the **Jaccard Similarity** icon from the Analysis Tools box, to the left of your GeneSets. + +Tool results are displayed as a grid of proportional overlaps. The grid, itself, is written in d3 for dynamic user interaction. + +![](../assets/images/Jaccard_Similarity_3.png "fig:Jaccard Similarity_3.png") + +_Figure 3_: Venn diagram for 9 GeneSets. The detail below highlights Column 3, Row 2. + +Jaccard Overlap | | +:--------------------:|:----------------------------------------:| +![](../assets/images/Jaccard_Similarity_5.png)| GS row = pink circle (left)
GS column = green circle (right)
J = Jaccard coefficient
p = $p$-value
Green circles show emphasis genes| + + +The resulting matrix can be zoomed in and out by scrolling the mouse wheel up and down. A reset zoom button is available in case the user loses their place in the matrix of Venn diagrams. In addition to these interactive features, the gene sets can be highlighted by row and column by _shift+clicking_ on the intersection of two gene sets. + +![Figure 6: Highlight of row 2, column +3](../assets/images/Jaccard_Similarity_highlight.png "Figure 6: Highlight of row 2, column 3") + +The gene sets can be deselected by _alt+clicking_ on any highlighted gene set. + +### Rerun Option + +The user can rerun the tool with different parameters using the rerun tool options. + +![](../assets/images/Jaccard_Similarity_Rerun.png "fig:Jaccard_Similarity_Rerun.png") + +_Figure 7_: Rerun tool option + +This option can be expanded or collapsed by clicking on the Rerun Tool Options text. + +### Geneset Panel + +The geneset panel shows the Jaccard coefficients and the p-values for every geneset pair in the project the user has chosen. The geneset panel does not receive the same reduction as the Venn diagram matrix, since it is still helpful to view every geneset pairing. + +The user may also click the checkboxes located next to the geneset names to add the selected genesets to a project or to export the genes. + +![](../assets/images/Jaccard_Similarity_2.png "fig:Jaccard Similarity_2.png") + +_Figure 2_: Click *Run* to produce Jaccard Similarity Results for your selected GeneSets. Text overlays show the exact gene counts, Jaccard Similarity coefficient and p-value for every pair. + +Options +------- + +### Homology +Include homology in order to integrate multi-species data. If excluded, homologous genes from different species will not be counted as intersecting. Data from separate species will never show an overlap without homology.
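
A small sketch may make the effect of the Homology option concrete. It is not GeneWeaver's implementation: the `homology` mapping, the cluster label `HOM:1`, and the gene identifiers are hypothetical and chosen only for illustration. The coefficient itself is the standard Jaccard index, the size of the intersection divided by the size of the union.

```python
def jaccard(a, b):
    """Standard Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(union)


def to_homology_ids(genes, homology):
    """Replace each gene with its homology cluster id when one is known."""
    return {homology.get(g, g) for g in genes}


# Hypothetical mapping from species-specific gene ids to a shared cluster id.
homology = {"mouse:Drd2": "HOM:1", "human:DRD2": "HOM:1"}

mouse_set = {"mouse:Drd2", "mouse:Bdnf"}
human_set = {"human:DRD2", "human:COMT"}

# Homology excluded: identifiers from different species never match.
without_homology = jaccard(mouse_set, human_set)

# Homology included: both sets are first mapped to shared cluster ids.
with_homology = jaccard(
    to_homology_ids(mouse_set, homology),
    to_homology_ids(human_set, homology),
)
```

Without the mapping the two sets share nothing, so the coefficient is zero; once both are expressed in shared cluster ids, one of the three distinct entries overlaps.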
+ +### PairwiseDeletion +[Pairwise Deletion](http://www-01.ibm.com/support/docview.wss?uid=swg21475199) is used to pick off problematic missing values from data while still aiming to get the remaining values for comparison-based use: + +Values|Obj1|Obj2|Obj3 +:----:|:--:|:--:|:--: +Length|23|N/A|13 +Width|21|22|14 +Depth|N/A|20|11 + +_Figure 7_: In Pairwise Deletion, when comparing length, only Obj1 and Obj3 will be compared. When comparing width, all will be compared, and when comparing depth, only Obj2 and Obj3 will be compared. This prevents missing data from being assigned a default value such as 0 in the system. diff --git a/docs/analysis-tools/mset.md b/docs/analysis-tools/mset.md new file mode 100644 index 0000000..e0cb25c --- /dev/null +++ b/docs/analysis-tools/mset.md @@ -0,0 +1,112 @@ +**MSET** +======== +Modular single-set enrichment tool (MSET): randomization-based test for list over- or under-representation + +About MSET +---------- + +MSET was developed to compare gene lists. From four character lists (gene_list1, gene_list2, background1, background2), it +computes a randomization-based p-value describing the likelihood that the intersect of **gene_list1** and **gene_list2** +is underexpressed or overexpressed relative to randomness alone. + +MSET is based on work from +[Eisinger et al., 2013, "Development of a versatile enrichment analysis tool reveals +associations between the maternal brain and mental health disorders, including autism." BMC Neuroscience](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840590/). + + +Why MSET? +--------- + +MSET permits the selection, or customization, of the genes against which enrichment is performed. This yields the +ability to perform more focused hypothesis testing relative to other enrichment tests. For example, genes specific to +Alzheimer's may be selected to serve as the genes of interest against which enrichment testing is performed. + +How Does MSET Work? 
+------------------- + +MSET performs enrichment testing using several entities: + +**User Selected**: + +* **Gene Set 1**: The first set of genes to perform MSET on +* **Gene Set 2**: The second set of genes to perform MSET on +* **Number of Trials**: The number of simulated sets to create + +**MSET Computed**: + +* **Gene Set 1 Background**: Determined from Gene ID Type and Species of Gene Set 1 +* **Gene Set 2 Background**: Determined from Gene ID Type and Species of Gene Set 2 +* **The Universe**: The intersection of **Gene Set 1 Background** and **Gene Set 2 Background** +* **Gene Set 1-U**: Genes in **Gene Set 1** that are also contained in **The Universe** +* **Gene Set 2-U**: Genes in **Gene Set 2** that are also contained in **The Universe** + +MSET then takes the following steps: + +1. First, the computed inputs are calculated. +2. Then, MSET calculates the cardinality of the intersection of **Gene Set 1-U** and **Gene Set 2-U** + (said another way, it counts the number of shared genes). +3. For the **Number of Trials**, MSET then samples randomly without replacement from **The Universe** to generate two simulated gene sets with the same sizes as **Gene Set 1-U** and **Gene Set 2-U**, respectively. + - For each trial, the size of the intersection of the two simulated gene sets is recorded +4. MSET then calculates the p-value as: + +![](../assets/images/CS_MSET_Formula.png "MSET_Formula.png") + +### An Example + +The example below illustrates MSET with four trials. + +Given the following: + +- Two gene sets, e.g. GS001001 and GS001002 + +- A background for both GS001001 and GS001002 (we can call them B001001 and B001002, respectively) + - GeneWeaver determines this automatically by inspecting the gene ID type and species of each gene set + +- The number of trials MSET should perform (in this case, four) + + +1. First, MSET defines _The Universe_ as the intersection of B001001 and B001002. + +![](../assets/images/CS_MSET_Universe.png "MSET_Universe.png") + +2.
Any genes in GS001001 or GS001002 that aren't in _The Universe_ are discarded from the analysis. GS001001 and GS001002 now only contain those genes that also exist in _The Universe_. + +![](../assets/images/CS_MSET_Discarded.png "MSET_Discarded.png") + +3. MSET then calculates the cardinality of the intersection of GS001001 and GS001002. Let's assume that GS001001 and GS001002 only share the gene j, then the intersect size is determined to be 1. Here we show simulated GS001001 set in green, and simulated GS001002 sets in pink. Genes which have been selected for either simulated set are circled in their set's color. + +![](../assets/images/CS_MSET_Intersect.png "MSET_Intersect.png") + + +4. MSET then samples randomly without replacement to create four simulated sets each of GS001001 and GS001002. Here we assume that GS001001 has size 2, and GS001002 has size 3. + +![](../assets/images/CS_MSET_Sampling.png "MSET_Sampling.png") + + +5. From the simulated gene sets above, MSET calculates the size of the intersect of each simulated set of GS001001 and GS001002. + +![](../assets/images/CS_MSET_Sample_Size.png "MSET_IntersectSB.png") + + +6. MSET calculates a p-value using this formula: + +![](../assets/images/CS_MSET_Formula.png "MSET_Formula.png") + +We performed four trials, three of which had samples with an intersection at least as large as our observed gene sets. So MSET would return a p-value of 3/4, or 0.75. + + +Using MSET +---------- + +Access the MSET Tool through the [Analyze +Genesets](#analyze-gene-sets-tab) tab. + +To analyze your genes, select two gene sets. You will often have organized these sets into a project relevant to your work. Projects may be created and updated by uploading GeneSets, searching the GeneWeaver database, or through the use of other tools in the GeneWeaver system. 
See the documentation for [uploading GeneSets](#uploading-gene-sets), [Search](#searching-geneweaver), or [Manage GeneSets](#gene-set-utilities) to learn more about these functions. MSET accepts exactly two gene sets as input, so you can use the whole-project select box only if your project contains exactly two sets. + +Next, click on the MSET icon in the Analysis tools box to the left of the project list and specify how many trials you'd like MSET to perform. Once you're ready, click the run button. + +![](../assets/images/CS_MSET_Select.png "fig:UsingMSET1.png") + +Once the tool has completed the analysis, you will be directed to the results page. There you can view the distribution graph of all simulated intersect sizes, an accurate size comparison graph of the selected sets and the background, and the genes shared by the two input sets. You can download both graphs for later use, and you can also create a new gene set from the genes shared by your two input sets. + +![](../assets/images/CS_MSET_Result.png "fig:UsingMSET2.png") diff --git a/docs/analysis-tools/similar-variant-set.md b/docs/analysis-tools/similar-variant-set.md new file mode 100644 index 0000000..d47815d --- /dev/null +++ b/docs/analysis-tools/similar-variant-set.md @@ -0,0 +1,27 @@ +**Similar Variant Set** +=============== +About the Similar Variant Set Tool +-------------------------- +The Similar Variant Set tool finds variant sets of the same species that are the most similar to the input variant set. To do this, the tool uses the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). To run UPGMA, a distance matrix is first created from the results of the VariantDistanceMatrix tool that are stored in the database. + +The Similar Variant Set tool assumes that the variant sets being tested share the same reference genome.
This allows the tool to determine similarity based solely on the information provided in the variant sets, since any possible variant not present in a variant set can be assumed to match the reference genome. + + +API Tool +-------------------------- +The SimilarVariantSet tool can be run via the API at the following URL. + +`https://www.geneweaver.org/api/tool/similarvariantset//` + +Options + +* apikey - your GeneWeaver API key +* gs_id - the geneset id of your variant set + + +Visualization +-------------------------- +![Scheme](images/hierarchy.png) + +The result of the Similar Variant Set tool is visualized as a hierarchy graph. The UPGMA computation produces the relations between variants, which are then sent to the visualization pipeline. +The hierarchy is indicated by color: for each connection, the blue end is the parent and the red end is the child. The hierarchy can also be traced from each variant’s name: we use (A,B) to denote the parent of variant A and variant B. Additionally, users can hover over links or nodes of interest to highlight that part of the graph. diff --git a/docs/analysis-tools/variant-distance-matrix.md b/docs/analysis-tools/variant-distance-matrix.md new file mode 100644 index 0000000..fdce888 --- /dev/null +++ b/docs/analysis-tools/variant-distance-matrix.md @@ -0,0 +1,40 @@ +**Variant Distance Matrix** +=============== +About the Variant Distance Matrix Tool +-------------------------- +The Variant Distance Matrix tool calculates the dissimilarity between variant sets in the database and stores the results back into the database. It only calculates the distances between variant sets of the same species and includes sanity checks to ensure that duplicate or improper calculations do not occur. This allows other tools, most notably the SimilarVariantSet tool, to quickly determine relationships between variant sets.
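To illustrate how stored pairwise distances can feed the UPGMA clustering used by the SimilarVariantSet tool, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the tool's actual code: the function name `upgma`, the dictionary layout keyed by `frozenset` pairs, and the labels and distance values are all hypothetical. Parent nodes are named `(A,B)`, following the naming convention described for the hierarchy graph.

```python
from itertools import combinations


def upgma(distances):
    """Cluster labels with UPGMA (average linkage).

    distances: dict mapping frozenset({a, b}) -> distance for every pair
    of string labels. Returns merges as (child_a, child_b, parent) tuples.
    """
    # Each cluster label maps to the number of original items it contains.
    sizes = {label: 1 for pair in distances for label in pair}
    dist = dict(distances)
    merges = []

    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        parent = f"({a},{b})"

        # Distance from the new parent to every other cluster is the
        # size-weighted mean of the children's distances.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((parent, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])

        sizes[parent] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
        merges.append((a, b, parent))
    return merges


if __name__ == "__main__":
    # Hypothetical distances between three variant sets A, B, and C.
    example = {
        frozenset(("A", "B")): 2.0,
        frozenset(("A", "C")): 6.0,
        frozenset(("B", "C")): 6.0,
    }
    for child_a, child_b, parent in upgma(example):
        print(f"{parent} is the parent of {child_a} and {child_b}")
```

Storing the pairwise distances ahead of time means a clustering step like this only needs lookups, which is consistent with the description of the Variant Distance Matrix tool as a helper that lets other tools determine relationships quickly.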
+ +The Variant Distance Matrix tool makes one assumption when calculating the distance between variant sets: the genome is static within the same species. In other words, the Variant Distance Matrix tool assumes that any two variant sets for which the distance is calculated have the same reference genome. This allows the tool to treat the distance between two variant sets as a function only of the differences between the variants in the variant sets, and not of any outside variables. Since both variant sets have the same static genome, the distance is calculated using the following formula: + +![variant distance equation](images/variant_distance_equation.png) + +Using the Variant Distance Matrix Tool +-------------------------- +This tool mainly functions as a helper tool to the VariantBatchUpload and SimilarVariantSet tools and cannot be called from the website. Upon the upload of a VariantSet, multiple Variant Distance Matrix instances are forked off to calculate the distances between the new VariantSet and all existing variant sets in the database. The SimilarVariantSet tool uses this tool when any two uploaded variant sets do not already have an existing distance in the database. The SimilarVariantSet tool then forks off instances of the Variant Distance Matrix tool in order to calculate these distances. + +Both tools pass in two parameters to the Variant Distance Matrix tool: new_gsid and gs_ids. The new_gsid parameter contains the new geneset id whose distance to each of the geneset ids in the gs_ids parameter must be calculated. + +The results of the tool run are not accessible to users through the website. + +Options +------- +There is one option that can be used in conjunction with this tool. + +### Disable Sanity Check + The Disable Sanity Check option can be set to True in order to speed up distance matrix creation. This option disables the sanity checks built into the tool, which verify the following:
+ + * the genesets are of the same species + * the distance hasn't already been calculated + * the geneset is a variantset + If the genesets passed in are known to meet these criteria, the sanity checks can be turned off in order to speed up the creation of the distance matrix. + +### API Tool + +To run the VariantDistanceMatrix tool, use the following API URL. +`https://www.geneweaver.org/api/tool/VariantDistanceMatrix/////` + +The parameters are as follows: + +* apikey - your GeneWeaver API key +* gs_id - the gs_id of one of your variant sets that you wish to calculate a distance matrix for +* gs_ids - the remaining gs_ids of your variant sets that you wish to calculate a distance matrix for diff --git a/docs/analysis-tools/variant-pipeline.md b/docs/analysis-tools/variant-pipeline.md new file mode 100644 index 0000000..fe9c007 --- /dev/null +++ b/docs/analysis-tools/variant-pipeline.md @@ -0,0 +1,50 @@ +## Variant Pipeline + +Five API endpoints are provided, one for each of the following pipelines: eggo, eggv, eggr, eggix, and eggin. Each pipeline has a GET endpoint that retrieves the results of running the entire pipeline. + +The endpoints can be browsed in the Swagger UI by running GeneWeaver on Watson and opening this link: http://localhost:8889/api/ + +Each API loads a default configuration file in YAML format. The suggested configuration is the following: + +**environment:** + + - hpc: true + - local: false + - custom: false + - cores: 4 + - processes: 4 + - jobs: 15 + - memory: '40GB' + - walltime: '05:00:00' + - interface: 'ib0' + +**directories:** + + - data: 'data/' + - temp: ~ + +**scheduler:** ~ + +**workers:** ~ + +**overwrite:** true + +**species:** ~ + +**format:** tsv + +**orthology:** + + - species1: hg38 + - species2: mm10 + +The requirements for Eggv are as follows: + +The EGG:V pipeline has some hefty storage and memory requirements.
+**Storage:** +To be safe, at least **500GB** of disk space should be available if both **hg38** and **mm10** builds will be processed. +**Memory:** +The lowest amount of total available memory this pipeline has been tested with is **450GB**. +Since processing is done in-memory, all at once, systems with total memory below **400GB** might not be able to run the complete pipeline. +**CPU:** +Use as many CPU cores as you possibly can. diff --git a/docs/analysis-tools/variant-sets.md b/docs/analysis-tools/variant-sets.md new file mode 100644 index 0000000..f59fb5e --- /dev/null +++ b/docs/analysis-tools/variant-sets.md @@ -0,0 +1,12 @@ +# Variant Sets + +## Visualization + +**Data** + +The data of the raw Variant Set contains both variants and genes. The connection between each variant with its corresponding gene is saved in the data as well. +The Variant Set utilizes different colors to indicate different genes and different evidences for the variants. + +The radius bar is used to customize the size of the node. This visualization supports mouse-hovering to highlight the genes of interest. 
+ +![](images/raw_variant.png) diff --git a/docs/assets/images/.DS_Store b/docs/assets/images/.DS_Store new file mode 100644 index 0000000..c7bc674 Binary files /dev/null and b/docs/assets/images/.DS_Store differ diff --git a/docs/assets/images/AnalyzeGeneSetsPage.png b/docs/assets/images/AnalyzeGeneSetsPage.png new file mode 100644 index 0000000..92ff184 Binary files /dev/null and b/docs/assets/images/AnalyzeGeneSetsPage.png differ diff --git a/docs/assets/images/AnalyzeGeneSetsTab.png b/docs/assets/images/AnalyzeGeneSetsTab.png new file mode 100644 index 0000000..271dba9 Binary files /dev/null and b/docs/assets/images/AnalyzeGeneSetsTab.png differ diff --git a/docs/assets/images/Batch-upload-page.png b/docs/assets/images/Batch-upload-page.png new file mode 100644 index 0000000..a3e814f Binary files /dev/null and b/docs/assets/images/Batch-upload-page.png differ diff --git a/docs/assets/images/CS_MSET_Discarded.png b/docs/assets/images/CS_MSET_Discarded.png new file mode 100644 index 0000000..3a7a2bd Binary files /dev/null and b/docs/assets/images/CS_MSET_Discarded.png differ diff --git a/docs/assets/images/CS_MSET_Formula.png b/docs/assets/images/CS_MSET_Formula.png new file mode 100644 index 0000000..89c25d0 Binary files /dev/null and b/docs/assets/images/CS_MSET_Formula.png differ diff --git a/docs/assets/images/CS_MSET_Intersect.png b/docs/assets/images/CS_MSET_Intersect.png new file mode 100644 index 0000000..804cd17 Binary files /dev/null and b/docs/assets/images/CS_MSET_Intersect.png differ diff --git a/docs/assets/images/CS_MSET_Result.png b/docs/assets/images/CS_MSET_Result.png new file mode 100644 index 0000000..1d3d3c6 Binary files /dev/null and b/docs/assets/images/CS_MSET_Result.png differ diff --git a/docs/assets/images/CS_MSET_Sample_Size.png b/docs/assets/images/CS_MSET_Sample_Size.png new file mode 100644 index 0000000..2594135 Binary files /dev/null and b/docs/assets/images/CS_MSET_Sample_Size.png differ diff --git 
a/docs/assets/images/CS_MSET_Sampling.png b/docs/assets/images/CS_MSET_Sampling.png new file mode 100644 index 0000000..44e402d Binary files /dev/null and b/docs/assets/images/CS_MSET_Sampling.png differ diff --git a/docs/assets/images/CS_MSET_Select.png b/docs/assets/images/CS_MSET_Select.png new file mode 100644 index 0000000..dadfe10 Binary files /dev/null and b/docs/assets/images/CS_MSET_Select.png differ diff --git a/docs/assets/images/CS_MSET_Universe.png b/docs/assets/images/CS_MSET_Universe.png new file mode 100644 index 0000000..ba91a58 Binary files /dev/null and b/docs/assets/images/CS_MSET_Universe.png differ diff --git a/docs/assets/images/Circle_Visualization.png b/docs/assets/images/Circle_Visualization.png new file mode 100644 index 0000000..8c4a7d8 Binary files /dev/null and b/docs/assets/images/Circle_Visualization.png differ diff --git a/docs/assets/images/Cluster-onClick.png b/docs/assets/images/Cluster-onClick.png new file mode 100644 index 0000000..dcd9372 Binary files /dev/null and b/docs/assets/images/Cluster-onClick.png differ diff --git a/docs/assets/images/Cluster-onHover.png b/docs/assets/images/Cluster-onHover.png new file mode 100644 index 0000000..8d9be2f Binary files /dev/null and b/docs/assets/images/Cluster-onHover.png differ diff --git a/docs/assets/images/Combine_genesets.png b/docs/assets/images/Combine_genesets.png new file mode 100644 index 0000000..4da5e1f Binary files /dev/null and b/docs/assets/images/Combine_genesets.png differ diff --git a/docs/assets/images/Curate_geneset.png b/docs/assets/images/Curate_geneset.png new file mode 100644 index 0000000..9e22c08 Binary files /dev/null and b/docs/assets/images/Curate_geneset.png differ diff --git a/docs/assets/images/DBSCAN_Algorithm_Pseudocode.png b/docs/assets/images/DBSCAN_Algorithm_Pseudocode.png new file mode 100644 index 0000000..f51a82d Binary files /dev/null and b/docs/assets/images/DBSCAN_Algorithm_Pseudocode.png differ diff --git 
a/docs/assets/images/DBSCAN_Gene_To_Gene.png b/docs/assets/images/DBSCAN_Gene_To_Gene.png new file mode 100644 index 0000000..538babe Binary files /dev/null and b/docs/assets/images/DBSCAN_Gene_To_Gene.png differ diff --git a/docs/assets/images/DBSCAN_Gene_To_Gene_Set.png b/docs/assets/images/DBSCAN_Gene_To_Gene_Set.png new file mode 100644 index 0000000..65ec1f3 Binary files /dev/null and b/docs/assets/images/DBSCAN_Gene_To_Gene_Set.png differ diff --git a/docs/assets/images/DataExport.png b/docs/assets/images/DataExport.png new file mode 100644 index 0000000..8e03d8c Binary files /dev/null and b/docs/assets/images/DataExport.png differ diff --git a/docs/assets/images/Distribution.png b/docs/assets/images/Distribution.png new file mode 100644 index 0000000..3244f95 Binary files /dev/null and b/docs/assets/images/Distribution.png differ diff --git a/docs/assets/images/Edit_geneset_01.png b/docs/assets/images/Edit_geneset_01.png new file mode 100644 index 0000000..95952a4 Binary files /dev/null and b/docs/assets/images/Edit_geneset_01.png differ diff --git a/docs/assets/images/Edit_geneset_02.png b/docs/assets/images/Edit_geneset_02.png new file mode 100644 index 0000000..9e765be Binary files /dev/null and b/docs/assets/images/Edit_geneset_02.png differ diff --git a/docs/assets/images/Edit_geneset_03.png b/docs/assets/images/Edit_geneset_03.png new file mode 100644 index 0000000..be6d683 Binary files /dev/null and b/docs/assets/images/Edit_geneset_03.png differ diff --git a/docs/assets/images/Edit_geneset_04.png b/docs/assets/images/Edit_geneset_04.png new file mode 100644 index 0000000..6bfaa25 Binary files /dev/null and b/docs/assets/images/Edit_geneset_04.png differ diff --git a/docs/assets/images/Edit_geneset_05.png b/docs/assets/images/Edit_geneset_05.png new file mode 100644 index 0000000..2454bfd Binary files /dev/null and b/docs/assets/images/Edit_geneset_05.png differ diff --git a/docs/assets/images/Emphasize_genes_01.png 
b/docs/assets/images/Emphasize_genes_01.png new file mode 100644 index 0000000..106963e Binary files /dev/null and b/docs/assets/images/Emphasize_genes_01.png differ diff --git a/docs/assets/images/Emphasize_genes_02.png b/docs/assets/images/Emphasize_genes_02.png new file mode 100644 index 0000000..89839c4 Binary files /dev/null and b/docs/assets/images/Emphasize_genes_02.png differ diff --git a/docs/assets/images/Emphasize_genes_03.png b/docs/assets/images/Emphasize_genes_03.png new file mode 100644 index 0000000..7feee57 Binary files /dev/null and b/docs/assets/images/Emphasize_genes_03.png differ diff --git a/docs/assets/images/Enter_password.png b/docs/assets/images/Enter_password.png new file mode 100644 index 0000000..1449af7 Binary files /dev/null and b/docs/assets/images/Enter_password.png differ diff --git a/docs/assets/images/ExternalResources.png b/docs/assets/images/ExternalResources.png new file mode 100644 index 0000000..afa652d Binary files /dev/null and b/docs/assets/images/ExternalResources.png differ diff --git a/docs/assets/images/Figure_2_Curation.png b/docs/assets/images/Figure_2_Curation.png new file mode 100644 index 0000000..56301da Binary files /dev/null and b/docs/assets/images/Figure_2_Curation.png differ diff --git a/docs/assets/images/Figure_4_Curation.png b/docs/assets/images/Figure_4_Curation.png new file mode 100644 index 0000000..122f304 Binary files /dev/null and b/docs/assets/images/Figure_4_Curation.png differ diff --git a/docs/assets/images/Figure_5_Curation.png b/docs/assets/images/Figure_5_Curation.png new file mode 100644 index 0000000..c051300 Binary files /dev/null and b/docs/assets/images/Figure_5_Curation.png differ diff --git a/docs/assets/images/FindVariants_graph.png b/docs/assets/images/FindVariants_graph.png new file mode 100644 index 0000000..596e0b5 Binary files /dev/null and b/docs/assets/images/FindVariants_graph.png differ diff --git a/docs/assets/images/Forced-directed-graph.png 
b/docs/assets/images/Forced-directed-graph.png new file mode 100644 index 0000000..7024408 Binary files /dev/null and b/docs/assets/images/Forced-directed-graph.png differ diff --git a/docs/assets/images/GeneSet_Graph_1.png b/docs/assets/images/GeneSet_Graph_1.png new file mode 100644 index 0000000..0c2f45c Binary files /dev/null and b/docs/assets/images/GeneSet_Graph_1.png differ diff --git a/docs/assets/images/GeneSet_Graph_2.png b/docs/assets/images/GeneSet_Graph_2.png new file mode 100644 index 0000000..b79ab7d Binary files /dev/null and b/docs/assets/images/GeneSet_Graph_2.png differ diff --git a/docs/assets/images/GeneSet_Graph_3.png b/docs/assets/images/GeneSet_Graph_3.png new file mode 100644 index 0000000..094f265 Binary files /dev/null and b/docs/assets/images/GeneSet_Graph_3.png differ diff --git a/docs/assets/images/GeneSet_Graph_4-5.png b/docs/assets/images/GeneSet_Graph_4-5.png new file mode 100644 index 0000000..6b5a627 Binary files /dev/null and b/docs/assets/images/GeneSet_Graph_4-5.png differ diff --git a/docs/assets/images/GeneSet_Graph_6-7.png b/docs/assets/images/GeneSet_Graph_6-7.png new file mode 100644 index 0000000..e8ce1fc Binary files /dev/null and b/docs/assets/images/GeneSet_Graph_6-7.png differ diff --git a/docs/assets/images/GeneWeaverHomePageQRcode.png b/docs/assets/images/GeneWeaverHomePageQRcode.png new file mode 100644 index 0000000..1782a57 Binary files /dev/null and b/docs/assets/images/GeneWeaverHomePageQRcode.png differ diff --git a/docs/assets/images/Geneset_details_01.png b/docs/assets/images/Geneset_details_01.png new file mode 100644 index 0000000..ea8f5a5 Binary files /dev/null and b/docs/assets/images/Geneset_details_01.png differ diff --git a/docs/assets/images/Geneset_details_02.png b/docs/assets/images/Geneset_details_02.png new file mode 100644 index 0000000..5362f69 Binary files /dev/null and b/docs/assets/images/Geneset_details_02.png differ diff --git a/docs/assets/images/Geneset_details_03.png 
b/docs/assets/images/Geneset_details_03.png new file mode 100644 index 0000000..cc21ec4 Binary files /dev/null and b/docs/assets/images/Geneset_details_03.png differ diff --git a/docs/assets/images/HiSImGraphMinGenes.png b/docs/assets/images/HiSImGraphMinGenes.png new file mode 100644 index 0000000..0c26827 Binary files /dev/null and b/docs/assets/images/HiSImGraphMinGenes.png differ diff --git a/docs/assets/images/HiSimGraphBootstrapFalse.png b/docs/assets/images/HiSimGraphBootstrapFalse.png new file mode 100644 index 0000000..b993b7e Binary files /dev/null and b/docs/assets/images/HiSimGraphBootstrapFalse.png differ diff --git a/docs/assets/images/HiSimGraphBootstrapTrue.png b/docs/assets/images/HiSimGraphBootstrapTrue.png new file mode 100644 index 0000000..fb4ce59 Binary files /dev/null and b/docs/assets/images/HiSimGraphBootstrapTrue.png differ diff --git a/docs/assets/images/HiSimGraphComplex.png b/docs/assets/images/HiSimGraphComplex.png new file mode 100644 index 0000000..6fa67d6 Binary files /dev/null and b/docs/assets/images/HiSimGraphComplex.png differ diff --git a/docs/assets/images/HiSimGraphGeneGenesets.png b/docs/assets/images/HiSimGraphGeneGenesets.png new file mode 100644 index 0000000..2f3c7d8 Binary files /dev/null and b/docs/assets/images/HiSimGraphGeneGenesets.png differ diff --git a/docs/assets/images/HiSimGraphHomology_Excluded.png b/docs/assets/images/HiSimGraphHomology_Excluded.png new file mode 100644 index 0000000..8be8fc2 Binary files /dev/null and b/docs/assets/images/HiSimGraphHomology_Excluded.png differ diff --git a/docs/assets/images/HiSimGraphHomology_Included.png b/docs/assets/images/HiSimGraphHomology_Included.png new file mode 100644 index 0000000..1351010 Binary files /dev/null and b/docs/assets/images/HiSimGraphHomology_Included.png differ diff --git a/docs/assets/images/HiSimGraphResultsPage.png b/docs/assets/images/HiSimGraphResultsPage.png new file mode 100644 index 0000000..d8d76c2 Binary files /dev/null and 
b/docs/assets/images/HiSimGraphResultsPage.png differ diff --git a/docs/assets/images/HiSimGraphSearchFunction.png b/docs/assets/images/HiSimGraphSearchFunction.png new file mode 100644 index 0000000..008cec5 Binary files /dev/null and b/docs/assets/images/HiSimGraphSearchFunction.png differ diff --git a/docs/assets/images/HiSimGraphStatsAndSliders.png b/docs/assets/images/HiSimGraphStatsAndSliders.png new file mode 100644 index 0000000..ac8f986 Binary files /dev/null and b/docs/assets/images/HiSimGraphStatsAndSliders.png differ diff --git a/docs/assets/images/HiSimGraph_AnalyzeGeneSets.png b/docs/assets/images/HiSimGraph_AnalyzeGeneSets.png new file mode 100644 index 0000000..5bfdce2 Binary files /dev/null and b/docs/assets/images/HiSimGraph_AnalyzeGeneSets.png differ diff --git a/docs/assets/images/Image001.png b/docs/assets/images/Image001.png new file mode 100644 index 0000000..d420aba Binary files /dev/null and b/docs/assets/images/Image001.png differ diff --git a/docs/assets/images/Image002.png b/docs/assets/images/Image002.png new file mode 100644 index 0000000..6c56e1e Binary files /dev/null and b/docs/assets/images/Image002.png differ diff --git a/docs/assets/images/Image003.png b/docs/assets/images/Image003.png new file mode 100644 index 0000000..eca5c57 Binary files /dev/null and b/docs/assets/images/Image003.png differ diff --git a/docs/assets/images/Image004.png b/docs/assets/images/Image004.png new file mode 100644 index 0000000..284619f Binary files /dev/null and b/docs/assets/images/Image004.png differ diff --git a/docs/assets/images/Image006 (1).png b/docs/assets/images/Image006 (1).png new file mode 100644 index 0000000..a7414be Binary files /dev/null and b/docs/assets/images/Image006 (1).png differ diff --git a/docs/assets/images/Image006.png b/docs/assets/images/Image006.png new file mode 100644 index 0000000..a7414be Binary files /dev/null and b/docs/assets/images/Image006.png differ diff --git a/docs/assets/images/Image008.png 
b/docs/assets/images/Image008.png new file mode 100644 index 0000000..9a328a8 Binary files /dev/null and b/docs/assets/images/Image008.png differ diff --git a/docs/assets/images/Image010.png b/docs/assets/images/Image010.png new file mode 100644 index 0000000..0b7089e Binary files /dev/null and b/docs/assets/images/Image010.png differ diff --git a/docs/assets/images/Image014.png b/docs/assets/images/Image014.png new file mode 100644 index 0000000..6596fb1 Binary files /dev/null and b/docs/assets/images/Image014.png differ diff --git a/docs/assets/images/Image017 (1).png b/docs/assets/images/Image017 (1).png new file mode 100644 index 0000000..b0de847 Binary files /dev/null and b/docs/assets/images/Image017 (1).png differ diff --git a/docs/assets/images/Image017.png b/docs/assets/images/Image017.png new file mode 100644 index 0000000..b0de847 Binary files /dev/null and b/docs/assets/images/Image017.png differ diff --git a/docs/assets/images/Image019.png b/docs/assets/images/Image019.png new file mode 100644 index 0000000..6fc2e43 Binary files /dev/null and b/docs/assets/images/Image019.png differ diff --git a/docs/assets/images/Image021.png b/docs/assets/images/Image021.png new file mode 100644 index 0000000..c0d3cc0 Binary files /dev/null and b/docs/assets/images/Image021.png differ diff --git a/docs/assets/images/Image023.png b/docs/assets/images/Image023.png new file mode 100644 index 0000000..7f748c1 Binary files /dev/null and b/docs/assets/images/Image023.png differ diff --git a/docs/assets/images/Image027.png b/docs/assets/images/Image027.png new file mode 100644 index 0000000..40c6fba Binary files /dev/null and b/docs/assets/images/Image027.png differ diff --git a/docs/assets/images/Image029.png b/docs/assets/images/Image029.png new file mode 100644 index 0000000..66d4b1e Binary files /dev/null and b/docs/assets/images/Image029.png differ diff --git a/docs/assets/images/Image031.png b/docs/assets/images/Image031.png new file mode 100644 index 0000000..effe9f7 
Binary files /dev/null and b/docs/assets/images/Image031.png differ diff --git a/docs/assets/images/Image033.png b/docs/assets/images/Image033.png new file mode 100644 index 0000000..65a55b0 Binary files /dev/null and b/docs/assets/images/Image033.png differ diff --git a/docs/assets/images/Image035.png b/docs/assets/images/Image035.png new file mode 100644 index 0000000..3b84b01 Binary files /dev/null and b/docs/assets/images/Image035.png differ diff --git a/docs/assets/images/Image041.png b/docs/assets/images/Image041.png new file mode 100644 index 0000000..247b1f2 Binary files /dev/null and b/docs/assets/images/Image041.png differ diff --git a/docs/assets/images/Image047.png b/docs/assets/images/Image047.png new file mode 100644 index 0000000..0d28b02 Binary files /dev/null and b/docs/assets/images/Image047.png differ diff --git a/docs/assets/images/Image049.png b/docs/assets/images/Image049.png new file mode 100644 index 0000000..65910f7 Binary files /dev/null and b/docs/assets/images/Image049.png differ diff --git a/docs/assets/images/Image051.png b/docs/assets/images/Image051.png new file mode 100644 index 0000000..7bfe337 Binary files /dev/null and b/docs/assets/images/Image051.png differ diff --git a/docs/assets/images/Image053.png b/docs/assets/images/Image053.png new file mode 100644 index 0000000..fad3cf8 Binary files /dev/null and b/docs/assets/images/Image053.png differ diff --git a/docs/assets/images/Image055.png b/docs/assets/images/Image055.png new file mode 100644 index 0000000..d35aaa0 Binary files /dev/null and b/docs/assets/images/Image055.png differ diff --git a/docs/assets/images/Image057.png b/docs/assets/images/Image057.png new file mode 100644 index 0000000..f47a695 Binary files /dev/null and b/docs/assets/images/Image057.png differ diff --git a/docs/assets/images/Image058.png b/docs/assets/images/Image058.png new file mode 100644 index 0000000..bceebd8 Binary files /dev/null and b/docs/assets/images/Image058.png differ diff --git 
a/docs/assets/images/Image060.png b/docs/assets/images/Image060.png new file mode 100644 index 0000000..80c7382 Binary files /dev/null and b/docs/assets/images/Image060.png differ diff --git a/docs/assets/images/Image062.png b/docs/assets/images/Image062.png new file mode 100644 index 0000000..fda5f7d Binary files /dev/null and b/docs/assets/images/Image062.png differ diff --git a/docs/assets/images/Image064.png b/docs/assets/images/Image064.png new file mode 100644 index 0000000..0099d38 Binary files /dev/null and b/docs/assets/images/Image064.png differ diff --git a/docs/assets/images/Image066.png b/docs/assets/images/Image066.png new file mode 100644 index 0000000..0893765 Binary files /dev/null and b/docs/assets/images/Image066.png differ diff --git a/docs/assets/images/Image068.png b/docs/assets/images/Image068.png new file mode 100644 index 0000000..add9d54 Binary files /dev/null and b/docs/assets/images/Image068.png differ diff --git a/docs/assets/images/Image072.png b/docs/assets/images/Image072.png new file mode 100644 index 0000000..9603fb6 Binary files /dev/null and b/docs/assets/images/Image072.png differ diff --git a/docs/assets/images/Image074.png b/docs/assets/images/Image074.png new file mode 100644 index 0000000..055983f Binary files /dev/null and b/docs/assets/images/Image074.png differ diff --git a/docs/assets/images/Image076.png b/docs/assets/images/Image076.png new file mode 100644 index 0000000..3445892 Binary files /dev/null and b/docs/assets/images/Image076.png differ diff --git a/docs/assets/images/Image078.png b/docs/assets/images/Image078.png new file mode 100644 index 0000000..4a0d776 Binary files /dev/null and b/docs/assets/images/Image078.png differ diff --git a/docs/assets/images/Jaccard_Similarity_1.png b/docs/assets/images/Jaccard_Similarity_1.png new file mode 100644 index 0000000..3732781 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_1.png differ diff --git a/docs/assets/images/Jaccard_Similarity_2.png 
b/docs/assets/images/Jaccard_Similarity_2.png new file mode 100644 index 0000000..c1dc060 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_2.png differ diff --git a/docs/assets/images/Jaccard_Similarity_3.png b/docs/assets/images/Jaccard_Similarity_3.png new file mode 100644 index 0000000..8f960c0 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_3.png differ diff --git a/docs/assets/images/Jaccard_Similarity_5.png b/docs/assets/images/Jaccard_Similarity_5.png new file mode 100644 index 0000000..c23cb60 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_5.png differ diff --git a/docs/assets/images/Jaccard_Similarity_6.png b/docs/assets/images/Jaccard_Similarity_6.png new file mode 100644 index 0000000..98692b6 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_6.png differ diff --git a/docs/assets/images/Jaccard_Similarity_Rerun.png b/docs/assets/images/Jaccard_Similarity_Rerun.png new file mode 100644 index 0000000..37c94fb Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_Rerun.png differ diff --git a/docs/assets/images/Jaccard_Similarity_highlight.png b/docs/assets/images/Jaccard_Similarity_highlight.png new file mode 100644 index 0000000..3773771 Binary files /dev/null and b/docs/assets/images/Jaccard_Similarity_highlight.png differ diff --git a/docs/assets/images/JoinPublicGroups.png b/docs/assets/images/JoinPublicGroups.png new file mode 100644 index 0000000..3aad3f8 Binary files /dev/null and b/docs/assets/images/JoinPublicGroups.png differ diff --git a/docs/assets/images/Large_Table_Visualization.png b/docs/assets/images/Large_Table_Visualization.png new file mode 100644 index 0000000..8944d42 Binary files /dev/null and b/docs/assets/images/Large_Table_Visualization.png differ diff --git a/docs/assets/images/MSET_IntersectSB.png b/docs/assets/images/MSET_IntersectSB.png new file mode 100644 index 0000000..40fe62d Binary files /dev/null and 
b/docs/assets/images/MSET_IntersectSB.png differ diff --git a/docs/assets/images/MSET_IntersectTGoI.png b/docs/assets/images/MSET_IntersectTGoI.png new file mode 100644 index 0000000..6dfab1e Binary files /dev/null and b/docs/assets/images/MSET_IntersectTGoI.png differ diff --git a/docs/assets/images/MSET_Sampling.png b/docs/assets/images/MSET_Sampling.png new file mode 100644 index 0000000..cdda83d Binary files /dev/null and b/docs/assets/images/MSET_Sampling.png differ diff --git a/docs/assets/images/MSSigDB_Codes.png b/docs/assets/images/MSSigDB_Codes.png new file mode 100644 index 0000000..943eb1e Binary files /dev/null and b/docs/assets/images/MSSigDB_Codes.png differ diff --git a/docs/assets/images/ManageGroups.png b/docs/assets/images/ManageGroups.png new file mode 100644 index 0000000..79f932f Binary files /dev/null and b/docs/assets/images/ManageGroups.png differ diff --git a/docs/assets/images/MyProjectsAdd.png b/docs/assets/images/MyProjectsAdd.png new file mode 100644 index 0000000..e5f4fcb Binary files /dev/null and b/docs/assets/images/MyProjectsAdd.png differ diff --git a/docs/assets/images/MyProjectsCreate.png b/docs/assets/images/MyProjectsCreate.png new file mode 100644 index 0000000..c6bc1a4 Binary files /dev/null and b/docs/assets/images/MyProjectsCreate.png differ diff --git a/docs/assets/images/MyProjectsDelete.png b/docs/assets/images/MyProjectsDelete.png new file mode 100644 index 0000000..f765aea Binary files /dev/null and b/docs/assets/images/MyProjectsDelete.png differ diff --git a/docs/assets/images/MyProjectsEditName.png b/docs/assets/images/MyProjectsEditName.png new file mode 100644 index 0000000..746e5ab Binary files /dev/null and b/docs/assets/images/MyProjectsEditName.png differ diff --git a/docs/assets/images/MyProjectsExport.png b/docs/assets/images/MyProjectsExport.png new file mode 100644 index 0000000..d1be073 Binary files /dev/null and b/docs/assets/images/MyProjectsExport.png differ diff --git 
a/docs/assets/images/MyProjectsGeneSetIcons.png b/docs/assets/images/MyProjectsGeneSetIcons.png new file mode 100644 index 0000000..8feecf7 Binary files /dev/null and b/docs/assets/images/MyProjectsGeneSetIcons.png differ diff --git a/docs/assets/images/MyProjectsPage.png b/docs/assets/images/MyProjectsPage.png new file mode 100644 index 0000000..f871086 Binary files /dev/null and b/docs/assets/images/MyProjectsPage.png differ diff --git a/docs/assets/images/MyProjectsPage2.png b/docs/assets/images/MyProjectsPage2.png new file mode 100644 index 0000000..28e5bba Binary files /dev/null and b/docs/assets/images/MyProjectsPage2.png differ diff --git a/docs/assets/images/MyProjectsPencilIcon.png b/docs/assets/images/MyProjectsPencilIcon.png new file mode 100644 index 0000000..b9d970b Binary files /dev/null and b/docs/assets/images/MyProjectsPencilIcon.png differ diff --git a/docs/assets/images/MyProjectsPlusMinusIcons.png b/docs/assets/images/MyProjectsPlusMinusIcons.png new file mode 100644 index 0000000..dd4e577 Binary files /dev/null and b/docs/assets/images/MyProjectsPlusMinusIcons.png differ diff --git a/docs/assets/images/MyProjectsRemoveGeneSets.png b/docs/assets/images/MyProjectsRemoveGeneSets.png new file mode 100644 index 0000000..a166513 Binary files /dev/null and b/docs/assets/images/MyProjectsRemoveGeneSets.png differ diff --git a/docs/assets/images/MyProjectsSearch.png b/docs/assets/images/MyProjectsSearch.png new file mode 100644 index 0000000..05fe09c Binary files /dev/null and b/docs/assets/images/MyProjectsSearch.png differ diff --git a/docs/assets/images/MyProjectsSelection.png b/docs/assets/images/MyProjectsSelection.png new file mode 100644 index 0000000..37855c9 Binary files /dev/null and b/docs/assets/images/MyProjectsSelection.png differ diff --git a/docs/assets/images/MyProjectsShare.png b/docs/assets/images/MyProjectsShare.png new file mode 100644 index 0000000..edcaaf0 Binary files /dev/null and b/docs/assets/images/MyProjectsShare.png differ 
diff --git a/docs/assets/images/MyProjectsShareProjectIcon.png b/docs/assets/images/MyProjectsShareProjectIcon.png new file mode 100644 index 0000000..f8c5142 Binary files /dev/null and b/docs/assets/images/MyProjectsShareProjectIcon.png differ diff --git a/docs/assets/images/MyProjectsSharedGroups.png b/docs/assets/images/MyProjectsSharedGroups.png new file mode 100644 index 0000000..d5bbf77 Binary files /dev/null and b/docs/assets/images/MyProjectsSharedGroups.png differ diff --git a/docs/assets/images/MyProjectsSharedIcon.png b/docs/assets/images/MyProjectsSharedIcon.png new file mode 100644 index 0000000..6523875 Binary files /dev/null and b/docs/assets/images/MyProjectsSharedIcon.png differ diff --git a/docs/assets/images/MyProjectsStarIcon.png b/docs/assets/images/MyProjectsStarIcon.png new file mode 100644 index 0000000..e74d025 Binary files /dev/null and b/docs/assets/images/MyProjectsStarIcon.png differ diff --git a/docs/assets/images/MyProjectsTrashIcon.png b/docs/assets/images/MyProjectsTrashIcon.png new file mode 100644 index 0000000..aac3425 Binary files /dev/null and b/docs/assets/images/MyProjectsTrashIcon.png differ diff --git a/docs/assets/images/NewSlide24.jpg b/docs/assets/images/NewSlide24.jpg new file mode 100644 index 0000000..c64f9d4 Binary files /dev/null and b/docs/assets/images/NewSlide24.jpg differ diff --git a/docs/assets/images/NewSlide25.jpg b/docs/assets/images/NewSlide25.jpg new file mode 100644 index 0000000..f93a57e Binary files /dev/null and b/docs/assets/images/NewSlide25.jpg differ diff --git a/docs/assets/images/NewSlide26.jpg b/docs/assets/images/NewSlide26.jpg new file mode 100644 index 0000000..6f2a448 Binary files /dev/null and b/docs/assets/images/NewSlide26.jpg differ diff --git a/docs/assets/images/NewSlide28.jpg b/docs/assets/images/NewSlide28.jpg new file mode 100644 index 0000000..7187881 Binary files /dev/null and b/docs/assets/images/NewSlide28.jpg differ diff --git a/docs/assets/images/NewSlide37.jpg 
b/docs/assets/images/NewSlide37.jpg new file mode 100644 index 0000000..66584af Binary files /dev/null and b/docs/assets/images/NewSlide37.jpg differ diff --git a/docs/assets/images/Partitioned-sunburst.png b/docs/assets/images/Partitioned-sunburst.png new file mode 100644 index 0000000..af9ec27 Binary files /dev/null and b/docs/assets/images/Partitioned-sunburst.png differ diff --git a/docs/assets/images/Phenome_Map_13.png b/docs/assets/images/Phenome_Map_13.png new file mode 100644 index 0000000..db1a724 Binary files /dev/null and b/docs/assets/images/Phenome_Map_13.png differ diff --git a/docs/assets/images/Phenome_Map_14.png b/docs/assets/images/Phenome_Map_14.png new file mode 100644 index 0000000..04352ce Binary files /dev/null and b/docs/assets/images/Phenome_Map_14.png differ diff --git a/docs/assets/images/Pick_Account.png b/docs/assets/images/Pick_Account.png new file mode 100644 index 0000000..dc8ff1e Binary files /dev/null and b/docs/assets/images/Pick_Account.png differ diff --git a/docs/assets/images/Prepare-data-for-upload.png b/docs/assets/images/Prepare-data-for-upload.png new file mode 100644 index 0000000..a87a5b3 Binary files /dev/null and b/docs/assets/images/Prepare-data-for-upload.png differ diff --git a/docs/assets/images/ProjectsSelectGenesets.png b/docs/assets/images/ProjectsSelectGenesets.png new file mode 100644 index 0000000..0029d63 Binary files /dev/null and b/docs/assets/images/ProjectsSelectGenesets.png differ diff --git a/docs/assets/images/Quick_Start_Guide_1.png b/docs/assets/images/Quick_Start_Guide_1.png new file mode 100644 index 0000000..efebdb7 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_1.png differ diff --git a/docs/assets/images/Quick_Start_Guide_2.png b/docs/assets/images/Quick_Start_Guide_2.png new file mode 100644 index 0000000..a86ca95 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_2.png differ diff --git a/docs/assets/images/Quick_Start_Guide_3.png 
b/docs/assets/images/Quick_Start_Guide_3.png new file mode 100644 index 0000000..9b4e754 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_3.png differ diff --git a/docs/assets/images/Quick_Start_Guide_4.png b/docs/assets/images/Quick_Start_Guide_4.png new file mode 100644 index 0000000..092c7e5 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_4.png differ diff --git a/docs/assets/images/Quick_Start_Guide_5.png b/docs/assets/images/Quick_Start_Guide_5.png new file mode 100644 index 0000000..14f26c9 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_5.png differ diff --git a/docs/assets/images/Quick_Start_Guide_6.png b/docs/assets/images/Quick_Start_Guide_6.png new file mode 100644 index 0000000..c7016f1 Binary files /dev/null and b/docs/assets/images/Quick_Start_Guide_6.png differ diff --git a/docs/assets/images/ResultsManagement.png b/docs/assets/images/ResultsManagement.png new file mode 100644 index 0000000..4165590 Binary files /dev/null and b/docs/assets/images/ResultsManagement.png differ diff --git a/docs/assets/images/Run_Times_Graph.jpg b/docs/assets/images/Run_Times_Graph.jpg new file mode 100644 index 0000000..13d7a00 Binary files /dev/null and b/docs/assets/images/Run_Times_Graph.jpg differ diff --git a/docs/assets/images/Screen_Shot_2016-11-29_at_6.17.30_PM.png b/docs/assets/images/Screen_Shot_2016-11-29_at_6.17.30_PM.png new file mode 100644 index 0000000..e3e2450 Binary files /dev/null and b/docs/assets/images/Screen_Shot_2016-11-29_at_6.17.30_PM.png differ diff --git a/docs/assets/images/Screen_Shot_2016-11-30_at_11.18.18_PM.png b/docs/assets/images/Screen_Shot_2016-11-30_at_11.18.18_PM.png new file mode 100644 index 0000000..ee00316 Binary files /dev/null and b/docs/assets/images/Screen_Shot_2016-11-30_at_11.18.18_PM.png differ diff --git a/docs/assets/images/Screen_Shot_2016-11-30_at_11.21.40_PM.png b/docs/assets/images/Screen_Shot_2016-11-30_at_11.21.40_PM.png new file mode 100644 index 
0000000..4cb0609 Binary files /dev/null and b/docs/assets/images/Screen_Shot_2016-11-30_at_11.21.40_PM.png differ diff --git a/docs/assets/images/Screen_Shot_2016-12-01_at_8.30.43_PM.png b/docs/assets/images/Screen_Shot_2016-12-01_at_8.30.43_PM.png new file mode 100644 index 0000000..aacbd7e Binary files /dev/null and b/docs/assets/images/Screen_Shot_2016-12-01_at_8.30.43_PM.png differ diff --git a/docs/assets/images/SearchAddProject.png b/docs/assets/images/SearchAddProject.png new file mode 100644 index 0000000..41f55e4 Binary files /dev/null and b/docs/assets/images/SearchAddProject.png differ diff --git a/docs/assets/images/SearchAddShareButtons.png b/docs/assets/images/SearchAddShareButtons.png new file mode 100644 index 0000000..79f769c Binary files /dev/null and b/docs/assets/images/SearchAddShareButtons.png differ diff --git a/docs/assets/images/SearchAddToProject.png b/docs/assets/images/SearchAddToProject.png new file mode 100644 index 0000000..2869dce Binary files /dev/null and b/docs/assets/images/SearchAddToProject.png differ diff --git a/docs/assets/images/SearchAnalyzeLink.png b/docs/assets/images/SearchAnalyzeLink.png new file mode 100644 index 0000000..1bb625c Binary files /dev/null and b/docs/assets/images/SearchAnalyzeLink.png differ diff --git a/docs/assets/images/SearchBox.png b/docs/assets/images/SearchBox.png new file mode 100644 index 0000000..efd1689 Binary files /dev/null and b/docs/assets/images/SearchBox.png differ diff --git a/docs/assets/images/SearchByTier.png b/docs/assets/images/SearchByTier.png new file mode 100644 index 0000000..e2d0272 Binary files /dev/null and b/docs/assets/images/SearchByTier.png differ diff --git a/docs/assets/images/SearchFilters.png b/docs/assets/images/SearchFilters.png new file mode 100644 index 0000000..5907ddd Binary files /dev/null and b/docs/assets/images/SearchFilters.png differ diff --git a/docs/assets/images/SearchIcon.png b/docs/assets/images/SearchIcon.png new file mode 100644 index 
0000000..da10a91 Binary files /dev/null and b/docs/assets/images/SearchIcon.png differ diff --git a/docs/assets/images/SearchLimitedResults.png b/docs/assets/images/SearchLimitedResults.png new file mode 100644 index 0000000..c639c78 Binary files /dev/null and b/docs/assets/images/SearchLimitedResults.png differ diff --git a/docs/assets/images/SearchNoProjectsMessage.png b/docs/assets/images/SearchNoProjectsMessage.png new file mode 100644 index 0000000..6ad6564 Binary files /dev/null and b/docs/assets/images/SearchNoProjectsMessage.png differ diff --git a/docs/assets/images/SearchResults.png b/docs/assets/images/SearchResults.png new file mode 100644 index 0000000..462caf3 Binary files /dev/null and b/docs/assets/images/SearchResults.png differ diff --git a/docs/assets/images/SearchSizeSlider.png b/docs/assets/images/SearchSizeSlider.png new file mode 100644 index 0000000..141ff4d Binary files /dev/null and b/docs/assets/images/SearchSizeSlider.png differ diff --git a/docs/assets/images/SearchSuccessMessage.png b/docs/assets/images/SearchSuccessMessage.png new file mode 100644 index 0000000..7ac4555 Binary files /dev/null and b/docs/assets/images/SearchSuccessMessage.png differ diff --git a/docs/assets/images/SearchTableHeader.png b/docs/assets/images/SearchTableHeader.png new file mode 100644 index 0000000..64750b6 Binary files /dev/null and b/docs/assets/images/SearchTableHeader.png differ diff --git a/docs/assets/images/Similar_genesets_01.png b/docs/assets/images/Similar_genesets_01.png new file mode 100644 index 0000000..949c91e Binary files /dev/null and b/docs/assets/images/Similar_genesets_01.png differ diff --git a/docs/assets/images/Similar_genesets_02.png b/docs/assets/images/Similar_genesets_02.png new file mode 100644 index 0000000..27b1157 Binary files /dev/null and b/docs/assets/images/Similar_genesets_02.png differ diff --git a/docs/assets/images/StartingToolMsg.png b/docs/assets/images/StartingToolMsg.png new file mode 100644 index 
0000000..4c20b39 Binary files /dev/null and b/docs/assets/images/StartingToolMsg.png differ diff --git a/docs/assets/images/Table_Visualization.png b/docs/assets/images/Table_Visualization.png new file mode 100644 index 0000000..8daaab6 Binary files /dev/null and b/docs/assets/images/Table_Visualization.png differ diff --git a/docs/assets/images/Upload-text-file.png b/docs/assets/images/Upload-text-file.png new file mode 100644 index 0000000..7d39a8a Binary files /dev/null and b/docs/assets/images/Upload-text-file.png differ diff --git a/docs/assets/images/UsingMSET1.png b/docs/assets/images/UsingMSET1.png new file mode 100644 index 0000000..9d85877 Binary files /dev/null and b/docs/assets/images/UsingMSET1.png differ diff --git a/docs/assets/images/UsingMSET2.png b/docs/assets/images/UsingMSET2.png new file mode 100644 index 0000000..716b3f5 Binary files /dev/null and b/docs/assets/images/UsingMSET2.png differ diff --git a/docs/assets/images/UsingMSET3.png b/docs/assets/images/UsingMSET3.png new file mode 100644 index 0000000..d694372 Binary files /dev/null and b/docs/assets/images/UsingMSET3.png differ diff --git a/docs/assets/images/UsingMSET4.png b/docs/assets/images/UsingMSET4.png new file mode 100644 index 0000000..102d5cc Binary files /dev/null and b/docs/assets/images/UsingMSET4.png differ diff --git a/docs/assets/images/View_my_genesets.png b/docs/assets/images/View_my_genesets.png new file mode 100644 index 0000000..354553f Binary files /dev/null and b/docs/assets/images/View_my_genesets.png differ diff --git a/docs/assets/images/Welcome.png b/docs/assets/images/Welcome.png new file mode 100644 index 0000000..b85dc95 Binary files /dev/null and b/docs/assets/images/Welcome.png differ diff --git a/docs/assets/images/abba.png b/docs/assets/images/abba.png new file mode 100644 index 0000000..c01d17c Binary files /dev/null and b/docs/assets/images/abba.png differ diff --git a/docs/assets/images/abba_2.png b/docs/assets/images/abba_2.png new file mode 100644 
index 0000000..b939648 Binary files /dev/null and b/docs/assets/images/abba_2.png differ diff --git a/docs/assets/images/abba_3.png b/docs/assets/images/abba_3.png new file mode 100644 index 0000000..d568175 Binary files /dev/null and b/docs/assets/images/abba_3.png differ diff --git a/docs/assets/images/bool_image.png b/docs/assets/images/bool_image.png new file mode 100644 index 0000000..a7a5764 Binary files /dev/null and b/docs/assets/images/bool_image.png differ diff --git a/docs/assets/images/boolean_algebra_except.png b/docs/assets/images/boolean_algebra_except.png new file mode 100644 index 0000000..a05a9d2 Binary files /dev/null and b/docs/assets/images/boolean_algebra_except.png differ diff --git a/docs/assets/images/boolean_algebra_intersect.png b/docs/assets/images/boolean_algebra_intersect.png new file mode 100644 index 0000000..1a25349 Binary files /dev/null and b/docs/assets/images/boolean_algebra_intersect.png differ diff --git a/docs/assets/images/boolean_algebra_intersect3.png b/docs/assets/images/boolean_algebra_intersect3.png new file mode 100644 index 0000000..94da972 Binary files /dev/null and b/docs/assets/images/boolean_algebra_intersect3.png differ diff --git a/docs/assets/images/boolean_algebra_options.png b/docs/assets/images/boolean_algebra_options.png new file mode 100644 index 0000000..145a2a2 Binary files /dev/null and b/docs/assets/images/boolean_algebra_options.png differ diff --git a/docs/assets/images/boolean_algebra_select_species.png b/docs/assets/images/boolean_algebra_select_species.png new file mode 100644 index 0000000..b5abd1b Binary files /dev/null and b/docs/assets/images/boolean_algebra_select_species.png differ diff --git a/docs/assets/images/boolean_algebra_table.png b/docs/assets/images/boolean_algebra_table.png new file mode 100644 index 0000000..b36c2d9 Binary files /dev/null and b/docs/assets/images/boolean_algebra_table.png differ diff --git a/docs/assets/images/boolean_algebra_union.png 
b/docs/assets/images/boolean_algebra_union.png new file mode 100644 index 0000000..d6aa231 Binary files /dev/null and b/docs/assets/images/boolean_algebra_union.png differ diff --git a/docs/assets/images/error_400.png b/docs/assets/images/error_400.png new file mode 100644 index 0000000..e0dfcde Binary files /dev/null and b/docs/assets/images/error_400.png differ diff --git a/docs/assets/images/hierarchy.png b/docs/assets/images/hierarchy.png new file mode 100644 index 0000000..53dba89 Binary files /dev/null and b/docs/assets/images/hierarchy.png differ diff --git a/docs/assets/images/logout.png b/docs/assets/images/logout.png new file mode 100644 index 0000000..2ca7ef4 Binary files /dev/null and b/docs/assets/images/logout.png differ diff --git a/docs/assets/images/menu_bar.png b/docs/assets/images/menu_bar.png new file mode 100644 index 0000000..56c1339 Binary files /dev/null and b/docs/assets/images/menu_bar.png differ diff --git a/docs/assets/images/oops.png b/docs/assets/images/oops.png new file mode 100644 index 0000000..aedd2f3 Binary files /dev/null and b/docs/assets/images/oops.png differ diff --git a/docs/assets/images/raw_variant.png b/docs/assets/images/raw_variant.png new file mode 100644 index 0000000..ef10600 Binary files /dev/null and b/docs/assets/images/raw_variant.png differ diff --git a/docs/assets/images/redirecting.png b/docs/assets/images/redirecting.png new file mode 100644 index 0000000..3993853 Binary files /dev/null and b/docs/assets/images/redirecting.png differ diff --git a/docs/assets/images/search_bar.png b/docs/assets/images/search_bar.png new file mode 100644 index 0000000..2f08276 Binary files /dev/null and b/docs/assets/images/search_bar.png differ diff --git a/docs/assets/images/search_filter.png b/docs/assets/images/search_filter.png new file mode 100644 index 0000000..bfe49bb Binary files /dev/null and b/docs/assets/images/search_filter.png differ diff --git a/docs/assets/images/search_overview.png 
b/docs/assets/images/search_overview.png new file mode 100644 index 0000000..3c68545 Binary files /dev/null and b/docs/assets/images/search_overview.png differ diff --git a/docs/assets/images/ssologin.png b/docs/assets/images/ssologin.png new file mode 100644 index 0000000..085e1b0 Binary files /dev/null and b/docs/assets/images/ssologin.png differ diff --git a/docs/assets/images/variant_distance_equation.png b/docs/assets/images/variant_distance_equation.png new file mode 100644 index 0000000..645c60d Binary files /dev/null and b/docs/assets/images/variant_distance_equation.png differ diff --git a/docs/assets/images/wrong_password.png b/docs/assets/images/wrong_password.png new file mode 100644 index 0000000..0ae16a8 Binary files /dev/null and b/docs/assets/images/wrong_password.png differ diff --git a/docs/reference/available-tools.md b/docs/reference/available-tools.md deleted file mode 100644 index 85fdcab..0000000 --- a/docs/reference/available-tools.md +++ /dev/null @@ -1,101 +0,0 @@ - -!!! warning "Work in progress" - [//]: # (TODO) - Geneweaver is in the process of repackaging its tools. This documentation is - here for reference, based on existing and legacy tool packaging, and will be updated - as each tool is repackaged. - - Complete documentation on the legacy versions of analysis tools can be found in the - [legacy documentation](https://geneweaver.org/help/#analysis-tools). - -### HiSim Graph -The HiSim Graph, short for Hierarchical Similarity Graph, is a tool for grouping -functional genomic datasets based on the genes they contain. For example: The user may -want to determine what a set of experiments on alcohol preference have in common, and -what makes various experiments unique from one another. Alternatively, one may wish to -take a large set of studies of related phenomena and identify their shared or distinct -substrates. 
In this situation one may want to know whether there is a shared biological -basis for addiction and learning, and if so, what the substrate is. The user might also -want to examine studies of a large number of related disorders and determine whether a -more appropriate biologically-based classification can be constructed. - -The HiSim Graph Tool is designed to address these goals; it presents a tree of -hierarchical relationships for a set of input GeneSets. The structure is determined -solely from the gene overlaps of every combination of GeneSets. - -### GeneSet Graph -The GeneSet Graph is designed for the user in need of a partitioned display to -illustrate just how tied genes are to one another. For example: a user in need of a -GeneSet Graph would look for visual references more than chemical references or -references by utility. A GeneSet Graph can also help pick apart the most valuable or -most occurring genes depending on the user’s preference. - -### Jaccard Similarity -The Jaccard Similarity Tool displays a matrix of Venn diagrams, which can be very useful -for quickly finding overlapping GeneSets and evaluating the similarity of results across -a collection of experiments. This snapshot may enable you to determine which can be -removed or kept for more complex comparison analysis (such as the HiSim Graph). - -### GeneSet Clustering -Clustering is one of the most powerful tools in bioinformatics, where classifications -are too strict for data distinction, clustering helps give the user an evaluation that -is not so distinct. - -### MSET (Modular Single-Set Enrichment Tool) -Modular single-set enrichment tool (MSET): randomization-based test for list over- or -under-representation - -MSET was developed to compare gene lists. 
-From four character lists -``` -gene_list1, -gene_list2, -background1, -background2 -``` -it computes a randomization-based p-value describing the likelihood that the intersect -of `gene_list1` and `gene_list2` is **underexpressed** or **overexpressed** relative to -randomness alone. - -MSET is based on work from [Eisinger et al., 2013, “Development of a versatile enrichment analysis tool reveals associations between the maternal brain and mental health disorders, including autism.” BMC Neuroscience.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840590/) - -### ABBA Gene Search -Given a set of interesting genes, do other genes have similar relationships to known -sets of genes? For example, given a set of genes known to be related to drug abuse, -what other genes share similar expression patterns in drug abuse gene sets? - -By answering this question, it becomes possible to elucidate under-studied or obfuscated -genes that may play a role in complex phenotypes. - -We have developed a new GeneWeaver tool to address this question, which we call -Anchored Biclique of Biomolecular Associations (ABBA). -This tool takes advantage of the large number of collected data and cross-species -integration to find new genes for investigation. - -The search begins with a user-provided list of genes of interest, such as highly-studied -genes with known pathways and relationships. The database then finds any gene sets that -contain at least N of the genes in the provided list. From the resulting list of gene -sets, ABBA then isolates any genes that occur in at least M GeneSets but not in the -initial list. These resulting genes share similar gene set overlap with the original -input set, but may not have been previously considered in relation to the gene set of -interest. - -### Boolean Algebra -The Boolean Algebra Tool performs basic set operations on at least two Gene Sets. 
-Results are displayed as lists of genes beloging to one of the three different types of -set operations: Union, Intersect, and Symmetric Difference. Furthermore, results allow -users to quickly determine new relationships between Gene Sets and create a new Gene Set -based on set-derived findings. - -### DBSCAN Gene Clustering -DBSCAN (Density-Based Spatial Clustering of Application with Noise) is a clustering -algorithm that groups genes into clusters based on how closely related the genes are. - -#### Why Use the DBSCAN Tool? -In general, clustering is used to find patterns or outliers within data sets. In this -implementation of DBSCAN, genes in the same cluster would be considered similar, while -genes in different clusters would be less similar. An explanation of DBSCAN can be found -[here](https://en.wikipedia.org/wiki/DBSCAN). Within Geneweaver, this tool can be used -to infer relationships between genes. For example, if clusters with similar genes -continue to appear in tests across multiple data sets, one could say that these genes -are closely related. diff --git a/docs/reference/curation/curation-process.md b/docs/reference/curation/curation-process.md new file mode 100644 index 0000000..fef1d32 --- /dev/null +++ b/docs/reference/curation/curation-process.md @@ -0,0 +1,320 @@ +# **Curation Guide** + +The Curation menu in GeneWeaver provides options for managing curation tasks and +searching and assigning publications + +![](../../assets/images/Image001.png) + +**Managing Curation Tasks** + +![](../../assets/images/Image002.png) + +When selecting “Manage Curation Tasks” from the navigation menu you’ll be presented with +a page containing in the side bar, all of the curation groups you belong to separated by +groups you administer and groups of +which you are just a member. The main body of the page will contain the list of curation +tasks for the selected group in the side bar. 
The curation tasks are a mix of +publications and genesets, which have been assigned +to this group, with the tasks, which have not yet been assigned to a curator, appearing +at the top of the table. + +![](../../assets/images/Image004.png) + +You can change the selected group in the main part of the page just by clicking on the +group of interest in the side bar. + +Immediately above the table, there are buttons which will allow you to filter the +contents of the table to contain: **All** results, **Assigned** tasks, **Unassigned** +tasks, tasks which are **Ready** for +review and tasks which have been **Reviewed**. In this context _Assigned_ and +_Unassigned_ are referring to curator assignment. + +![](../../assets/images/Image006.png) + +The columns of the table are mostly self-explanatory, however it’s worth explaining PUB +ASSIGNMENT and \# GENESETS. + +The PUB ASSIGNMENT column will display the associated PubMed ID for a geneset task, when +it was entered via an association when a **Publication Assignment**. The link on the +PubMed ID will take you to the publication assignments page. + +The \# GENESETS column indicates for a publication, how many genesets are associated +with it as part of this specific publication assignment. If this publication is assigned +to another curation group as well, genesets as part of that publication Assignment will +not be part of this number. + +If you are an administrator of the curation group for which you are +managing tasks, there should also be an **Assign Curator** button at the top right of +the page. You are able to select one or more task rows in the table, at which point they +should be highlighted yellow. + +![](../../assets/images/Image008.png) + +One note about how row selection works: There are no _**Shift**_ or +_**Control**_ operations for selecting multiple rows. Rows are selected one at a time, +and remain selected until you click on the row again, when it becomes deselected. 
Also, +selections do not persist when you move to the next page of results. This latter issue +is something we intend to address in a future release. However, for the time being it’s +recommended you select the visible rows you would like to assign, assign them, and then +move onto the next page of results. + +Once you’ve chosen the tasks you want to assign (or reassign), you will select the +Assign Curator button. + +![](../../assets/images/Image010.png) + +You will then be presented with a modal dialog box, where you can select the individual +you wish to curate the tasks, and include a note regarding the curation assignment. + +Once a curator has been selected, click the **Assign For Curation** +button. If you select **Close** instead no assignment will be made. + +For your convenience, if you realized while in the Curation Task +Management page that you want to assign a publication to this group, so that you can +subsequently assign it to a curator, there is also an _**Add Publication**_ button at +the top of the page. + +![](../../assets/images/Image006.png) + +This button will take you to the **Search/Assign Publications** page with only +publication generators listed that were created for the curation group. + +![](../../assets/images/Image014.png) + +**Search/Assign Publications** + +![](../../assets/images/Image003.png) + +When selecting “Search/Assign Publications” from the page menu you’ll be presented with +a page containing an “accordion” display, with the middle section opened by default. The +assumption is that most times the user will be interested in generating a list of +publications from which to make assignments. + +![](../../assets/images/Image017.png) + +The section is broken into 3 parts: + +1. Single Publication Assignment +2. Publication Generators +3. Generated Publication Listing + +**Single Publication Assignment** + +If you select the **+** symbol next to _Single Publication Assignment_ you will be +presented with a simple search box. 
This would be used in the case where you have a +specific PubMed ID that you know and want to assign for curation. You simply enter the +PubMed ID and select the **Find Publication** button. + +![](../../assets/images/Image019.png) + +Assuming you’ve entered a valid PubMed ID, the citation will be returned so that you can +confirm that this is indeed your publication of interest. + +![](../../assets/images/Image021.png) + +To assign the publication to a curation group to work on, just select the Assign To +Curation Group button and you will be presented with the following modal dialog box +displaying a drop down so you can select the curation group and a text box so that you +can enter any curation notes you might have. + +![](../../assets/images/Image023.png) + +**Publication Generation** + +If you select the **+** symbol next to _Publication Generators_ you +will be presented with a table of generators that have been created for groups of which +you are a member, and an **Add Generator** button. + +![](../../assets/images/Image017.png) + +The columns of the table represent: the NAME that was assigned to the generator when it +was queried; the PUBMED SEARCH term that is used to search PubMed and bring back a list +of publications; FOR GROUP which is the curation group for which the generator was +created; the date the generator was LAST RUN; and a series of ACTIONS which can be +executed on a generator (will discuss these later). + +In the case where there are no generators already created for any of the groups to which +you belong, the first step would be to click **Add Generator**. This will bring up a +modal dialog box + +![](../../assets/images/Image027.png) + +You will be presented with three fields, which are all mandatory in +order to have the **Save** button enabled. Generator Name is a self +selected name to represent your generator. PubMed Query must be a valid PubMed search +term. 
You can learn more about valid PubMed terms using the following YouTube video +(<https://www.youtube.com/watch?v=dncRQ1cobdc&feature=relmfu>). +There is also a link to the PubMed search string builder +(<https://www.ncbi.nlm.nih.gov/pubmed/advanced>) +directly in the dialog box. + +![](../../assets/images/Image029.png) + +Once created, the generator becomes available in the table of generators. + +**Generator Actions** + +There are three actions available to be used with generators: + +- ![](../../assets/images/Image031.png) Run +- ![](../../assets/images/Image033.png) Edit +- ![](../../assets/images/Image035.png) Delete + +We’ll discuss Run last as it’s the most involved and leads to the next +section. + +![](../../assets/images/Image033.png) Edit is fairly straightforward. It presents you +with a modal dialog identical to the one you get when creating a new generator. You are +able to update any of the name, search term, or group association. + +![](../../assets/images/Image035.png) Delete will simply bring up a confirmation dialog +box. + +![](../../assets/images/Image041.png) + +![](../../assets/images/Image031.png) Lastly, the Run option will cause the generator to +run against PubMed, automatically collapse the **Publication Generators** accordion +section, and expand the **Generated Publication Listing** section, with the results +of the generator displayed. + +**Generated Publication Listing** + +If you select the **+** symbol next to _Generated Publication Listing_ you will be +presented with a table of publications that have been pulled from PubMed and are the +result of the PubMed search term associated with a given generator. This section is +populated by selecting the Run ![](../../assets/images/Image031.png) icon in the +generator table. + +![](../../assets/images/Image047.png) + +Publications that are pulled by a publication generator are not +persisted in the GeneWeaver database. At least, not until the time they are assigned to +a curation group.
Instead, the publications that are not already assigned to a group are +pulled directly from PubMed at the time of generation. Some of these queries can result +in a very large number of publications (hundreds of thousands), so we only +display a slice of the publications at a time. We keep track of the total number that +match the search term and allow you to page through the results, each time going back +out to PubMed to pull in the next set. + +Similar to the Curation Task Management page, you can select multiple rows to be +assigned to a curation group all at once. This is done by individually selecting each +publication of interest. Selecting several rows at once with the Control or Shift keys +is not supported. The only way to de-select a row is to click the row +again. + +You can get more detail about a publication by clicking the **+** symbol at the +beginning of the row. This will display the title, authors, journal and publication +date, a link to the full text of the publication and the abstract. + +![](../../assets/images/Image049.png) + +Once you’ve selected the publication or publications that you would like to assign to a +curation group, select the **Assign to Curation Group** button. This will bring up a +modal dialog box where you will select a curation group, and optionally type in a note +regarding the curation that is to be done. + +![](../../assets/images/Image051.png) + +Once assigned, the publications should now +have a View icon appearing at the end of the row, and if you hover over the icon you +will see a tooltip telling you what group or groups are curating this publication. + +![](../../assets/images/Image053.png) + +Also, if you select the **+** symbol at the beginning of the row now, the groups will be +listed under **Assigned to Curation Groups** under the expanded details.
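
The paging behaviour described above, where only a slice of a large result set is fetched on each trip to PubMed, can be sketched as follows. This is an illustration under stated assumptions, not GeneWeaver's implementation: it assumes the query goes through NCBI's public E-utilities `esearch` endpoint, and the helper names `esearch_page_url` and `page_count`, the page size of 50, and the example query are all hypothetical. The sketch only builds request URLs and does not contact PubMed.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. Assumed for illustration only;
# this guide does not say which PubMed client GeneWeaver uses.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_page_url(query: str, page: int, page_size: int = 50) -> str:
    """Build the URL that requests one page of PubMed IDs for a query.

    `page` is zero-based. `retstart` is the offset of the first record
    and `retmax` is the number of records returned in this slice.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    params = {
        "db": "pubmed",
        "term": query,
        "retstart": page * page_size,
        "retmax": page_size,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def page_count(total_matches: int, page_size: int = 50) -> int:
    """Number of pages needed to show every match (ceiling division)."""
    return -(-total_matches // page_size)


if __name__ == "__main__":
    # Hypothetical query; prints the URL for the third page of results.
    print(esearch_page_url("alcohol dependence", page=2))
```

Because only the offset changes between requests, a very large result set never has to be held in memory or stored; the total match count is enough to render the pager.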
+ +Once an assignment has been made, a notification is sent to the +administrator of the curation group so they know that there is a new publication that +needs to be assigned to a curator. Notifications are discussed in another section. +If you now return to the **Manage Curation Tasks** page for the curation group to which +the publication has been assigned, you should see the publication listed at the top +of the tasks table. + +![](../../assets/images/Image055.png) + +**Publication Curation Assignment** + +You can get to the _**Publication Curation Assignment**_ page from the **Curation Task +Management** page in one of two ways. + +- Click on the PubMed ID in the TASK column of a publication row of the task table. +- Click on the PubMed ID in the PUB ASSIGNMENT column of a geneset row of the task + table. + +![](../../assets/images/Image057.png) + +If you select a publication that has not been assigned to a curator yet, you’ll get to a +page that looks something like this: + +![](../../assets/images/Image058.png) + +The citation information is present, and the curation group is +identified, but there is no curator assigned and no associated genesets. + +Assignment to a curator could have been done via the **Curation Task Management** page +as detailed previously, or by using the **Assign To Curator** button on this page. The +functionality of that button is essentially the same as on the other page, with an +option to select a curator and include a curation note. + +Once the curator is assigned, the curator’s name and any notes that have been entered +will appear in the upper right-hand side of the page. + +![](../../assets/images/Image060.png) + +As the assignee of a publication, you will be presented with an +additional button below **Save Notes** to be used to **Create New +Geneset**. The **Reassign** button that was visible to the administrator now becomes a +**Mark as Complete** button.
+ +![](../../assets/images/Image062.png) + +Clicking on the Create New Geneset button brings up a dialog that allows you to enter a +“stub” for one or more new genesets. A stub is essentially a placeholder for a geneset +that will be more completely populated at a later time. This gives a curator the ability +to quickly create a bunch of stubs while reviewing an article without having to enter +the full information for each. + +![](../../assets/images/Image064.png) + +The curator can select the species of interest and then just enter the name, the label +to be used in figures and a description. They can add multiple for this species by +selecting **Add Row**, and when they’ve entered the information for all the geneset +stubs associated with this species, they hit **Submit**. + +When you’ve hit Submit, some automatic annotation of the geneset happens in the +background. Your geneset stub will not immediately become visible under **GeneSets +Created For This Assignment**. Instead you will see “loading…”. Once the geneset stubs +are created the page will display the new geneset stubs. + +![](../../assets/images/Image066.png) + +Once it’s loaded the geneset stub will appear under **GeneSets Created For This +Assignment**. It might take a while for the new geneset stub(s) to appear in the list of +genesets associated with the publication assignment, since GeneWeaver is calling out to +an external text annotator to annotate the geneset description and publication abstract. + +If there are other genesets visible to the user that are associated with this +publication, but were not created through this publication assignment, then they will +show up under **Other Visible GeneSets Associated With This Publication**. + +![](../../assets/images/Image068.png) + +Once the geneset stubs have been created, the curator can click on the link for any one +of the genesets, and begin curation of an actual geneset. 
+ +When curation of all of the associated genesets for this publication is complete, the +curator should click the **Mark as Complete** button on the **Publication Curation +Assignment** page. + +#### Curation Page + +The geneset curation page is essentially the standard _**view geneset details**_ page +with some of the features turned off. On this page the curator can add or remove genes +from the geneset, set a threshold, edit meta content, or update the curation notes. Once +the curator has finished editing the geneset they can mark it **Ready for Review**, +which will send the geneset back to the group administrator for review. If the group has +multiple administrators then the geneset will be sent to the administrator who assigned +the curation task to the curator. + +![](../../assets/images/Curate_geneset.png) diff --git a/docs/reference/curation/curation-standards.md b/docs/reference/curation/curation-standards.md new file mode 100644 index 0000000..1b48ae0 --- /dev/null +++ b/docs/reference/curation/curation-standards.md @@ -0,0 +1,177 @@ +# **Curation Standards Documentation** + +Secondary functional genomics data consists of the results of analyzed +experiments in functional genomics. In contrast to primary data stores +such as Gene Expression Omnibus (GEO) in which raw experimental data are +stored, a secondary data store attempts to collect the results of the +experimental design and decision-making process of the researcher so +that one may interpret and integrate the gene set centered outcomes of +the studies. Controlling the quality and validity of the large-scale +analysis of secondary data requires the enforcement of interpretable +standards for gene set construction and description. GeneWeaver’s use of +discrete analysis eliminates many barriers to the integration of +heterogeneous data sets across species and experiments.
However, it is +important for users to be able to rapidly interpret the nature of gene +sets retrieved from the site, requiring a minimal standard for metadata +associated with secondary data. For this purpose, both unstructured +textual descriptions of the data and structured ontology annotations to +the terms in these descriptions are used to define gene sets. In the +interest of encouraging submission we are cautious not to be too +prescriptive or burdensome to users, but rather to provide guidelines on +standards used by internal curators to assess data quality and clarity +to enable rapid acceptance of community submissions to the data +repository. + +Curation Tiers +-------------- + +| Tier Name | Curator Description | +|:------------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Tier I** | **Public Resource Grade** Resource GeneWeaver Large data sets primarily curated by their parent resource. GeneWeaver ensures consistency of metadata (gene annotations to KEGG, MP and GO, curated functional associations in the Neuroinformatics Framework, Comparative Toxicogenomics Database) | +| **Tier II** | **Machine-Generated from public sources** GeneWeaver Gene sets resulting from genome analysis, not otherwise published in total, e.g. gene co-expression to behavior from GeneNetwork.org, QTL positional candidates from MGI. GeneWeaver curators examine data and metadata. | +| **Tier III** | **Human-Curated** GeneWeaver Curated user-deposited data and publication supplements in domains of interest. | +| **Tier IV** | **Submitted to Public- Provisional User** User-deposited data made available to the public. 
All Tier IV is examined for promotion to Tier III | +| **Tier V** | **Private User and Group data- Uncurated** User Data sets deposited for private or group-only analysis | + + +!!! tip + GeneSet tiers also have a non-curation meaning, which can be referenced on the [GeneSet Tiers](../geneset-tiers.md) page. + +# **General Definitions** + +**Gene Set Name**: A brief title for the gene set, approximately sentence length, that +should provide a clear and concise description of the contents of a gene set +interpretable to most users of GeneWeaver, but with sufficient detail to satisfy a +domain expert. This is the major gene set name that is displayed in all search results, +project directory and table views of analysis results. Standards for specific gene set +types are given in the following section. + +**Gene Set Figure Label**: A brief 23-character abbreviation to facilitate recognition +of the gene set in a graph or other display. + +**Gene Set Description**: A detailed description of the gene set, including rules for +its construction, experimental methods and analyses used to generate data, anatomical +terms, and traceable references to source data including accession information and date. +Abbreviations should be avoided. + +**Ontology Annotations**: Relevant terms from Disease Ontology, Mammalian Ontology and +other OBO ontologies supplied by curators or identified through the application of the +NCBO Annotator to textual descriptions including publication abstracts. + +**Publication Information**: PubMed ID, title, authors, publication information and +full text of the abstract. + +# **Standards for Common Gene Set Types** + +### Type of Data: Differential Expression Profiling + +**Gene Set Name**: Genes \[upregulated/downregulated/differentially expressed\] in +\[tissue\] \[comparison\]. _**Example**_: Genes differentially expressed in striatum of +C57BL/6J compared to C57BL/6C. +Note: spell out anatomical terms as nouns, e.g. striatum, not striatal.
Include complete +strain names, e.g. C57BL/6J not B6. + +**Gene Set Figure Label**: B6JvsB6CStriatum + +**Gene Set Description**: Indicate which samples were compared. What experimental +manipulations or tissue differences are being examined? Indicate statistical +methodology, significance thresholds and which changes are reported here. Indicate whether +the uploaded score is a p-value, q-value, effect size or fold change, and give the fold change reference. +_**Example**_: Striatum gene expression differences between naive C57BL/6J and C57BL/6C +substrains corresponding to a 5% FDR. A small number of genes are highly differentially +expressed between B6 substrains, C57BL/6J (high alcohol consumption preference) and +C57BL/6C (low alcohol consumption preference). Fold expression changes are relative to +B6/J. + +**Gene Set Contents**: Gene identifier and statistical score for differential +expression, e.g. p-value, q-value, correlation coefficient, binary score, effect size or +fold change. + +### Type of Data: Published QTL Candidate Gene List + +**Gene Set Name**: Description (name, Published QTL Chr \# MGI:\#). _**Example**_: +cocaine related behavior 10 (Cocrb10, Published QTL Chr \#) + +**Gene Set Figure Label**: (QTL-name-Organism-Chr \#). _**Example**_: +QTL-Cocrb10-Mouse-Chr 9 + +**Gene Set Description**: QTL Name Definition, candidate gene selection method (e.g. 1.5 +LOD drop; inter-marker interval). Exact description of phenotype. Strains used for +mapping should be included. _**Example**_: Rats were subjected to a +forced swim test (FST) procedure in which they are placed in water for 5 min, and their +behavior was scored every 5 sec as immobility, climbing, or swimming. Data were analyzed +for each activity with consideration given to their non-independence. p-value:0.0002, +Variance: 3.6, Peak Marker: D5Rat40 (BLAT 16538053) Spans 1-41538053. This interval was +obtained by using a fixed interval width of 25 Mbp around the peak marker. Strains were +WKY/NHsd and F344/NHsd.
Also defined as Imm3. + +**Gene Set Contents**: Gene identifier and binary score. + +### Type of Data: Co-Expression to Phenotype + +**Gene Set Name**: Describe tissue and phenotype correlated. _**Example**_: Cerebellum +gene expression correlates of acetic acid writhing behavior in BXD recombinant inbred +mice. + +**Gene Set Figure Label**: Co-expression writhing + +**Gene Set Description**: Indicate what the comparison was that was made and any +statistical cut-offs that were used. _**Example**_: Cerebellum gene co-expression with +acetic acid writhing in BXD RI mice. Gene expression data was obtained from +genenetwork.org SJUT Cerebellum mRNA M430 (Mar05) RMA data set. Behavioral phenotype +data was collected by RMQ and consisted of the number of writhes in response to 0.6% +acetic acid i.p. + +**Gene Set Contents**: Gene identifier and statistical score for co-expression. e.g. +R-squared, p-value, q-value, binary threshold. + +### Type of Data: Reference Ontology + +**Gene Set Name**: Term \# and name. _**Example**_: MP:XXXXXXX Abnormal. + +**Gene Set Figure Label**: Term \#. _**Example**_: Term \# + +**Gene Set Description**: Term Definition. _**Example**_: “Increase in the dose or +concentration of a foreign compound required to induce a specific level of +response” [www.informatics.jax.org](http://www.informatics.jax.org), 2010-12-01 + +**Gene Set Contents**: All gene sets include genes, mutant alleles or gene products +annotated to an ontology term by a professional curator. Each gene directly annotated to +the term is given a score of 1, each gene connected to a term through annotations to its +higher order parents is given a score of 2. To use only direct annotations in an +analysis assign a threshold of < 2 to each Gene Set. + +### Type of Data: Co-Expression Clusters + +**Gene Set Name**: Co-Expression clusters. _**Example**_: Co-expression cluster of +nicotine Dependence genes significantly expressed in the adolescent PFC, VS and +Hippocampus. 
+ +**Gene Set Figure Label**: Abbreviated description. _**Example**_: Adolesc Rat Nic +Dependence + +**Gene Set Description**: Indicate what samples were compared and what was clustered. +_**Example**_: Studies analyzing brain samples from female rats that had been injected +with nicotine at four different ages show that nicotine exerts the greatest influence +during adolescence. Using DNA microarrays, gene expression correlates were obtained from +the prefrontal cortex (PFC), ventral striatum (VS), and hippocampus. Principal cluster +analysis was then used to identify 76 genes that changed significantly in at least one +of these three brain regions during the experiment. + +**Gene Set Contents**: Gene identifier and statistical score for cluster analysis or +binary threshold. + +### Type of Data: Genome Wide Association Study + +**Gene Set Name**: GWAS of ... _**Example**_: GWAS of Alcohol and Nicotine Dependence in +Australian DNA-Pools. + +**Gene Set Figure Label**: Abbreviated description. _**Example**_: GWAS Alcohol Nicotine + +**Gene Set Description**: List of positional candidate genes after correcting for +multiple testing and controlling the false discovery rate from a genome-wide association +study. Represents genes associated with a linked cytological region or genes ‘near’ an +associated SNP. _**Example**_: Genome-wide association study identifies a locus at +7p15.2 associated with endometriosis. + +**Gene Set Contents**: Gene identifier and binary threshold. diff --git a/docs/reference/curation/index.md b/docs/reference/curation/index.md new file mode 100644 index 0000000..61ec7be --- /dev/null +++ b/docs/reference/curation/index.md @@ -0,0 +1,15 @@ +# **Curation** + +Controlling the quality and validity of the large-scale analysis of secondary data +requires the enforcement of interpretable standards for gene set construction and +description.
GeneWeaver’s use of discrete analysis eliminates many barriers to the +integration of heterogeneous data sets across species and experiments. However, it is +important for users to be able to rapidly interpret the nature of gene sets retrieved +from the site, requiring a minimal standard for metadata associated with secondary data. +For this purpose, both unstructured textual descriptions of the data and structured +ontology annotations to the terms in these descriptions are used to define gene sets. + +Our **[Curation Standards](curation-standards.md)** provide detailed +guidance on GeneWeaver curation policies and sample curation types. We have also +included a brief explanation of the **[Curation Process](curation-process.md)**, which +includes a guide to our *new* curation interface. diff --git a/docs/reference/geneset-utilities.md b/docs/reference/geneset-utilities.md new file mode 100644 index 0000000..d712806 --- /dev/null +++ b/docs/reference/geneset-utilities.md @@ -0,0 +1,90 @@ +**GeneSet Utilities** +====================== +**[GeneSet Details Pages](#geneset-details-pages)** allow users to view vital +information about gene sets of interest, including associated genes, homologs and +references to external links. **[Gene Intersection Lists](#gene-intersection-lists)** +are useful for determining which information is shared between gene sets of interest. In +addition, GeneWeaver tools allow users to **[Combine](#combine)** gene sets of interest +or perform more complex set operations based on **[Boolean Algebra](#boolean-algebra)**. +Gene sets may also be annotated with information about +**[Emphasis Genes](#emphasis-genes)**, allowing users to augment GeneWeaver tools with +gene-specific information. + +## Emphasis Genes + +The Emphasis Genes utility enables users to select genes or an entire set of genes that +may be highlighted in various analysis tools.
+ +To set emphasis genes, choose "Emphasize Genes" from the Analyze GeneSets drop-down on +the navigation bar or from the footer. + +![](../assets/images/Emphasize_genes_01.png) + +The current emphasis genes are listed on the left side of the page. + +To modify your emphasis genes, you can remove genes one at a time using the "x" icon +next to each gene. To clear the entire list, click the "Clear all genes" button at the +top of the page. + +![](../assets/images/Emphasize_genes_02.png) + +To add a gene, type the gene name or part of it in the box on the right side of the +page. A list will appear based on the partial name. Select one and click the "Go" +button. + +![](../assets/images/Emphasize_genes_03.png) + +The gene, or genes if the selection included several, will be listed on the page. Use +the "Add all genes" or "Add" link to select the desired gene(s). + +## Homology Mapping + +GeneWeaver uses the concept of Homology Mapping to expand search and analysis +capabilities beyond a single species. Currently, we rely on data provided +by [Homologene](https://www.ncbi.nlm.nih.gov/homologene) to assert homology between +clustered sets of reference gene ids. That is, GeneWeaver creates a set of unique id +clusters (covering Entrez, Ensembl, Gene Cards, etc.) representing specific genes, and +these clusters are connected across species using mappings established by Homologene. + +## Gene Intersection Lists + +Gene Intersection Lists are useful for determining which information is shared between +gene sets of interest. + +Gene intersection lists can be generated by clicking on the output of various tools +including the Hypergeometric tests, Jaccard similarity matrix Venn diagrams and HiSim +Graph nodes. A table of genes by GeneSets is displayed. Next to each gene symbol are +links to gene-specific queries of external resources. Each gene has links to associated +databases, such as NCBI, Ensembl, STRING, MGI, GeneNetwork, etc.
If you have the +FireGoose GAGGLE extension installed, the genes on the page are also available +for broadcast. Filled circles indicate the presence of a gene in a GeneSet. +Green (light) circles indicate that the exact gene is present in multiple gene sets. +Dark (maroon) circles indicate a homologous gene is present in multiple gene sets. The +table can be exported using the export .csv feature at the bottom of the window. + +## Combine + +GeneWeaver tools allow users to combine gene sets of interest. GeneWeaver tools operate +on a weighted bipartite adjacency matrix, a table of Association Scores in a Gene (row) +x GeneSet (col) tab-delimited text format. For many GeneSets, the scores are binary. + +To create sample GeneWeaver data for development or off-line analysis: + +1. Perform a database query using the search field. +2. Add the GeneSets to a project. +3. Go to the "Analyze GeneSets" page. +4. Select the project or specific GeneSets from projects. +5. Select the "Combine GeneSets" tool, pick [homology](#homology-mapping) included or + excluded and click run. +6. Save the file to your computer. + +![](../assets/images/Combine_genesets.png) + +## External Data Resources + +GeneWeaver contains publicly available sets of genes annotated to structured +vocabularies and ontologies that are assigned Tier I, or public resource data. Other +sets of genes, such as MeSH term-to-gene annotations, are derived from the processing of +public sources and attributed to Tier II. In the case of MeSH, we take advantage of +NCBI’s gene-to-PubMed and PubMed-to-MeSH files to produce sets of genes annotated +through their transitive associations. diff --git a/docs/reference/restful-api.md b/docs/reference/restful-api.md index ccd9260..ff24fcd 100644 --- a/docs/reference/restful-api.md +++ b/docs/reference/restful-api.md @@ -2,7 +2,9 @@ ## Geneweaver ReST API The Geneweaver ReST API powers the Geneweaver web application.
The API is available at -[geneweaver-prod.jax.org/api/docs](https://geneweaver-prod.jax.org/api/docs). +[geneweaver.jax.org/api/docs :simple-swagger:](https://geneweaver.jax.org/api/docs). + +You can find information on our latest release on [GitHub :simple-github:](https://github.com/TheJacksonLaboratory/geneweaver-api/releases). ## Geneweaver A.O.N. ReST API @@ -10,7 +12,7 @@ The A.O.N geneweaver ortholog and homolog ReST API is a powerful tool for mappin IDs between species. The API is available at -[geneweaver-prod.jax.org/aon/api/docs](https://geneweaver-prod.jax.org/aon/api/docs). +[geneweaver.jax.org/aon/api/docs :simple-swagger:](https://geneweaver.jax.org/aon/api/docs). Read the paper [here](https://pubmed.ncbi.nlm.nih.gov/37891644/). diff --git a/mkdocs.yml b/mkdocs.yml index 009ffd1..5a33e9c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -78,6 +78,7 @@ markdown_extensions: class: mermaid format: !!python/name:pymdownx.superfences.fence_code_format plugins: + - search - mkdocs-jupyter: include_source: True nav: @@ -115,16 +116,30 @@ nav: # - Creating Documentation Sites: tutorial/creating-a-new-documentation-site.md - Reference: - reference/index.md + - 'Analysis Tools': + - analysis-tools/index.md + - 'HiSim Graph': analysis-tools/hisim-graph.md + - 'GeneSet Graph': analysis-tools/geneset-graph.md + - 'Jaccard Similarity': analysis-tools/jaccard-similarity.md + - 'GeneSet Clustering': analysis-tools/clustering.md + - DBSCAN: analysis-tools/dbscan.md + - MSET: analysis-tools/mset.md + - 'ABBA Gene Search': analysis-tools/abba.md + - 'Boolean Algebra': analysis-tools/boolean-algebra.md + - 'Find Variants': analysis-tools/find-variants.md - gweave (Command Line): - reference/command-line/index.md - Logging In: reference/command-line/logging-in.md - API Commands: reference/command-line/api-commands.md + - GeneSet Tiers: reference/geneset-tiers.md + - GeneSet Utilities: reference/geneset-utilities.md + - Curation: + - reference/curation/index.md + - Standards: 
reference/curation/curation-standards.md + - Process: reference/curation/curation-process.md - ReST API: reference/restful-api.md - Available Packages: reference/available-packages.md - - Available Tools: reference/available-tools.md -# - Scientific Workflows: reference/scientific-workflows.md - Data Model: reference/data-model.md - - GeneSet Tiers: reference/geneset-tiers.md - Contributing Guide: reference/contributing-guide.md - Development Guide: reference/development-guide.md - External Data Sources: reference/external-data-sources.md