I want to create UpSet-style plots using Plotly. I would use one of the existing packages for Matplotlib, but the UpSet plot is the only plot generated by MhcVizPipe that is not done in Plotly. For consistency's sake, I would like them all to use the same library.
Given two or more sets or lists as input, generate an UpSet plot. There are a few components to this:
- Find the set of elements unique to each possible intersection.
- Plot the number of elements in each of these sets in a bar chart.
- Create the little "circle and bar" things under the bar chart to indicate which intersection a given bar belongs to.
- Ideally, allow for secondary plots up above the bar plots. E.g. boxplot of some other variable.
- Need to be able to limit how many intersections get displayed, because the number of possible intersections increases exponentially with the number of sets. This could be something like showing the top two or three intersections per sample, or only showing intersections that are significantly large (by some measure...).
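A minimal sketch of the first and last points above, not necessarily how the final package does it. The function names `exclusive_intersections` and `largest` are made up for illustration, and the input is assumed to be a dict mapping sample names to lists or sets. Grouping elements by the samples they occur in only ever produces the non-empty intersections, so the 2^n possible combinations never have to be enumerated. Keeping the n largest is just one possible way of limiting what is shown.

```python
def exclusive_intersections(named_sets):
    """Group elements by the exact combination of samples they occur in.

    named_sets: dict mapping sample name -> iterable of elements (assumed input shape).
    Returns a dict mapping a tuple of sample names -> set of elements found in
    exactly those samples and no others. Empty intersections are not included.
    """
    membership = {}
    # Iterating in input order keeps the names inside each key in input order.
    for name, values in named_sets.items():
        for element in set(values):
            membership.setdefault(element, []).append(name)

    intersections = {}
    for element, members in membership.items():
        intersections.setdefault(tuple(members), set()).add(element)
    return intersections


def largest(intersections, n):
    """Keep only the n largest intersections (one possible limiting rule)."""
    ranked = sorted(intersections.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:n])
```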
I think I can do this in two subplots (three (or more) if you count the secondary plot(s)). The subplots share an x-axis, which represents all the different intersections. The top subplot is simply a bar chart. The bottom subplot I will probably need to manually construct by drawing shapes on it, e.g. circles and lines. The bottom plot also needs appropriate y-axis tick labels, which are the names of all the samples.
NOTE: There should be two ordering options: order the intersections from largest to smallest, or order them according to the original order of the input.
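The two ordering options could look something like the sketch below. The function name `order_intersections` and the `by` parameter are hypothetical. "Original order of the input" is open to interpretation; here it is assumed to mean intersections with fewer samples come first, and ties are broken by where the member samples appear in the input.

```python
def order_intersections(sizes, names, by="size"):
    """Return the intersection keys in display order.

    sizes: dict mapping a tuple of sample names -> number of elements.
    names: sample names in their original input order.
    by:    "size" for largest to smallest, "input" to follow the input order.
    """
    keys = list(sizes)
    if by == "size":
        # sorted() is stable, so equally sized intersections keep their relative order.
        return sorted(keys, key=lambda key: sizes[key], reverse=True)
    if by == "input":
        position = {name: i for i, name in enumerate(names)}
        # Assumption: fewer samples first, then by the positions of the member samples.
        return sorted(keys, key=lambda key: (len(key), [position[n] for n in key]))
    raise ValueError(f"unknown ordering: {by!r}")
```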
This doesn't seem too difficult... but inevitably it will be more so than I expect.
- I think things are in pretty good shape now. I should write some more tests, but so far everything is working as expected and is also running fine when I use it in MhcVizPipe.
- I finished all the points above. The ordering issue was because I used the same function to get row and column locations.
- I will release it on PyPI and see if I can integrate it into MhcVizPipe.
- I have finished the first three points above.
- I think the sample names are getting mixed up. I am doing it all with standalone functions, and I think the ordering is changed somewhere, but not preserved. I will either write a class or a function to encapsulate the whole thing.