Observations and weird results

Do you have odd or unusual results, or are you just curious to see some very specific scenarios? You are in the right place! Here I list some observations from my various analysis projects.

emptyDrops does not set a threshold

wiki_emptyDrops

emptyDrops doesn't set a threshold to filter empty droplets. Instead, it estimates the background noise profile and eliminates all droplets whose expression profile matches it.
This is why droplets considered as cells can appear in the middle of empty droplets (see arrows): the two kinds of droplets are interspersed.

Note: the droplets overlap so much in this type of representation that the phenomenon is not always visible. On the kneeplot, it looks more like a color gradient, although it is in fact many individual dots joined together.
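To make the idea concrete, here is a minimal toy sketch in Python. It is not the real emptyDrops algorithm (which is more elaborate); it only assumes a plain multinomial model, and the ambient profile and droplet counts are made-up values. It shows that two droplets with the same total UMI count can fit the background profile very differently, which is why a single threshold on the total cannot reproduce the result.

```python
import math

def multinomial_loglik(counts, profile):
    """Log-probability of `counts` under a multinomial with gene proportions `profile`."""
    total = sum(counts)
    loglik = math.lgamma(total + 1)
    for c, p in zip(counts, profile):
        loglik += c * math.log(p) - math.lgamma(c + 1)
    return loglik

# Hypothetical ambient (background) profile over 4 genes; proportions sum to 1.
ambient = [0.4, 0.3, 0.2, 0.1]

# Two made-up droplets with the SAME total UMI count (100) but different profiles.
droplet_like_ambient = [40, 30, 20, 10]
droplet_like_cell = [5, 5, 10, 80]

ll_ambient_like = multinomial_loglik(droplet_like_ambient, ambient)
ll_cell_like = multinomial_loglik(droplet_like_cell, ambient)

# The first droplet is well explained by the background, the second is not,
# even though a threshold on the total count could not tell them apart.
print(ll_ambient_like, ll_cell_like)
```

In this toy example the second droplet would be kept as a cell and the first one discarded, although both sit at exactly the same position on the kneeplot.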

Large and small cells in the same sample

wiki_large_and_small_cells

The kneeplot takes the form of a triple knee: at the top the large cells, followed by a blurred zone, then the small cells, then a second blurred zone, then the empty droplets.

wiki_cellranger_kneeplot

Here, the kneeplot from cellranger is prettier (the knees are easier to see), but cellranger identifies the small cells as background noise and doesn't keep them. Note: this is not the same sample as before.

emptyDrops doesn't work well

wiki_without_retain

Sometimes emptyDrops can't separate droplets containing cells from empty droplets. If its limit is too low (it keeps too many droplets), it doesn't matter, because the extra droplets can be removed by filtering on the number of UMIs in the next filtering step.
On the other hand, emptyDrops must not eliminate cells (limit too high). To avoid this, the `emptydrops.retain` parameter can be used to keep all droplets above this threshold:

wiki_with_retain

Here the limit of full droplets is lower with the `emptydrops.retain` parameter set to 1000.
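The effect of `emptydrops.retain` can be sketched as follows. This is only an illustration of the logic described above, not the actual implementation: the function name `call_cells`, the droplet totals and the test results are all hypothetical, and whether the real comparison is strict or inclusive at the threshold is an assumption.

```python
def call_cells(total_umis, passes_test, retain=None):
    """Return one boolean per droplet: True if the droplet is kept as a cell.

    total_umis  : total UMI count of each droplet
    passes_test : outcome of the profile-based test for each droplet (hypothetical)
    retain      : droplets with a total >= retain are kept whatever the test says
                  (inclusive comparison is an assumption)
    """
    kept = []
    for total, passed in zip(total_umis, passes_test):
        forced = retain is not None and total >= retain
        kept.append(passed or forced)
    return kept

# Made-up example: the test alone rejects two droplets that are clearly cells.
totals = [5000, 1500, 1200, 300]
test_result = [True, False, False, False]

without_retain = call_cells(totals, test_result)
with_retain = call_cells(totals, test_result, retain=1000)

print(without_retain)  # [True, False, False, False]
print(with_retain)     # [True, True, True, False]
```

The droplet with 300 UMIs is still rejected, so the parameter only protects droplets above the chosen value; anything kept by mistake can still be removed later by the UMI filter.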

There are cells left with more than 15% mitochondrial RNA while I filtered them out at 15% (also works for ribosomal RNA, mechanical stress RNA, number of transcripts and number of genes)

wiki_threshold_percent_mt_15

This is totally normal, because the genes are filtered after the cells.
Removing genes that are not mitochondrial lowers a cell's total count while its mitochondrial count stays the same, so the percentage of mitochondrial RNA in the remaining cells can rise above the threshold (the percentage is a proportion).
But this variation is minimal (even imperceptible) when there are enough reads / genes, because few genes are deleted.
Here, there are very few cells.
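A small worked example of this arithmetic, with made-up counts (the gene names other than the `MT-` prefix convention, and all the numbers, are hypothetical):

```python
def percent_mt(counts, mt_genes):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    total = sum(counts.values())
    mt = sum(n for gene, n in counts.items() if gene in mt_genes)
    return 100.0 * mt / total

mt_genes = {"MT-CO1"}

# One cell, before gene filtering: 29 mitochondrial counts out of 200.
cell = {"MT-CO1": 29, "GENE_A": 100, "GENE_B": 61, "GENE_RARE": 10}
before = percent_mt(cell, mt_genes)

# Gene filtering removes a non-mitochondrial gene (here GENE_RARE).
filtered_cell = {gene: n for gene, n in cell.items() if gene != "GENE_RARE"}
after = percent_mt(filtered_cell, mt_genes)

print(before)  # 14.5 -> the cell passes the 15% filter
print(after)   # about 15.26 -> the same cell now appears above 15%
```

The cell was legitimately kept at 14.5%, and only the denominator changed afterwards.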

Impact of empty droplets on umap

wiki_empty_droplets

Here the empty droplets (with ambient RNA) correspond to cluster 0. These droplets contain ambient RNA, which is shared with all other droplets, so this cluster is somewhat similar to every other cluster; that is why it sits in the center, very close to the others. NB: the reverse is not true! A central cluster does not necessarily consist of empty droplets. It can be stem cells, which differentiate into all other clusters.

Choose the right number of dimensions

wiki_info_noise

Here there seems to be a trajectory with 29 dimensions, but it doesn't exist with 27 or 31 dimensions. Case 1: below 29 dimensions we didn't have enough information to reveal it, at 29 we have enough, and from 31 we added too much noise and lost it. Case 2: dimension 29 added noise that artificially created a trajectory, and the information added by dimension 31 outweighs that noise and gives a consistent umap again.

To decide:

Either we expect a trajectory: we offer the biologist a version of the result with the trajectory, and they check whether it is an artefact or not.
Or we don't expect a trajectory but it is present with other numbers of dimensions: we propose one version of the result with the trajectory and one without, and the biologist checks whether it is an artefact or a new discovery.
Or we don't expect a trajectory and it is not present anywhere else: it is undoubtedly an artefact and I don't take this trajectory into account.
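The three cases above can be summarized as a small decision rule. This is just my own way of writing it down; the function name and its arguments are hypothetical.

```python
def versions_to_propose(trajectory_expected, seen_at_other_dimensions):
    """Return the result versions to hand over to the biologist."""
    if trajectory_expected:
        # Expected trajectory: show it, the biologist checks for an artefact.
        return ["with trajectory"]
    if seen_at_other_dimensions:
        # Unexpected but reproducible: show both, artefact or new discovery.
        return ["with trajectory", "without trajectory"]
    # Unexpected and seen nowhere else: treated as an artefact.
    return ["without trajectory"]

# The example above: trajectory at 29 dimensions only, and not expected.
proposal = versions_to_propose(trajectory_expected=False,
                               seen_at_other_dimensions=False)
print(proposal)  # ['without trajectory']
```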

Be careful with the colors, they are sometimes misleading

wiki_resolution

At a resolution of 0.2, we can clearly see that the orange cells above cluster 6 belong to cluster 1. At a resolution of 0.3, one might think that these same cells belong to cluster 10, but given the previous observation at resolution 0.2, it is more likely that they belong to cluster 9, which is in a color similar to that of cluster 10. So you have to be careful, because this umap is less pretty than it first looks.

Impact of bias correction on umap

wiki_bias_correction_ump

The yellow, green and blue clusters, before correction, became a single red cluster after correction. So the over-clustering was due to the expression of ribosomal RNAs.