Releases · kundajelab/tfmodisco

19 Apr 01:01

v0.5.5.6

682faf6

Ability to specify a plot save directory Pre-release

PR here: #56, example usage here: https://github.com/kundajelab/tfmodisco/blob/682faf6ef2dc40bbf4f3b0fdd57a23000e8737a1/test/test_tfmodisco_workflow.py#L117. Helps to avoid the issue of seqlet score distribution plots getting overwritten when multiple tf-modisco jobs are launched from the same directory.

21 Feb 23:31

AvantiShri

v0.5.5.5

f0a910c

Scikit version compatibility fix + relaxing of numerical precision in assert statements Pre-release

Changes:

Compatibility with scikit-learn >= 0.22 from #55 (retaining compatibility with versions < 0.22 as well).
Relaxing of assert statement numerical precision thresholds requested by @mmtrebuchet (#54 and #53).

12 Dec 20:35

AvantiShri

v0.5.5.4

adee311

Bugfix for reducing threshold for numerical precision for symmetry check Pre-release

The threshold I used to check the coarse-grained affinity matrix for symmetry within numerical precision was too stringent (presumably because the dot product involves summation, so numerical error accumulates); I relaxed the threshold in commit adee311. @atseng95 this should fix the error you messaged Abhi about. (I made the numerical threshold much more lax than is probably required, since 1e-5 might have been enough, but this is just so that no one gets stuck on that error in the future in some weird edge case.)
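As a rough illustration of what a symmetry check "within numerical precision" looks like, here is a minimal pure-Python sketch. It is not the code from commit adee311; the function name `is_symmetric` and the `atol` parameter are hypothetical, and the default of 1e-5 is only borrowed from the figure mentioned above.

```python
def is_symmetric(matrix, atol=1e-5):
    """Return True if matrix[i][j] and matrix[j][i] differ by at most atol.

    `matrix` is a square list of lists. `atol` is a hypothetical tolerance
    parameter; it is not the threshold used in tfmodisco.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= atol
        for i in range(n)
        for j in range(i + 1, n)
    )


# A matrix that is symmetric only up to a small floating-point discrepancy.
nearly_symmetric = [
    [1.0, 0.3],
    [0.3 + 1e-7, 1.0],
]

print(is_symmetric(nearly_symmetric))            # tolerant check
print(is_symmetric(nearly_symmetric, atol=0.0))  # exact check
```

An exact comparison (`atol=0.0`) rejects the matrix above, while a tolerant comparison accepts it; a threshold that is too tight behaves like the exact check and fails on harmless rounding noise.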

06 Dec 03:59

AvantiShri

v0.5.5.3

38c0bf4

Functionality for just extracting seqlets Pre-release

Corresponds to pull request #51 - for situations where the user just wants to extract the seqlets. Demo notebook at https://github.com/kundajelab/tfmodisco/blob/master/examples/H1ESC_Nanog_gkmsvm/JustExtractSeqletsNanog.ipynb

05 Dec 07:47

AvantiShri

v0.5.5.2

19461fa

Backward compatibility for numpy, minor adjustment to gkmer embedding calc Pre-release

The bugfix in #47 broke backward compatibility with some earlier versions of numpy. This tagged release incorporates a fix to restore backward compatibility (commit 6be7ea5) and also makes a minor adjustment to the gapped kmer embedding calculation such that forward and reverse-complement versions of a seqlet now give exactly symmetrical embeddings within numerical precision (commit 19461fa).

To elaborate on why the forward and reverse-complement versions of a seqlet did not give perfectly symmetrical embeddings prior to this fix: consider gapped kmers with a word length of 3 and one gap. Previously, I was treating *NN and NN* (e.g. *AA and AA*) as though they were redundant with each other, so I only used one of them when computing the embedding. However, *AA and AA* can produce different results due to the difference in padding; concretely, a seqlet with the sequence AAGGG contains the AA* gapped kmer but does NOT contain the *AA gapped kmer. Thus, when I was only including the AA* and TT* gapped kmers in my embedding and NOT the *AA and *TT gapped kmers, a seqlet with the sequence AAGGG would be recorded as containing the AA* gapped kmer, but its reverse complement CCCTT would NOT be recorded as having any TT-containing gapped kmer; thus, symmetry was broken. With this fix, I now include BOTH AA* and *AA as well as BOTH TT* and *TT as features in the gapped kmer embedding; thus, an AAGGG seqlet is recorded as having a match to AA* while its reverse complement CCCTT is recorded as having a match to *TT, and symmetry is preserved.
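The AAGGG / CCCTT case can be reproduced with a toy enumeration of one-gap kmers. This is only a sketch under simplifying assumptions (windows must lie fully inside the sequence, features are plain pattern strings); the helper names `gapped_kmers`, `revcomp` and `revcomp_pattern` are hypothetical and are not the functions tfmodisco uses to compute its embedding.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "*": "*"}


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def revcomp_pattern(pattern):
    """Reverse-complement a gapped kmer pattern; the gap '*' maps to itself."""
    return "".join(COMPLEMENT[ch] for ch in reversed(pattern))


def gapped_kmers(seq, word_len=3):
    """Set of one-gap patterns found in seq, using only windows fully inside seq."""
    found = set()
    for start in range(len(seq) - word_len + 1):
        window = seq[start:start + word_len]
        for gap_pos in range(word_len):
            found.add(window[:gap_pos] + "*" + window[gap_pos + 1:])
    return found


fwd = gapped_kmers("AAGGG")
rev = gapped_kmers(revcomp("AAGGG"))  # reverse complement is CCCTT

# Keeping both NN* and *NN: the forward seqlet matches AA*, and its reverse
# complement matches the mirrored pattern *TT.
print("AA*" in fwd, "*AA" in fwd)
print("*TT" in rev, "TT*" in rev)

# Emulating the old behaviour by dropping leading-gap patterns: the forward
# seqlet still has AA*, but the reverse complement has no TT-containing feature.
old_fwd = {p for p in fwd if not p.startswith("*")}
old_rev = {p for p in rev if not p.startswith("*")}
print("AA*" in old_fwd, any("TT" in p for p in old_rev))
```

With both orientations of the gap kept, mapping every forward pattern through `revcomp_pattern` gives exactly the pattern set of the reverse-complemented sequence, which is the symmetry property described above.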

23 Nov 20:41

AvantiShri

v0.5.5.0

ca898a4

Further reduced memory usage and Nan bugfix Pre-release

Relative to v0.5.4.0, this release incorporates the PRs #47 and #50. The first addresses the occurrence of NaN values in modisco.affinitymat.NumpyCosineSimilarity, and the second reduces the memory footprint of graph2binary (thanks @hy395!). (Memory usage must be reduced even further in subsequent releases - see #49 for discussion).

18 Nov 09:36

AvantiShri

v0.5.4.0

ce3d36a

Updated hit scoring notebook Pre-release

Corresponds to pull request #46

Updated hit scoring strategy in the demo notebook to showcase the combination of the "masked hypothetical CWM cosine similarity" and the "sum of scores" metrics.
Added associated functions for computing those scores to modisco.util.
Put in some functionality for trimming motifs (the "AggregatedSeqlet" class in the codebase) according to the information content, or according to the sum of the absolute value of some score track (e.g. trimming motifs based on the hypothetical contribution scores).
Did some minor refactoring of the code for computing information-content scaled versions of the position probability matrices.
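To make the idea of trimming a motif by information content concrete, here is one possible sketch. It assumes a uniform background and trims flanking positions whose information content falls below a cutoff; the release notes do not state the actual trimming criterion, and `position_ic`, `trim_by_ic` and `min_ic` are hypothetical names that do not come from modisco.

```python
import math


def position_ic(probs, background=(0.25, 0.25, 0.25, 0.25)):
    """Information content (bits) of one position: KL divergence from background."""
    return sum(p * math.log2(p / b) for p, b in zip(probs, background) if p > 0)


def trim_by_ic(ppm, min_ic=0.5):
    """Return (start, end) of the span between the first and last positions
    whose information content is at least min_ic (an assumed cutoff)."""
    keep = [i for i, row in enumerate(ppm) if position_ic(row) >= min_ic]
    if not keep:
        return 0, 0
    return keep[0], keep[-1] + 1


uniform = [0.25, 0.25, 0.25, 0.25]
ppm = [
    uniform,
    [1.0, 0.0, 0.0, 0.0],
    [0.7, 0.1, 0.1, 0.1],
    uniform,
    uniform,
]

start, end = trim_by_ic(ppm)
trimmed = ppm[start:end]
print(start, end, len(trimmed))
```

Trimming on the sum of absolute values of a score track would follow the same pattern, with the per-position information content replaced by the per-position absolute score.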

16 Nov 02:19

AvantiShri

v0.5.3.1

3935870

Version prior to changing hit scoring strategy in demo nbs Pre-release

The main reason for creating this version tag is that I'm about to change the hit scoring strategy in the demo notebook so I can send the newer version of the hit scoring to David & Han. The change between version 0.5.3.0 and 0.5.3.1 is that I added an option to skip the fine-grained clustering step (I don't recommend people actually use this option; I had just added it in to see how things behaved without the fine-grained step). I had also added a version of the demo notebook that runs on Google Colab, which I will also update when I put in the newer hit scoring.

24 Aug 00:00

AvantiShri

v0.5.3.0

d99acc8

Reduced memory usage Pre-release

Corresponds to PR #45; some modifications for cutting down on the memory footprint.

07 Aug 20:17

AvantiShri

v0.5.2.0

4e347a5

Ability to have arbitrary auxiliary tracks for visualization purposes Pre-release

The auxiliary tracks are not used during the clustering but can be useful for visualization purposes (e.g. if you want to visualize the value of methylation/conservation/dnase footprints at a modisco motif). In the demo notebook at https://github.com/kundajelab/tfmodisco/blob/886f4815c89756a5d010a191c944061d8760c564/test/nb_test/talgata/TF%20MoDISco%20TAL%20GATA%20with%20Activations.ipynb, I use it to visualize the activations of the conv layer for each motif. The extra data tracks are supplied in the call to TfModiscoWorkflow via the other_tracks argument. other_tracks accepts a list of instances of modisco.core.DataTrack.

If the data are such that there is no concept of reverse complements (e.g. RNA-based data), then when instantiating the DataTrack objects, leave rev_tracks set to None (and also make sure revcomp=False when calling TfModiscoWorkflow). Otherwise, rev_tracks should be the value that fwd_tracks would have if the reverse complement of the input sequence was provided (e.g. for conv layer activations, you can reverse-complement the original input sequence and recompute the conv layer activations). (At the time of writing, I have not personally tested extensively how TF-MoDISco behaves for RNA-type data, though others have.)
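The relationship between fwd_tracks and rev_tracks can be sketched without any modisco calls. The example below assumes one-hot sequences with columns in A, C, G, T order, so that reverse-complementing amounts to reversing both axes; `toy_track` is a hypothetical stand-in for whatever computes the real track (e.g. conv layer activations).

```python
BASES = "ACGT"  # assumed column order of the one-hot encoding


def one_hot(seq):
    """One-hot encode a DNA string as a list of rows (positions x ACGT)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def revcomp_onehot(onehot):
    """Reverse the position axis and the base axis (valid for ACGT ordering)."""
    return [row[::-1] for row in onehot[::-1]]


def toy_track(onehot):
    """Hypothetical per-position track: 1.0 wherever the base is G."""
    g_index = BASES.index("G")
    return [row[g_index] for row in onehot]


seq_onehot = one_hot("AAGGG")

# fwd_tracks: the track computed on the original sequence.
fwd_track = toy_track(seq_onehot)

# rev_tracks: the same computation applied to the reverse-complemented input,
# i.e. the value fwd_tracks would have had for the reverse complement.
rev_track = toy_track(revcomp_onehot(seq_onehot))

print(fwd_track)
print(rev_track)
```

In practice each of these would hold values for every input sequence and would be passed as the fwd_tracks and rev_tracks of a DataTrack; for data with no reverse complement, the second computation is skipped and rev_tracks stays None.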

If the data is such that there is no positional axis (e.g. if you want to visualize the activations of the fully-connected layer for each motif), set has_pos_axis to False when instantiating the DataTrack object. Note that I have not tested the functionality with has_pos_axis=False at all.
