Release Parallelizing Leiden Runs · kundajelab/tfmodisco

Corresponds to PR #85.

For robustness, Leiden is run with several different random seeds and the best partition is kept. Prior to this PR, those runs were not parallelized, because naively parallelizing leidenalg.find_partition via joblib fails with the error "TypeError: cannot pickle 'PyCapsule' object". In this PR, parallelism is achieved by launching a dedicated script that runs Leiden community detection, invoked via subprocess.Popen.
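The general pattern can be sketched as follows. This is a minimal illustration, not the code from the PR: the worker source, the function names run_seeds_in_parallel and best_partition, and the JSON output format are all hypothetical, and the worker returns a placeholder quality score instead of calling leidenalg so that the sketch runs without leidenalg installed. The point it shows is that each seed gets its own process, so nothing unpicklable has to cross a process boundary; only plain text comes back over stdout.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the dedicated worker script. A real worker would
# load the graph and call leidenalg.find_partition with the given seed; this
# one only emits a deterministic placeholder result derived from the seed.
WORKER_SOURCE = """
import json, random, sys
seed = int(sys.argv[1])
rng = random.Random(seed)
membership = [rng.randrange(3) for _ in range(10)]  # placeholder clustering
quality = rng.random()                              # placeholder quality score
print(json.dumps({"seed": seed, "quality": quality, "membership": membership}))
"""


def run_seeds_in_parallel(seeds):
    """Launch one worker process per seed and collect the parsed results."""
    # Start every process before waiting on any of them, so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(seed)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for seed in seeds
    ]
    results = []
    for seed, proc in zip(seeds, procs):
        out, _ = proc.communicate()  # waits for this worker to finish
        if proc.returncode != 0:
            raise RuntimeError(f"worker for seed {seed} exited with {proc.returncode}")
        results.append(json.loads(out))
    return results


def best_partition(results):
    """Return the result with the highest quality score."""
    return max(results, key=lambda r: r["quality"])


if __name__ == "__main__":
    runs = run_seeds_in_parallel([0, 1, 2, 3])
    best = best_partition(runs)
    print("best seed:", best["seed"])
```

Because each worker only exchanges text with the parent, the parent never needs to pickle a partition or graph object, which is what the joblib approach ran into.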

Results on the bpnet nanog task are here (same results as before, but noticeably less time is spent on the Leiden clustering steps): http://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/b3b4d7b240b8e398597100581ae791eec0a13b61/bpnet/trial1/TryBpNet_v0.5.13.0.ipynb
(Contrast with https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/2ba855b85eddc4c4d7b5e3296c6e12cce04a705d/bpnet/trial1/TryBpNet_v0.5.11.0_reducemem.ipynb)
