Parallelizing Leiden Runs

Pre-release
@AvantiShri released this 19 Feb 09:23
· 112 commits to master since this release
2ecd704

Corresponds to PR #85.

For robustness, Leiden is run with several different random seeds and the best partition is kept. Prior to this PR, those runs were not parallelized, because naively parallelizing leidenalg.find_partition via joblib fails with TypeError: cannot pickle 'PyCapsule' object. In this PR, parallelism is achieved by calling, via subprocess.Popen, a dedicated script that runs Leiden community detection.
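
The pattern can be illustrated with a minimal sketch. This is not the code from the PR: the worker source, the JSON output format, and the names `WORKER_SOURCE`, `run_seeds_in_parallel`, and `best_partition` are all hypothetical, and the worker only fakes a partition and a quality score from the seed, where a real worker script would load the graph and run Leiden with that seed. What the sketch does show is the part that avoids pickling: each seed gets its own Python process started with subprocess.Popen, all processes run concurrently, and only plain text crosses the process boundary.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the dedicated Leiden script. A real worker would
# load the graph and run Leiden community detection with the given seed; here
# the "partition" and its "quality" are faked so the sketch is self-contained.
WORKER_SOURCE = r"""
import json, random, sys
seed = int(sys.argv[1])
rng = random.Random(seed)
membership = [rng.randrange(3) for _ in range(10)]  # fake cluster labels
quality = rng.random()                               # fake partition quality
json.dump({"seed": seed, "quality": quality, "membership": membership},
          sys.stdout)
"""


def run_seeds_in_parallel(seeds):
    """Start one worker process per seed, then collect every result."""
    # Popen returns immediately, so all workers are running concurrently
    # by the time the first communicate() call blocks.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(seed)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for seed in seeds
    ]
    results = []
    for seed, proc in zip(seeds, procs):
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker for seed {seed} failed")
        # Only JSON text crosses the process boundary, so nothing
        # unpicklable ever has to be serialized.
        results.append(json.loads(out))
    return results


def best_partition(results):
    """Keep the run with the highest quality score."""
    return max(results, key=lambda r: r["quality"])


if __name__ == "__main__":
    runs = run_seeds_in_parallel([0, 1, 2, 3])
    print("best seed:", best_partition(runs)["seed"])
```

Exchanging results as text (or files) rather than Python objects is the design choice that sidesteps the pickling error, at the cost of serializing the graph and the resulting membership for every run.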

Results on the BPNet Nanog task are here (the results are the same as before, but noticeably less time is spent on the Leiden clustering steps): http://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/b3b4d7b240b8e398597100581ae791eec0a13b61/bpnet/trial1/TryBpNet_v0.5.13.0.ipynb
(Contrast with the earlier notebook: https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/2ba855b85eddc4c4d7b5e3296c6e12cce04a705d/bpnet/trial1/TryBpNet_v0.5.11.0_reducemem.ipynb)