Parallelizing Leiden Runs
Pre-releaseCorresponds to PR #85.
Leiden is run with multiple different random seeds (and the best partition is used) for robustness. Prior to this PR, those runs were not parallelized because trying to parallelize leidenalg.find_partition
naively via joblib results in a TypeError: cannot pickle ‘PyCapsule’ object
error. In this PR, parallelism is achieved by making calls to a dedicated script that runs leiden community detection (one that is called using subprocess.Popen
).
Results on bpnet nanog task are here (gives the same results as before, but spends noticeably less time on the Leiden clustering steps): http://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/b3b4d7b240b8e398597100581ae791eec0a13b61/bpnet/trial1/TryBpNet_v0.5.13.0.ipynb
(Contrast with https://nbviewer.jupyter.org/github/kundajelab/tfmodisco_bio_experiments/blob/2ba855b85eddc4c4d7b5e3296c6e12cce04a705d/bpnet/trial1/TryBpNet_v0.5.11.0_reducemem.ipynb)