Replies: 4 comments 11 replies
-
Hello @Jnelen, thank you for your report.
Looking at the molecules, I think the problem is the very high number of graph isomorphisms rather than the size. I have encountered similar issues with a few similar molecules to the ones you reported. Out of curiosity, which backend are you using? I noticed that
I think this might be useful, especially if
|
Beta Was this translation helpful? Give feedback.
-
A short summary of solutions for future reference: If you want to introduce some time-out functionality in your own code, there are 2 main options using native python modules: multiprocessing and signal.
from openbabel import pybel
from spyrmsd import io
from spyrmsd.molecule import Molecule
from spyrmsd.rmsd import rmsdwrapper
import signal
from contextlib import contextmanager
path1 = "DB01666_1.mol"
path2 = "DB01666_2.mol"
## Define TimeoutError
class TimeoutError(Exception):
pass
@contextmanager
def timeout(time):
def raise_timeout(signum, frame):
raise TimeoutError("Ran out of time!")
# Register a function to raise a TimeoutError on the signal.
signal.signal(signal.SIGALRM, raise_timeout)
# Schedule the signal to be sent after ``time``.
signal.alarm(time)
try:
yield
finally:
# Unregister the signal so it won't be triggered
# if the timeout is not reached.
signal.signal(signal.SIGALRM, signal.SIG_IGN)
mol1 = io.loadmol(path1)
mol2 = io.loadmol(path2)
with timeout(5):
rmsd = round(rmsdwrapper(mol1, mol2)[0],4)
print(rmsd) |
Beta Was this translation helpful? Give feedback.
-
I'm re-opening the discussion because GitHub only shows open discussions by default, and I don't want this to be hidden. |
Beta Was this translation helpful? Give feedback.
-
To follow-up on this, I have done quite a bit of experimenting. I ran into a lot of problems where I had difficulties finding a way to actually terminate the process (as you also indicated in your example earlier). However, I finally found something that seems to work decently well. If you think this is reasonable, I will polish it up and make a pull request. Basically I would add 2 functions to rmsd.py: one that just uses a timeout function on rmsdwrapper, and another one that is a multicore version of that. Ofcourse if you think it's more appropriate we could make the "regular" timeout function a private function as well. Anyway, here is the basic implementation: def _compute_rmsd(mol, mol2, queue):
rmsd = rmsdwrapper(mol1, mol2)[0]
## Use a Queue to store the result
queue.put(rmsd)
def rmsd_with_timeout(sample, timeout=None):
## Inspired by https://superfastpython.com/task-with-timeout-child-process/
mol1 = sample[0]
mol2 = sample[1]
queue = Queue()
process = Process(target=_compute_rmsd, args=(mol1, mol2, queue))
process.start()
process.join(timeout=timeout)
if process.is_alive():
## terminate the process
process.terminate()
return None
else:
return queue.get()
def rmsd_timeout_multicore(samples, timeout=5, num_workers=1):
with ProcessPoolExecutor(max_workers=num_workers) as executor:
rsmd_partial = partial(rmsd_with_timeout, timeout=timeout)
results = executor.map(rsmd_partial,samples)
return list(results) I feel like this isn't too much code, and it only uses built-in python functions. From my small testing, the performance is still quite good (+ you can also use multiple cores now aswell). In the final implementation I would also include the other variables that rmsdwrapper can use, but this was to show the basic structure. P.S.: naming suggestions are also welcome, I just put some indicative placeholders for now |
Beta Was this translation helpful? Give feedback.
-
Hi there
First of all I'd like to say that I think this is a really nice and useful package, so thanks for developing it!
However, I encountered some performance bottleneck when calculating the symmetry-corrected RMSD for larger molecules. I was performing docking calculations using Drugbank molecules, and these molecule in particular gave problems:
https://go.drugbank.com/drugs/DB14981
https://go.drugbank.com/drugs/DB01666
https://go.drugbank.com/drugs/DB05808
I am assuming this is due to large molecule size, and a potential combinatorial explosion when trying to figure out the optimal RMSD when considering the symmetry options.
Maybe it could be interesting to add some kind of time-out functionality for these cases? For standard settings, there probably should be no timeout to keep default behavior, but it could be a useful feature. For example if you set the timeout as 10, it will raise an error or return NaN if it doesn't return anything after running for 10 seconds. I could try to take a stab at implementing something like this and make a pull request, unless you don't think this is a valuable feature?
Beta Was this translation helpful? Give feedback.
All reactions