Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File open error on list creation with hundreds of 'Trajectory' objects (needed for VirtualTrajectory init) #107

Open
lgsmith opened this issue Jun 11, 2023 · 7 comments

Comments

@lgsmith
Copy link
Contributor

lgsmith commented Jun 11, 2023

Describe the bug

When I splat a list of PDBs into pyloos.VirtualTrajectory's init method, I get a loos.loos.FileOpenError: Error while opening some_random_pdb.pdb file open error on a particular structure, but I can read this file, use createSystem on the file, and can inspect the file in pymol.

I've found that if I shuffle the splatted list I get this error on a different file which is particular to that shuffle. I've also found that if I slice the list such that I am only reading the first 509 elements, regardless of how the list is shuffled, I am able to construct the vtraj successfully.

It seems like there's some sort of cap on how many trajectories can be assembled this way, and the error emerges as the list of trajectory objects is built, before that gets fed into VirtualTrajectory. It is nowhere close to the memory limits of the machines this is being tested upon.

To Reproduce

Steps to reproduce the behavior:

  • in pyloos run the following:
import loos
from loos import pyloos
from pathlib import Path
import random

length_cutoff = -1  # Play with this to test how many frames you can use before error
try_vtraj = False  # if true, will try to build vtraj. Otherwise, will try to build a list of trajs.

conf_paths = Path('path/to/confs_dir').rglob('*.pdb')
samples_conf_list  = list(Map(str, conf_paths))[:length_cutoff]
model = loos.createSystem(samples_conf_list[0])
random.shuffle(samples_conf_list)
if try_vtraj:
    # if the input list is too long, this line throws the error.
    vtraj = pyloos.VirtualTrajectory(*(pyloos.Trajectory(fn, model) for fn in samples_conf_list))
else:
    # Interestingly, the crash is actually coming from list creation as an intermediate step. This line crashes with same error
    list_of_trajs = [pyloos.Trajectory(fn, model) for fn in sampls_conf_list)]

Expected behavior

I expected this mode of vtraj creation not to be sensitive to the number of trajectory objects input. If there needs to be some kind of limit here for some fundamental reason I don't understand, I'd have expected a more suggestive error message.

LOOS version and platform

These are all conda-forge loos versions on various linuxes, one is a RHEL cluster, then a kubuntu 22.04 workstation and a 22.04 laptop.

Additional context

Happy to upload files if those are needed, but I don't think which pdbs one does this with matter. One thing that may matter is the size of the PDBs. I've not checked this particular idea yet, though I will say if I provide a subset selection to the pyloos.Trajectory init I do not see a change in the number of frames I can read without error.

@tromo
Copy link
Contributor

tromo commented Jun 12, 2023

There is no limit imposed by LOOS. VirtualTrajectory is rather naive and instantiates a Trajectory object for every input trajectory and stores them, so you will have open files hanging around for the life of the vtraj. This means you're bumping into the limit of open files for your system (I hit it at 253 trajectories on my mac). A more subtle issue here, as your debugging code points out, is that LOOS trajectories will almost always keep their contained files open, so you can bump into this limit in your own code even if you don't use VirtualTrajectory.

The simplest fix is going to be to raise the per-process limit on the machines where this is a problem.

@lgsmith
Copy link
Contributor Author

lgsmith commented Jun 12, 2023 via email

@tromo
Copy link
Contributor

tromo commented Jun 12, 2023

That's a watch-specific setting, I believe. You need to adjust the max number of open file descriptors. I think you need to use something like sysctl to set it, but that may vary by distro and you may actually need to twiddle multiple settings (i.e. sysctl and ulimit).

Yes, you'll run into the same issue with native python objects, e.g.


files = []
while True:
    try:
        f = open('aligned.pdb')
    except Exception as e:
        print(f"Read {len(files)} files, exception = {e}")
        break
    files.append(f)

Running it, I get,

Read 253 files, exception = [Errno 24] Too many open files: 'aligned.pdb'

I think generators are an interesting idea, but the problem is that trajectories in LOOS are random-access. You'd only be supporting forward iteration this way. The downside is the Python code looks less like the C++ code, but the upside is that it's more pythonic. So, something to consider.

Interestingly enough, I hadn't realized that in the C++ code, there is no Trajectory.close(). We rely on the destructor taking care of everything. We may also want to consider adding an explicit close() function...

Really, the "LOOS way" would be to use subsetter and pre-process your trajectories. Parsing PDBs is expensive. Is converting to a DCD or XTC not an option for you?

@tromo
Copy link
Contributor

tromo commented Jun 12, 2023

BTW, fair point about the error message. That should be a "to do" item for us...go through the code, identify top-level failure points (mostly file opens), and try to throw a meaningful message...

@lgsmith
Copy link
Contributor Author

lgsmith commented Jul 27, 2023

I've left this open on the basis that it serves as a reminder to see if the error message can be improved. I agree that there's no fundamental issue with LOOS's code. LMK if you'd like some adjustment to be made to the issue, or if you want it closed.

@agrossfield
Copy link
Member

I went edited some of the more recent file format classes (MMCIF and MDTraj as both a system file and trajectory file) and added catches on failed opens. When there's time, I'll do more.

@agrossfield
Copy link
Member

Commit 3286fc6 adds some more error checking to the main Trajectory file opening stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants