
dwalk OOM with big directories #46

Open
xk42 opened this issue Mar 13, 2024 · 1 comment
Comments

@xk42

xk42 commented Mar 13, 2024

Tried both a local install and Singularity. Both run out of memory at the end of dwalk on a directory with 130M files. The server has 32 cores and 384 GB of memory.

Any suggestions? Or will I just need to break it up into smaller batches?

[2024-03-13T14:14:17] Walked 137236135 items in 4453.108998 secs (30818.049832 items/sec) ...
[2024-03-13T14:14:18] Walked 137241833 items in 4453.868931 secs (30814.070893 items/sec) ...
[2024-03-13T14:14:20] Walked 137241833 items in 4456.083929 seconds (30798.754062 items/sec)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pi3dc5-003 exited on signal 9 (Killed).
--------------------------------------------------------------------------
ERROR:root:Problem running: ['mpirun', '--oversubscribe', '-np', '12', '/archivetar/install/bin/dwalk', '--sort', 'name', '--distribution', 'size:0,1K,1M,10M,100M,1G,10G,100G,1T', '--progress', '10', '--output', '/scratch/CRISPRCasFinder-pi3dc5-003-singularity-2024-03-13-13-00-03.cache', '.'] and Command '['mpirun', '--oversubscribe', '-np', '12', '/archivetar/install/bin/dwalk', '--sort', 'name', '--distribution', 'size:0,1K,1M,10M,100M,1G,10G,100G,1T', '--progress', '10', '--output', '/scratch/CRISPRCasFinder-pi3dc5-003-singularity-2024-03-13-13-00-03.cache', '.']' returned non-zero exit status 137.
Traceback (most recent call last):
  File "mpiFileUtils/__init__.py", line 45, in apply
  File "subprocess.py", line 526, in run
subprocess.CalledProcessError: Command '['mpirun', '--oversubscribe', '-np', '12', '/archivetar/install/bin/dwalk', '--sort', 'name', '--distribution', 'size:0,1K,1M,10M,100M,1G,10G,100G,1T', '--progress', '10', '--output', '/scratch/CRISPRCasFinder-pi3dc5-003-singularity-2024-03-13-13-00-03.cache', '.']' returned non-zero exit status 137.
Traceback (most recent call last):
  File "mpiFileUtils/__init__.py", line 45, in apply
  File "subprocess.py", line 526, in run
subprocess.CalledProcessError: Command '['mpirun', '--oversubscribe', '-np', '12', '/archivetar/install/bin/dwalk', '--sort', 'name', '--distribution', 'size:0,1K,1M,10M,100M,1G,10G,100G,1T', '--progress', '10', '--output', '/scratch/CRISPRCasFinder-pi3dc5-003-singularity-2024-03-13-13-00-03.cache', '.']' returned non-zero exit status 137.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "archivetar.py", line 21, in <module>
  File "archivetar/__init__.py", line 460, in main
  File "archivetar/__init__.py", line 214, in build_list
  File "mpiFileUtils/__init__.py", line 122, in scanpath
  File "mpiFileUtils/__init__.py", line 48, in apply
mpiFileUtils.exceptions.mpiFileUtilsError: Problems Command '['mpirun', '--oversubscribe', '-np', '12', '/archivetar/install/bin/dwalk', '--sort', 'name', '--distribution', 'size:0,1K,1M,10M,100M,1G,10G,100G,1T', '--progress', '10', '--output', '/scratch/CRISPRCasFinder-pi3dc5-003-singularity-2024-03-13-13-00-03.cache', '.']' returned non-zero exit status 137.
[193100] Failed to execute script 'archivetar' due to unhandled exception!

@brockpalen
Owner

Correct: dwalk keeps the full file tree in memory, so very large targets can OOM the system.

In theory (never tested) you could run dwalk across multiple nodes so the walk can draw on the memory of several machines, but that is a waste of resources otherwise.

The recommendation is to break the job up into smaller targets.
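One way to sketch that batching: issue a separate dwalk run per top-level subdirectory, so no single walk has to hold the entire tree in memory. This is a minimal illustration, not part of archivetar; the helper name `dwalk_commands` and the `outdir` parameter are assumptions, while the dwalk path and options simply mirror the failing command in the log above.

```python
import os

# Hypothetical helper (not part of archivetar): build one mpirun/dwalk
# invocation per top-level subdirectory, so each walk only holds one
# subtree in memory instead of all ~137M entries at once.
def dwalk_commands(root, dwalk="/archivetar/install/bin/dwalk",
                   np=12, outdir="/scratch"):
    """Return a list of mpirun/dwalk argument lists, one per subdirectory of root."""
    cmds = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if not entry.is_dir(follow_symlinks=False):
            continue
        # One cache file per subtree; names/locations here are illustrative.
        cache = os.path.join(outdir, entry.name + ".cache")
        cmds.append([
            "mpirun", "--oversubscribe", "-np", str(np), dwalk,
            "--sort", "name",
            "--distribution", "size:0,1K,1M,10M,100M,1G,10G,100G,1T",
            "--progress", "10",
            "--output", cache,
            entry.path,
        ])
    return cmds

if __name__ == "__main__":
    for cmd in dwalk_commands("."):
        print(" ".join(cmd))
        # import subprocess; subprocess.run(cmd, check=True)  # to actually run each walk
```

Files sitting directly in the root directory are not covered by the per-subdirectory walks and would still need a separate (small) pass, and merging the per-subtree caches afterwards is left out of this sketch.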
