timeout always 14 seconds when MPIEXEC_TIMEOUT set to any value #12837

Open
minrk opened this issue Oct 1, 2024 · 1 comment

minrk commented Oct 1, 2024


Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From conda-forge; the build is configured with:

./configure --prefix=$PREFIX \
            --disable-dependency-tracking \
            --disable-wrapper-runpath \
            --enable-mpi-fortran \
            --with-mpi-moduledir='${includedir}' \
            --with-sge \
            --with-hwloc=$PREFIX \
            --with-libevent=$PREFIX \
            --with-zlib=$PREFIX \
            --enable-mca-dso \
            --enable-ipv6

Please describe the system on which you are running

  • Operating system/version: reproduces on macOS 14.6.1 (arm64 and x86_64), as well as on aarch64 and amd64 Linux in Docker
  • Computer hardware: Apple M1 Pro (ARM)
  • Network type: N/A

The problem can be reproduced with a conda environment in Docker:

docker run --rm -it quay.io/condaforge/miniforge3
mamba install openmpi=5
MPIEXEC_TIMEOUT=1 mpiexec -n 2 --allow-run-as-root sleep 20

The same issue appears in Debian experimental, which also ships Open MPI 5.0.5:

docker run --rm -it debian:experimental
apt update
apt -t experimental install openmpi-bin
MPIEXEC_TIMEOUT=1 mpiexec -n 2 --allow-run-as-root sleep 20

Details of the problem

Strangely, when MPIEXEC_TIMEOUT is set to any value, the job timeout is actually 14 seconds:

$ MPIEXEC_TIMEOUT=1 mpiexec -n 2 sleep 20
--------------------------------------------------------------------------
The user-provided time limit for job execution has been reached:

  Timeout: 14 seconds

The job will now be aborted.  Please check your code and/or
adjust/remove the job execution time limit (as specified by --timeout
command line option or MPIEXEC_TIMEOUT environment variable).
--------------------------------------------------------------------------
$ MPIEXEC_TIMEOUT=1 mpiexec -n 2 --timeout 2 sleep 20
--------------------------------------------------------------------------
The user-provided time limit for job execution has been reached:

  Timeout: 2 seconds

The job will now be aborted.  Please check your code and/or
adjust/remove the job execution time limit (as specified by --timeout
command line option or MPIEXEC_TIMEOUT environment variable).
--------------------------------------------------------------------------

This happens on both arm and x86_64, on both macOS and Linux. Setting the timeout on the command line (mpiexec --timeout=1) works as expected, and, as the second example shows, the command-line value takes precedence over MPIEXEC_TIMEOUT.

Open MPI 5.0.2 does not have this problem.
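Because the --timeout option is honored correctly, one possible stopgap on 5.0.5 is to forward the environment variable to the command line yourself. The sketch below is an assumption, not part of Open MPI or PRRTE: the helper name `mpiexec_timeout` is hypothetical, and it relies only on the `--timeout N` form shown in the output above.

```shell
# Hypothetical helper (not part of Open MPI or PRRTE): forward
# MPIEXEC_TIMEOUT to mpiexec as --timeout, since only the command-line
# option is honored correctly in Open MPI 5.0.5.
mpiexec_timeout() {
  if [ -n "${MPIEXEC_TIMEOUT:-}" ]; then
    # Run in a subshell so that unsetting the variable does not leak into
    # the caller's environment; mpiexec then sees only --timeout.
    (
      t="$MPIEXEC_TIMEOUT"
      unset MPIEXEC_TIMEOUT
      mpiexec --timeout "$t" "$@"
    )
  else
    # No timeout requested: pass all arguments through unchanged.
    mpiexec "$@"
  fi
}
```

For example, `MPIEXEC_TIMEOUT=1 mpiexec_timeout -n 2 sleep 20` runs `mpiexec --timeout 1 -n 2 sleep 20` with MPIEXEC_TIMEOUT removed from mpiexec's environment, so the affected environment-variable code path is never reached.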


minrk commented Oct 1, 2024

This appears to be a regression only in the 3.0.x branch of PRRTE: openpmix/prrte#2018
