[v0.0.18] Issue running the command mila code: socket.gaierror: [Errno -3] Temporary failure in name resolution #60

Open
geronimocharlie opened this issue Oct 5, 2023 · 0 comments


Make sure you can reproduce the issue with the latest version available

 pip install milatools --upgrade
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: milatools in ./.local/lib/python3.10/site-packages (0.0.18)
Requirement already satisfied: Fabric<3.0.0,>=2.7.0 in ./.local/lib/python3.10/site-packages (from milatools) (2.7.1)
Requirement already satisfied: blessed<2.0.0,>=1.18.1 in ./.local/lib/python3.10/site-packages (from milatools) (1.20.0)
Requirement already satisfied: coleo<0.4.0,>=0.3.0 in ./.local/lib/python3.10/site-packages (from milatools) (0.3.2)
Requirement already satisfied: questionary<2.0.0,>=1.10.0 in ./.local/lib/python3.10/site-packages (from milatools) (1.10.0)
Requirement already satisfied: sshconf<0.3.0,>=0.2.2 in ./.local/lib/python3.10/site-packages (from milatools) (0.2.5)
Requirement already satisfied: wcwidth>=0.1.4 in ./.local/lib/python3.10/site-packages (from blessed<2.0.0,>=1.18.1->milatools) (0.2.6)
Requirement already satisfied: six>=1.9.0 in /usr/lib/python3/dist-packages (from blessed<2.0.0,>=1.18.1->milatools) (1.16.0)
Requirement already satisfied: ptera<2.0.0,>=1.4.1 in ./.local/lib/python3.10/site-packages (from coleo<0.4.0,>=0.3.0->milatools) (1.4.1)
Requirement already satisfied: invoke<2.0,>=1.3 in ./.local/lib/python3.10/site-packages (from Fabric<3.0.0,>=2.7.0->milatools) (1.7.3)
Requirement already satisfied: paramiko>=2.4 in /usr/lib/python3/dist-packages (from Fabric<3.0.0,>=2.7.0->milatools) (2.9.3)
Requirement already satisfied: pathlib2 in ./.local/lib/python3.10/site-packages (from Fabric<3.0.0,>=2.7.0->milatools) (2.3.7.post1)
Requirement already satisfied: prompt_toolkit<4.0,>=2.0 in ./.local/lib/python3.10/site-packages (from questionary<2.0.0,>=1.10.0->milatools) (3.0.39)
Requirement already satisfied: codefind<0.2.0,>=0.1.2 in ./.local/lib/python3.10/site-packages (from ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (0.1.3)
Requirement already satisfied: giving<0.5.0,>=0.4.1 in ./.local/lib/python3.10/site-packages (from ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (0.4.2)
Requirement already satisfied: asttokens<3.0.0,>=2.2.1 in ./.local/lib/python3.10/site-packages (from giving<0.5.0,>=0.4.1->ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (2.4.0)
Requirement already satisfied: reactivex<5.0.0,>=4.0.0 in ./.local/lib/python3.10/site-packages (from giving<0.5.0,>=0.4.1->ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (4.0.4)
Requirement already satisfied: varname<0.11.0,>=0.10.0 in ./.local/lib/python3.10/site-packages (from giving<0.5.0,>=0.4.1->ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (0.10.0)
Requirement already satisfied: typing-extensions<5.0.0,>=4.1.1 in ./.local/lib/python3.10/site-packages (from reactivex<5.0.0,>=4.0.0->giving<0.5.0,>=0.4.1->ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (4.7.1)
Requirement already satisfied: executing<2.0,>=1.1 in ./.local/lib/python3.10/site-packages (from varname<0.11.0,>=0.10.0->giving<0.5.0,>=0.4.1->ptera<2.0.0,>=1.4.1->coleo<0.4.0,>=0.3.0->milatools) (1.2.0)

What command did you run?

 mila code /home/mila/c/charlotte.lange/scratch/neurips23/causalpaca --job 3703232

Describe the bug

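Cannot attach to a running interactive job with mila code. The command finds the job's node (cn-a010) but then fails while resolving that hostname. Traceback: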

(mila) $ squeue --jobs 3703232 -ho %N
cn-a010
Traceback (most recent call last):
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/milatools/cli/commands.py", line 43, in main
    auto_cli(milatools)
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/coleo/cli.py", line 656, in auto_cli
    result = run_cli(entry, args, **kwargs)
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/coleo/cli.py", line 628, in run_cli
    return call(opts=opts, args=args)
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/coleo/cli.py", line 587, in thunk
    result = fn(*args)
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/milatools/cli/commands.py", line 288, in code
    cnode = _find_allocation(remote, job_name="mila-code")
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/milatools/cli/commands.py", line 703, in _find_allocation
    return Remote(node_name)
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/milatools/cli/remote.py", line 84, in __init__
    connection.open()
  File "/home/fortheswarm/.local/lib/python3.10/site-packages/fabric/connection.py", line 636, in open
    self.client.connect(**kwargs)
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 340, in connect
    to_try = list(self._families_and_addresses(hostname, port))
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 203, in _families_and_addresses
    addrinfos = socket.getaddrinfo(
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

An error occured during the execution of the command `code`. Please try updating milatools by running
  pip install milatools --upgrade
in the terminal. If the issue persists, consider filling a bug report at
  https://github.com/mila-iqia/milatools/issues/new?labels=code%2C0.0.18&template=bug_report.md&title=%5Bv0.0.18%5D+Issue+running+the+command+%60mila+code%60
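
The traceback ends in socket.getaddrinfo, so one way to narrow this down is to check whether the local machine can resolve the node name at all, independently of milatools. The following is only a minimal diagnostic sketch: can_resolve is a hypothetical helper name, not part of milatools or paramiko, and port 22 is an assumption (the default SSH port).

```python
import socket


def can_resolve(hostname: str, port: int = 22) -> bool:
    """Return True if the local resolver can map `hostname` to an address.

    Hypothetical helper for diagnosis only; it issues the same kind of
    lookup (socket.getaddrinfo) that fails at the bottom of the traceback.
    """
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Same exception class as in the report (e.g. Errno -3).
        return False
    return True


if __name__ == "__main__":
    # "cn-a010" is the node name printed by squeue in the output above.
    for name in ("localhost", "cn-a010"):
        print(f"{name}: resolvable={can_resolve(name)}")
```

If the node name does not resolve locally, the problem is in how the hostname reaches the resolver (for example, the bare node name being used where an SSH config alias was expected) rather than in the job itself. That is an interpretation of the traceback, not something confirmed by the maintainers.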

Desktop (please complete the following information):

Ubuntu 22.04.3 LTS, 64-bit, GNOME 42.9

Additional Context

The interactive job was started with:

salloc --time=4:0:0  --gres=gpu:1 --mem=24G -c 1
salloc: --------------------------------------------------------------------------------------------------
salloc: # Using default long partition
salloc: --------------------------------------------------------------------------------------------------
salloc: Granted job allocation 3703232
salloc: Waiting for resource configuration
salloc: Nodes cn-a010 are ready for job

@lebrice lebrice changed the title [v0.0.18] Issue running the command mila code [v0.0.18] Issue running the command mila code: socket.gaierror: [Errno -3] Temporary failure in name resolution Nov 8, 2023