Jupyter Lab does not kill spawned workers and deallocate memory when shutdown is issued to Julia kernel #1067

Open
davorh opened this issue Feb 23, 2023 · 2 comments


davorh commented Feb 23, 2023

Julia kernel in Jupyter Lab does not kill kernel and deallocate memory when shutdown is issued

I do not know whether this is an IJulia (Julia kernel) issue or some interplay between Jupyter Lab and the Julia kernel, but it is currently a huge productivity problem, and it is not related to the core Julia installation. The problem was tracked down to

using Distributed

in Jupyter Lab with IJulia kernel. Detailed info:

  1. Info on architecture:

The output of versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 88 × Intel(R) Xeon(R) Gold 6238T CPU @ 1.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, cascadelake)
Threads: 1 on 88 virtual cores

  2. Installation procedure

Julia was installed by downloading

wget https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.5-linux-x86_64.tar.gz

into /opt/, unpacking it, and creating the symlink julia -> /opt/julia-1.8.5/bin/julia.

As a regular user, I installed the latest Miniconda and Jupyter Lab as described (conda-forge etc.).

In Julia, the IJulia package was installed.

The installed version of Jupyter Lab is 3.5.0; for IJulia we have "IJulia" => v"1.24.0"

  3. Test example:

Launch a notebook with the Julia 1.8.5 kernel, create a memory-intensive variable and some workers, and define the variable on all of them:

using Distributed
using LinearAlgebra
N=10_000
A=rand(N,N);
addprocs(9);
@everywhere begin
   N=10_000
   A=rand(N,N);
end
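For scale, here is a rough estimate of what the test above allocates. It is only a back-of-the-envelope sketch: it assumes each of the 10 processes (the main kernel plus 9 workers) holds one live N×N Float64 matrix, and it ignores Julia's own runtime overhead. The variable names are illustrative.

```julia
# Estimate the memory held by the test example's matrices.
N = 10_000
bytes_per_matrix = N^2 * sizeof(Float64)   # 8 bytes per Float64 element
n_processes = 1 + 9                        # main kernel + addprocs(9)
total_bytes = bytes_per_matrix * n_processes

println(bytes_per_matrix / 1e9, " GB per matrix")
println(total_bytes / 1e9, " GB across all processes")
```

So each worker left behind after shutdown keeps roughly 0.8 GB allocated in this small test.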

After kernel shutdown is issued, the workers stay alive as active processes, even if Jupyter Lab is closed.

To be more precise, the main kernel is killed, but the processes

/opt/julia/bin/julia -Cnative -J/opt/julia/lib/julia/sys.so -g1 --color=yes --bind-to 127.0.0.1 --worker

stay alive. This does not happen if I run the code at the julia prompt in a terminal. Our computations use around 800 GB of RAM per run, so this is a huge issue.
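One possible stopgap, not confirmed anywhere in this thread as a fix, is to remove the workers explicitly in the last notebook cell before shutting the kernel down. The sketch below uses only `nprocs`, `workers`, and `rmprocs` from Distributed; the helper name `remove_all_workers` and the 30-second `waitfor` value are made up for illustration.

```julia
using Distributed

addprocs(2)  # small worker count, for illustration only

# Hypothetical helper (not part of IJulia or Distributed):
# remove every worker and return the remaining process count.
function remove_all_workers()
    if nprocs() > 1
        # waitfor: seconds to wait for the workers to shut down
        rmprocs(workers(); waitfor=30)
    end
    return nprocs()
end
```

Calling `remove_all_workers()` before issuing the kernel shutdown should leave only the main process. It does not help if the kernel is killed before the cell runs.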


davorh commented Mar 1, 2023

I did some more testing. Jupyter Lab 3.5.0 with Julia 1.8.5 on Windows 10 works fine. On another Linux machine, with Jupyter Lab 2.1.0 and Julia 1.6, everything works fine as well. On the same machine, with Jupyter Lab 3.5.0 and Julia 1.6, we get the output listed at the end after the test code

using Distributed
using LinearAlgebra

addprocs(9);
@everywhere begin
   N=10_000
   A=rand(N,N);
end

was executed and kernel shutdown was issued (all workers were shut down correctly):

      From worker 5:	fatal: error thrown and no exception handler available.
      From worker 5:	InterruptException()
      From worker 5:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 5:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 7:	fatal: error thrown and no exception handler available.
      From worker 7:	InterruptException()
      From worker 7:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 7:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 10:	fatal: error thrown and no exception handler available.
      From worker 10:	InterruptException()
      From worker 10:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 10:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 9:	fatal: error thrown and no exception handler available.
      From worker 9:	InterruptException()
      From worker 9:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 9:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 6:	fatal: error thrown and no exception handler available.
      From worker 6:	InterruptException()
      From worker 6:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 6:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 8:	fatal: error thrown and no exception handler available.
      From worker 8:	InterruptException()
      From worker 8:	jl_mutex_unlock at /opt/julia/src/locks.h:134 [inlined]
      From worker 8:	jl_task_get_next at /opt/julia/src/partr.c:475
      From worker 10:	poptask at ./task.jl:755
      From worker 5:	poptask at ./task.jl:755
      From worker 6:	poptask at ./task.jl:755
      From worker 9:	poptask at ./task.jl:755
      From worker 7:	poptask at ./task.jl:755
      From worker 8:	poptask at ./task.jl:755
      From worker 10:	wait at ./task.jl:763 [inlined]
      From worker 10:	task_done_hook at ./task.jl:489
      From worker 9:	wait at ./task.jl:763 [inlined]
      From worker 9:	task_done_hook at ./task.jl:489
      From worker 7:	wait at ./task.jl:763 [inlined]
      From worker 7:	task_done_hook at ./task.jl:489
      From worker 6:	wait at ./task.jl:763 [inlined]
      From worker 6:	task_done_hook at ./task.jl:489
      From worker 5:	wait at ./task.jl:763 [inlined]
      From worker 5:	task_done_hook at ./task.jl:489
      From worker 8:	wait at ./task.jl:763 [inlined]
      From worker 8:	task_done_hook at ./task.jl:489
      From worker 10:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 10:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 5:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 5:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 9:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 9:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 7:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 7:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 6:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 6:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 10:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 10:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 8:	_jl_invoke at /opt/julia/src/gf.c:2237 [inlined]
      From worker 8:	jl_apply_generic at /opt/julia/src/gf.c:2419
      From worker 10:	start_task at /opt/julia/src/task.c:850
      From worker 10:	unknown function (ip: (nil))
      From worker 5:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 5:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 9:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 9:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 6:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 6:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 7:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 7:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 9:	start_task at /opt/julia/src/task.c:850
      From worker 9:	unknown function (ip: (nil))
      From worker 5:	start_task at /opt/julia/src/task.c:850
      From worker 5:	unknown function (ip: (nil))
      From worker 7:	start_task at /opt/julia/src/task.c:850
      From worker 7:	unknown function (ip: (nil))
      From worker 6:	start_task at /opt/julia/src/task.c:850
      From worker 6:	unknown function (ip: (nil))
      From worker 8:	jl_apply at /opt/julia/src/julia.h:1703 [inlined]
      From worker 8:	jl_finish_task at /opt/julia/src/task.c:208
      From worker 8:	start_task at /opt/julia/src/task.c:850
      From worker 8:	unknown function (ip: (nil))
      From worker 3:	InterruptException:
      From worker 3:	Stacktrace:

davorh changed the title from "Julia kernel in Jupyter Lab does not kill kernel and deallocate memory when shutdown is issued" to "Jupyter Lab does not kill spawned workers and deallocate memory when shutdown is issued to Julia kernel" on Mar 1, 2023

sprig commented Jun 12, 2024

I experience this as well, running inside a container based on jupyter/datascience-notebook:julia-1.9.3.

Inside Jupyter:

> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
  Threads: 2 on 8 virtual cores
Environment:
  JULIA_PKGDIR = /opt/julia
  JULIA_DEPOT_PATH = /opt/julia
$ jupyter lab version
4.0.7
$ python3 --version
Python 3.11.5

MWE:

using Distributed
addprocs(Sys.CPU_THREADS-2)
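To check whether workers outlive the kernel, it can help to record their operating-system PIDs before shutting down and then look for those PIDs with a process viewer afterwards. The following is a minimal sketch using `remotecall_fetch` and `getpid`; the name `worker_pids` and the worker count of 2 are illustrative choices, not part of the MWE.

```julia
using Distributed

addprocs(2)  # small worker count for illustration; the MWE uses Sys.CPU_THREADS-2

# Map each Distributed worker id to the OS PID of its process.
worker_pids = Dict(w => remotecall_fetch(getpid, w) for w in workers())

for w in workers()
    println("worker ", w, " has OS pid ", worker_pids[w])
end
```

Any of these PIDs still present after kernel shutdown belongs to an orphaned worker.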
