Add alternative worker commands, config options #20
Conversation
Thanks for this, excited to give it a spin.
One quick thought: what happens if the options the user wants to provide contain spaces?
```
$ dask databricks run --worker-args "--foo 'bar baz'"
```
The above example wouldn't split up cleanly. I wonder if we also want to add optional JSON support, so before calling `worker_args.split()` we try calling `json.loads(worker_args)`. That way a user could specify a JSON list of arguments if they want to be explicit.
```
$ dask databricks run --worker-args '["--foo", "bar baz"]'
```
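Something like this could work for the fallback (a sketch only; `parse_worker_args` is an illustrative helper name, not code from this PR):

```python
import json


def parse_worker_args(worker_args: str) -> list[str]:
    """Split --worker-args, accepting either a JSON list or a plain string."""
    try:
        parsed = json.loads(worker_args)
        # Only accept a JSON list; a bare JSON scalar like "123" should
        # still fall through to plain whitespace splitting below.
        if isinstance(parsed, list):
            return [str(arg) for arg in parsed]
    except json.JSONDecodeError:
        pass
    # Fallback: simple whitespace split, which loses quoted groupings.
    return worker_args.split()


print(parse_worker_args('["--foo", "bar baz"]'))  # ['--foo', 'bar baz']
print(parse_worker_args("--memory-limit 4GB"))    # ['--memory-limit', '4GB']
```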
Tested my changes with the following script:
Seeing this error on the Scheduler/Driver node:

```
Running command git clone --filter=blob:none --quiet https://github.com/skirui-source/dask-databricks.git /tmp/pip-req-build-pixm6hlg
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
petastorm 0.12.1 requires pyspark>=2.1.0, which is not installed.
databricks-feature-store 0.14.1 requires pyspark<4,>=3.1.2, which is not installed.
ydata-profiling 4.2.0 requires numpy<1.24,>=1.16.0, but you have numpy 1.26.1 which is incompatible.
scipy 1.9.1 requires numpy<1.25.0,>=1.18.5, but you have numpy 1.26.1 which is incompatible.
mleap 0.20.0 requires scikit-learn<0.23.0,>=0.22.0, but you have scikit-learn 1.1.1 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imbalanced-learn 0.10.1 requires scikit-learn>=1.0.2, but you have scikit-learn 0.22.1 which is incompatible.
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
warnings.warn(
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
warnings.warn(
2023-11-09 07:32:18,796 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-09 07:32:19,132 - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2023-11-09 07:32:19,168 - distributed.scheduler - INFO - State start
2023-11-09 07:32:19,174 - distributed.scheduler - INFO - -----------------------------------------------
2023-11-09 07:32:19,174 - distributed.scheduler - INFO - Scheduler at: tcp://10.59.230.165:8786
2023-11-09 07:32:19,175 - distributed.scheduler - INFO - dashboard at: http://10.59.230.165:8787/status
2023-11-09 07:32:19,175 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2023-11-09 07:32:19,875 - distributed.comm.tcp - INFO - Connection from tcp://10.59.241.62:56834 closed before handshake completed
2023-11-09 07:32:19,877 - distributed.comm.tcp - INFO - Connection from tcp://10.59.249.7:34864 closed before handshake completed
2023-11-09 07:34:32,415 - distributed.scheduler - INFO - Receive client connection: Client-6c879bb7-7ed2-11ee-8e0b-00163e5e434a
2023-11-09 07:34:32,417 - distributed.core - INFO - Starting established connection to tcp://10.59.230.165:56762
```
and this error from the dask worker:

```
Running command git clone --filter=blob:none --quiet https://github.com/skirui-source/dask-databricks.git /tmp/pip-req-build-vvt7ougt
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
petastorm 0.12.1 requires pyspark>=2.1.0, which is not installed.
databricks-feature-store 0.14.1 requires pyspark<4,>=3.1.2, which is not installed.
ydata-profiling 4.2.0 requires numpy<1.24,>=1.16.0, but you have numpy 1.26.1 which is incompatible.
scipy 1.9.1 requires numpy<1.25.0,>=1.18.5, but you have numpy 1.26.1 which is incompatible.
mleap 0.20.0 requires scikit-learn<0.23.0,>=0.22.0, but you have scikit-learn 1.1.1 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imbalanced-learn 0.10.1 requires scikit-learn>=1.0.2, but you have scikit-learn 0.22.1 which is incompatible.
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
warnings.warn(
/databricks/python3/lib/python3.10/site-packages/dask/cli.py:100: UserWarning: While registering the command with name 'cuda', an exception ocurred; 'function' object has no attribute 'command'.
warnings.warn(
Usage: dask [OPTIONS] COMMAND [ARGS]...
Try 'dask -h' for help.
Error: No such command 'cuda'.
```
I think the core of the problem is in this line
I reproduced the error in a Databricks notebook to get the full traceback:

```python
import importlib_metadata

[ep] = [ep for ep in importlib_metadata.entry_points(group="dask_cli") if ep.name == "cuda"]
ep.load()
```

This raises `AttributeError: 'function' object has no attribute 'command'`:

```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File <command-1479567383531443>, line 3
1 import importlib_metadata
2 [ep] = [ep for ep in importlib_metadata.entry_points(group="dask_cli") if ep.name == "cuda"]
----> 3 ep.load()
File /databricks/python/lib/python3.10/site-packages/importlib_metadata/__init__.py:209, in EntryPoint.load(self)
204 """Load the entry point from its definition. If only a module
205 is indicated by the value, return that module. Otherwise,
206 return the named object.
207 """
208 match = self.pattern.match(self.value)
--> 209 module = import_module(match.group('module'))
210 attrs = filter(None, (match.group('attr') or '').split('.'))
211 return functools.reduce(getattr, attrs, module)
File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)
File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)
File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)
File /databricks/python/lib/python3.10/site-packages/dask_cuda/cli.py:61
56 @click.group
57 def cuda():
58 """Subcommands to launch or query distributed workers with GPUs."""
---> 61 @cuda.command(name="worker", context_settings=dict(ignore_unknown_options=True))
62 @scheduler
63 @preload_argv
64 @click.option(
65 "--host",
66 type=str,
67 default=None,
68 help="""IP address of serving host; should be visible to the scheduler and other
69 workers. Can be a string (like ``"127.0.0.1"``) or ``None`` to fall back on the
70 address of the interface specified by ``--interface`` or the default interface.""",
71 )
(...)
322 def worker(
(...)
357 **kwargs,
358 ):
359 """Launch a distributed worker with GPUs attached to an existing scheduler.
360
361 A scheduler can be specified either through a URI passed through the ``SCHEDULER``
(...)
366 for info.
367 """
368 if multiprocessing_method == "forkserver":
AttributeError: 'function' object has no attribute 'command'
```

I can also reproduce this by importing the submodule.

```python
>>> import dask_cuda.cli
AttributeError: 'function' object has no attribute 'command'
```
Looks like Databricks gives us an older version of `click` that doesn't support applying `@click.group` without parentheses, which is exactly what `dask_cuda/cli.py` does.
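As a minimal illustration of what goes wrong (a sketch, not the actual `dask-cuda` source):

```python
import click


@click.group  # note: no parentheses, as in dask_cuda/cli.py
def cuda():
    """Subcommands to launch or query distributed workers with GPUs."""


# On click >= 8.1, `cuda` is a click.Group and this registers a subcommand.
# On the older click shipped in the Databricks runtime, `cuda` is left as a
# plain function, so this line raises:
#   AttributeError: 'function' object has no attribute 'command'
@cuda.command(name="worker")
def worker():
    pass
```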
Ok I got things working! I pushed a couple of extra commits to this PR but ultimately reverted one. The main change I've made is to bump the minimum version of `click`. I used the `databricksruntime/gpu-tensorflow:cuda11.8` container image, and then this init script, which makes a couple of small tweaks due to using that image:

```bash
#!/bin/bash
set -e
# The Databricks Python directory isn't on the path in
# databricksruntime/gpu-tensorflow:cuda11.8 for some reason
export PATH="/databricks/python/bin:$PATH"
# Install git just so that we can install dask-databricks from source
# as it's not included in databricksruntime/gpu-tensorflow:cuda11.8.
# We can remove this when installing dask-databricks from PyPI.
apt-get update && apt-get install git -y
# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
bokeh==3.2.2 \
cudf-cu11 \
dask[complete] \
dask-cudf-cu11 \
dask-cuda \
git+https://github.com/skirui-source/dask-databricks.git@main
# Start the Dask cluster with CUDA workers
dask databricks run --worker-command "dask cuda worker"
```
I cleaned things up a little further including adding a `--cuda` flag.

```bash
dask databricks run --cuda
# is equivalent to
dask databricks run --worker-command "dask cuda worker"
```
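For illustration, here is roughly how such a flag can be wired up with click (a sketch; the option handling below is an assumption, not necessarily what this PR implements):

```python
import click

DEFAULT_WORKER_COMMAND = "dask worker"


@click.command()
@click.option("--worker-command", default=DEFAULT_WORKER_COMMAND,
              help="Custom command to launch the workers with.")
@click.option("--cuda", is_flag=True,
              help="Shorthand for --worker-command 'dask cuda worker'.")
def run(worker_command, cuda):
    """Launch a Dask cluster on this Databricks cluster."""
    # --cuda is just sugar: only swap the worker command if the user
    # didn't already override it explicitly.
    if cuda and worker_command == DEFAULT_WORKER_COMMAND:
        worker_command = "dask cuda worker"
    click.echo(f"Starting workers with: {worker_command}")
```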
I also noticed that not pinning `dask-cuda` caused problems, so things work nicely now with this init script:

```bash
#!/bin/bash
set -e
# The Databricks Python directory isn't on the path in
# databricksruntime/gpu-tensorflow:cuda11.8 for some reason
export PATH="/databricks/python/bin:$PATH"
# Install git just so that we can install dask-databricks from source
# as it's not included in databricksruntime/gpu-tensorflow:cuda11.8.
# We can remove this when installing dask-databricks from PyPI.
apt-get update && apt-get install git -y
# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
bokeh==3.2.2 \
cudf-cu11 \
dask[complete] \
dask-cudf-cu11 \
dask-cuda==23.10.0 \
git+https://github.com/skirui-source/dask-databricks.git@main
# Start the Dask cluster with CUDA workers
dask databricks run --cuda
```
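With the cluster running, connecting from a Databricks notebook should then look something like this (a sketch using the `dask_databricks.get_client()` helper from the project README):

```python
import dask_databricks

# Connects to the scheduler that `dask databricks run` started on the driver.
client = dask_databricks.get_client()

# Quick smoke test that tasks reach the CUDA workers.
assert client.submit(lambda x: x + 1, 10).result() == 11
```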
Closes #10