Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raising Environment Error on Single Node Clusters #40

Merged
merged 4 commits into from
Jun 3, 2024

Conversation

benrutter
Copy link
Contributor

This addresses #39 - would appreciate some feedback as I'm not sure if this is a great idea or not.

Currently, if you start client = dask_databricks.client() on a single node databricks cluster, it'll allow you to, but then hang indefinitely if you try to carry out any tasks (makes sense since there are no workers to do them).

My thinking is it would be handy to throw an error if we can since if we're on a single-node cluster we already know that any attempted use of the dask_databricks client is going to end badly.

I wanted to implement this a little like this:

if os.getenv("NUMNODES") == "1":
    raise EnvironmentError(
        "You appear to be running dask-databricks on a "
        "single-node cluster. Dask requires at least one worker node "
        "in order to function as expected."
    )

But databricks doesn't appear to have an environment variable like "NUMNODES" - it does have a "MASTER" environment variable, which for single node clusters tells you the core working as the driver (such as local[4]) and in distributed clusters gives you the tcp url.

I've used that to implement it like this:

if os.getenv("MASTER") and "local[" in os.getenv("MASTER"):

My thinking is that this is a pretty conservative check (owing to the fact that databricks could hypothetically change this implementation detail at some point, and I can't find any documentation of it). If there is no MASTER variable, nothing will get raised, so the only possible false positive is if local[4] is set as MASTER in a multi-node cluster which I don't think is possible?

It's still a little hackier than I'd of liked though, but does boost the user-friendliness for beginners. Any thoughts?

Copy link
Collaborator

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for raising this. The hackiness is a shame, but I agree it's better than the alternative experience. I left a couple of small comments.

dask_databricks/databrickscluster.py Outdated Show resolved Hide resolved
dask_databricks/databrickscluster.py Outdated Show resolved Hide resolved
@benrutter
Copy link
Contributor Author

Thanks for the review!

@jacobtomlinson jacobtomlinson merged commit df8be09 into dask-contrib:main Jun 3, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants