[nomad] Constraint did not meet topology requirement #771

Closed
GiyoMoon opened this issue Nov 7, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@GiyoMoon

GiyoMoon commented Nov 7, 2024

TL;DR

I followed the Nomad guide, and when I try to add a job that uses the volume, I get the error "Constraint did not meet topology requirement filtered 1 node".

Expected behavior

The job should be able to use the volume. This may be a bug in the CSI driver, or the Nomad guide may be missing a step.

Observed behavior

Any job that uses a volume fails to place, because the only node is filtered out by the constraint "did not meet topology requirement".

Minimal working example

Follow the Nomad guide exactly, using the latest CSI driver version (v2.10.0), and you should encounter this error.

Log output

+ Job: "volume-name"
+ Task Group: "group-name" (1 create)
  + Task: "task-name" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "group-name" (failed to place 1 allocation):
    * Constraint "did not meet topology requirement": 1 nodes excluded by filter

Job Modify Index: 0
To submit the job with version verification run:

nomad job run -check-index 0 test.hcl

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

Additional information

nomad - 1.9.1
hcloud-csi-driver - v2.10.0

GiyoMoon added the bug label on Nov 7, 2024
@GiyoMoon
Author

GiyoMoon commented Nov 7, 2024

When printing the volume information, I get:

ID                   = db-vol
Name                 = db-vol
Namespace            = default
External ID          = REDACTED
Plugin ID            = csi.hetzner.cloud
Provider             = csi.hetzner.cloud
Version              = 2.10.0
Capacity             = 10 GiB
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = fs_type: ext4 flags: [REDACTED]
Namespace            = default

Topologies
Topology  Segments
01        csi.hetzner.cloud/location=nbg1

Allocations
No allocations placed

Maybe my node needs to be assigned to the nbg1 location somehow to meet the constraint? If so, I don't know how to add this. Any help would be appreciated 😇 Thanks!
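
To illustrate the question above, here is a minimal Python sketch of how a topology check of this kind could work. It assumes that the scheduler compares the segments reported for the node against the volume's topology segments and requires an exact match with at least one of them. That matching rule, the function name `node_meets_topology`, and the `fsn1` node are assumptions for illustration and are not taken from Nomad's source. Per the CSI specification, a node's accessible topology is reported by the CSI node plugin in its NodeGetInfo response, so it is normally not something set by hand on the client.

```python
# Sketch only: models a topology feasibility check, not Nomad's actual code.

def node_meets_topology(node_segments, volume_topologies):
    """Return True if the node's segments equal at least one volume topology.

    A volume without topologies places no restriction on nodes.
    The exact-equality rule is an assumption made for this sketch.
    """
    if not volume_topologies:
        return True
    return any(node_segments == topology for topology in volume_topologies)


# Volume topology as printed in the volume information above.
volume_topologies = [{"csi.hetzner.cloud/location": "nbg1"}]

# Hypothetical node reports, as a CSI node plugin could return them.
node_in_nbg1 = {"csi.hetzner.cloud/location": "nbg1"}
node_in_fsn1 = {"csi.hetzner.cloud/location": "fsn1"}

print(node_meets_topology(node_in_nbg1, volume_topologies))  # True
print(node_meets_topology(node_in_fsn1, volume_topologies))  # False
```

Under these assumptions, a node in the same location as the volume passes the check, so a failure on a single-node cluster in nbg1 points at what the plugin reports for the node rather than at the job file.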

@resmo
Contributor

resmo commented Nov 10, 2024

I am the author of the doc and can confirm this issue after updating to CSI driver 2.10.0. Rolling back to 2.9.0 resolved the issue for me.

@GiyoMoon could you try using 2.9.0?

@lopcode

lopcode commented Nov 10, 2024

Bumped into the same issue today, confirmed downgrading to 2.9.0 worked 🤞

@resmo
Contributor

resmo commented Nov 10, 2024

I briefly went over the commits, and the changes in this PR look suspicious to me: #743

Pinging @apricote to let him know.

Update:
I tried out some things, like adding a custom meta tag, without success...

client {
  enabled = true
  node_class = "client"

  meta {
    "instance.hetzner.cloud/provided-by" = "cloud"
  }

...

@GiyoMoon
Author

@resmo Hey, thanks for looking into this! I can confirm that 2.9.0 works for me too.

@lukasmetzner
Contributor

Hi,

we are in the process of building a Nomad development setup to run csi-driver end-to-end tests on Nomad. During this setup, we encountered this bug. We are currently investigating the issue.

Best Regards
Lukas

lukasmetzner pushed a commit that referenced this issue Nov 12, 2024
### ⚠️ Removed Feature from v2.10.0

We have reverted a workaround for an upstream issue in the Kubernetes
scheduler where nodes without the CSI Plugin (e.g. Robot servers) would
still be considered for scheduling, but then creating and attaching the
volume fails with no automatic reconciliation of this error.

Due to variations in the CSI specification implementation, these changes
disrupted Nomad clusters, requiring us to revert them. We are actively
working on placing this workaround behind a feature flag, allowing
Kubernetes users to bypass the upstream issue.

This affects you if you have set the Helm value
`allowedTopologyCloudServer` in v2.10.0. If you are affected by the
Kubernetes upstream issue, we will provide a fix in the next minor
version v2.11.0.

Learn more about this in
[#400](#400) and
[#771](#771).

### Bug Fixes

- reverted NodeGetInfo response as it breaks Nomad clusters (#776)

Co-authored-by: releaser-pleaser <>
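
The note above attributes the breakage to variations in how the CSI specification is implemented. One way such a variation could play out is sketched below in Python. It assumes, purely for illustration, that the v2.10.0 node response carried an extra segment with the key `instance.hetzner.cloud/provided-by` (the key tried as a meta tag earlier in this thread) and that one scheduler compares segment maps for exact equality while another only requires the volume's segments to be present on the node. Neither the extra segment nor the matching rules are confirmed in this thread, and the function names are hypothetical.

```python
# Sketch only: contrasts two possible matching rules; neither is quoted
# from Nomad or Kubernetes source code.

def exact_match(node_segments, volume_topology):
    """True only if both segment maps are identical."""
    return node_segments == volume_topology


def subset_match(node_segments, volume_topology):
    """True if every volume segment is present on the node.

    Extra segments on the node are ignored.
    """
    return all(
        node_segments.get(key) == value
        for key, value in volume_topology.items()
    )


volume_topology = {"csi.hetzner.cloud/location": "nbg1"}

# Assumed node report with an additional segment (hypothetical).
node_with_extra_segment = {
    "csi.hetzner.cloud/location": "nbg1",
    "instance.hetzner.cloud/provided-by": "cloud",
}

print(exact_match(node_with_extra_segment, volume_topology))   # False
print(subset_match(node_with_extra_segment, volume_topology))  # True
```

If the assumptions hold, the same node response would be accepted under the subset rule and rejected under the exact rule, which would be consistent with the driver working on one orchestrator and failing on another.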
@lukasmetzner
Contributor

Hi,

we have just released v2.10.1 to revert the breaking changes.

We apologize for any inconvenience this may have caused.

Best regards,
Lukas

@GiyoMoon
Author

Hey Lukas, thanks! Can confirm that 2.10.1 works :)
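
As a summary of the versions reported in this thread (2.9.0 and 2.10.1 work with Nomad, 2.10.0 does not), the following Python sketch flags the affected release. The helper name `is_affected` and the idea of checking the version string yourself are illustrative assumptions; the set of affected versions reflects only what participants reported here.

```python
# Sketch only: encodes the versions reported in this thread.

# Only v2.10.0 was reported as broken on Nomad.
AFFECTED_VERSIONS = {"2.10.0"}


def is_affected(version):
    """Return True if the given hcloud-csi-driver version was reported broken.

    Accepts versions with or without a leading "v", e.g. "v2.10.0".
    """
    return version.strip().lstrip("v") in AFFECTED_VERSIONS


for version in ("2.9.0", "v2.10.0", "2.10.1"):
    status = "affected" if is_affected(version) else "reported working"
    print(f"{version}: {status}")
```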
