
azuredisk-node-win fails to mount disk: requested access path is already in use #2690

Closed
ps610 opened this issue Dec 4, 2024 · 10 comments · Fixed by #2691 or #2699

ps610 commented Dec 4, 2024

What happened:
We have a cluster with several Windows nodes (2022), on which Windows pods are executed depending on the demand of our users. The Windows (application) pods are Microsoft Business Central Containers with at least one volume (PVC) containing the application's database. (see base/example helm chart)

In times of high demand, when many pods start in parallel, it sporadically happens that a pod's volume cannot be mounted; the pod then cannot start and remains in the “ContainerCreating” state. As a workaround, the stuck pod can be deleted manually, and mounting then works for the pod that is automatically recreated (based on the deployment).

The error first appeared under Kubernetes version 1.28.5. Yesterday we upgraded via 1.29.9 to 1.30.5 in the hope that this would fix the problem, but unfortunately it seems to occur even more frequently now: in the past roughly 2% of starting pods were affected, while this morning it was almost 10%.

Error from csi-azuredisk-node-win log:

I1204 07:21:16.053429    7932 nodeserver.go:157] NodeStageVolume: formatting 7 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\3b03e9b721efa805aa50589f1531a282237faef0f18d6d7d05f21d77c63faf9d\globalmount with mount options([])
I1204 07:21:20.790333    7932 disk.go:363] Disk 7 already initialized
I1204 07:21:22.120014    7932 disk.go:380] Disk 7 already partitioned
E1204 07:21:26.970044    7932 utils.go:110] GRPC error: rpc error: code = Internal desc = could not format 7(lun: 6), and mount it at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\3b03e9b721efa805aa50589f1531a282237faef0f18d6d7d05f21d77c63faf9d\globalmount, failed with error mount volume to path. cmd: Get-Volume -UniqueId "$Env:volumeID" | Get-Partition | Add-PartitionAccessPath -AccessPath $Env:path, output: Add-PartitionAccessPath : The requested access path is already in use.
Activity ID: {4a3c6ee8-5e15-4807-9501-645caf6e96ef}
At line:1 char:56
+ ... meID" | Get-Partition | Add-PartitionAccessPath -AccessPath $Env:path
+                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (StorageWMI:ROOT/Microsoft/.../MSFT_Partition) [Add-PartitionAccessPath 
   ], CimException
    + FullyQualifiedErrorId : StorageWMI 42002,Add-PartitionAccessPath
 
, error: exit status 1
I1204 07:25:42.746614    7932 utils.go:105] GRPC call: /csi.v1.Node/NodeStageVolume
I1204 07:25:42.746669    7932 utils.go:106] GRPC request: {"publish_context":{"LUN":"6"},"staging_target_path":"\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\disk.csi.azure.com\\6361b07d0c9f08393f72efed6acbfd87e35c6af82b85ee6fa4a7433d030eb191\\globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-4d1158ff-928f-4bc4-92ec-e8fe6c153a35","csi.storage.k8s.io/pvc/name":"f053192c13d0-business-central-db","csi.storage.k8s.io/pvc/namespace":"cust-demue-gmbh","requestedsizegib":"15","skuname":"Premium_ZRS","storage.kubernetes.io/csiProvisionerIdentity":"1731396534581-6144-disk.csi.azure.com"},"volume_id":"/subscriptions/4abe427c-4d7a-47be-be27-6631e9a2d5ad/resourceGroups/mc_cosmo-alpaca-aks_cluster_westeurope/providers/Microsoft.Compute/disks/pvc-4d1158ff-928f-4bc4-92ec-e8fe6c153a35"}

What you expected to happen:
Mounting the volume always works and the pods are able to start.

How to reproduce it:
We can't provide a reproduction scenario, as it happens very sporadically and only in situations with high load.

Anything else we need to know?:
We're using autoscaling for our node pools. The application pods may automatically be removed at the end of the day (depending on the users' needs), and only the volume (with the database) is kept. The next day, a new pod can/will be created with the existing volume attached, which means that for the user it is the "same" environment.

Environment:

  • CSI Driver version: v1.30.5-windows-hp
  • Kubernetes version (use kubectl version): v1.30.5
  • OS (e.g. from /etc/os-release): Windows Server 2022
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx
Member

@ps610 What is your Windows VM SKU? Is it a Hyper-V Gen2 VM?

@ps610
Author

ps610 commented Dec 4, 2024

@ps610 What is your Windows VM SKU? Is it a Hyper-V Gen2 VM?

Hi @andyzhangx,

We are running on Standard_E8ds_v5 VMs.

@andyzhangx
Member

andyzhangx commented Dec 4, 2024

Is the same disk volume mounted and unmounted on the node frequently?

You could run kubectl exec -it -n kube-system csi-azuredisk-node-win-xxx -c azuredisk -- cmd and then check whether the requested access path is already in use by running the following commands in PowerShell:

(Get-Disk -Number 2 | Get-Partition | Get-Volume).UniqueId
\\?\Volume{c00607ef-8189-4e45-8e78-7b97c3d2d158}\

Get-Volume -UniqueId "\\?\Volume{c00607ef-8189-4e45-8e78-7b97c3d2d158}\" | Get-Partition
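The check above can be folded into an idempotent guard on the driver side: before calling Add-PartitionAccessPath, compare the partition's existing access paths against the requested staging path and skip the add when it is already present. A minimal Go sketch of that comparison logic (the helper name and path handling are illustrative assumptions, not the azuredisk driver's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// needsAccessPath reports whether Add-PartitionAccessPath still has to be
// called: it returns false when target already appears among the partition's
// access paths, which is exactly the case that otherwise produces
// "The requested access path is already in use".
// Paths are compared case-insensitively and without trailing backslashes,
// since Windows paths are case-insensitive.
func needsAccessPath(existing []string, target string) bool {
	norm := func(p string) string {
		return strings.ToLower(strings.TrimRight(p, `\`))
	}
	for _, p := range existing {
		if norm(p) == norm(target) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical access paths as Get-Partition would report them.
	existing := []string{
		`\\?\Volume{c00607ef-8189-4e45-8e78-7b97c3d2d158}\`,
		`\var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\abc\globalmount\`,
	}
	// Already mounted at the staging path: the add must be skipped.
	fmt.Println(needsAccessPath(existing, `\var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\abc\globalmount`))
	// A different staging path still needs the access path added.
	fmt.Println(needsAccessPath(existing, `\var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\other\globalmount`))
}
```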

@andyzhangx
Member

andyzhangx commented Dec 4, 2024

Never mind, this PR should fix the issue: #2691. This is the testing image that contains the fix: mcr.microsoft.com/k8s/csi/azuredisk-csi:v1.32.0-windows-hp

@andyzhangx
Member

The root cause is that the first disk format process can take more than 2 minutes, so the call times out and another mount process is started; that second attempt then hits this error. I could sometimes reproduce it in the e2e tests.

I1205 08:54:47.781411    3328 utils.go:77] GRPC call: /csi.v1.Node/NodeStageVolume
I1205 08:54:47.781949    3328 utils.go:78] GRPC request: {"publish_context":{"LUN":"0"},"staging_target_path":"\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\disk.csi.azure.com\\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\\globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-3d12ae9e-6da8-433b-8db5-aaf16147b078","csi.storage.k8s.io/pvc/name":"pvc-qgwvr","csi.storage.k8s.io/pvc/namespace":"azuredisk-655","requestedsizegib":"10","skuName":"StandardSSD_LRS","storage.kubernetes.io/csiProvisionerIdentity":"1733386384086-4819-disk.csi.azure.com"},"volume_id":"/subscriptions/46678f10-4bbb-447e-98e8-d2829589f2d8/resourceGroups/capz-63fp7f/providers/Microsoft.Compute/disks/pvc-3d12ae9e-6da8-433b-8db5-aaf16147b078"}

I1205 08:55:14.122188    3328 nodeserver.go:157] NodeStageVolume: formatting 2 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\globalmount with mount options([])

I1205 08:55:34.308557    3328 disk.go:356] Initializing disk 2
I1205 08:55:34.323338    3328 azure_disk_utils.go:863] Executing command: "C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -Mta -NoProfile -Command Get-Disk -Number 4 | Where partitionstyle -eq 'raw'"


I1205 08:55:55.165024    3328 disk.go:373] Creating basic partition on disk 2
I1205 08:55:55.173003    3328 azure_disk_utils.go:863] Executing command: "C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe -Mta -NoProfile -Command Get-Partition | Where DiskNumber -eq 4 | Where Type -ne Reserved"

I1205 08:56:47.790793    3328 nodeserver.go:161] NodeStageVolume: format 2 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\globalmount successfully.


I1205 08:56:53.665838    3328 nodeserver.go:157] NodeStageVolume: formatting 2 and mounting at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\globalmount with mount options([])


I1205 08:57:11.965688    3328 utils.go:77] GRPC call: /csi.v1.Node/NodeStageVolume
I1205 08:57:11.965688    3328 utils.go:78] GRPC request: {"publish_context":{"LUN":"0"},"staging_target_path":"\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\disk.csi.azure.com\\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\\globalmount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-3d12ae9e-6da8-433b-8db5-aaf16147b078","csi.storage.k8s.io/pvc/name":"pvc-qgwvr","csi.storage.k8s.io/pvc/namespace":"azuredisk-655","requestedsizegib":"10","skuName":"StandardSSD_LRS","storage.kubernetes.io/csiProvisionerIdentity":"1733386384086-4819-disk.csi.azure.com"},"volume_id":"/subscriptions/46678f10-4bbb-447e-98e8-d2829589f2d8/resourceGroups/capz-63fp7f/providers/Microsoft.Compute/disks/pvc-3d12ae9e-6da8-433b-8db5-aaf16147b078"}

E1205 08:57:10.860785    3328 utils.go:82] GRPC error: rpc error: code = Internal desc = could not format 2(lun: 0), and mount it at \var\lib\kubelet\plugins\kubernetes.io\csi\disk.csi.azure.com\09d7f61f6574d352182e825eb461d54e98aac258ad9005f617b2edef1bd57db2\globalmount, failed with error mount volume to path. cmd: Get-Volume -UniqueId "$Env:volumeID" | Get-Partition | Add-PartitionAccessPath -AccessPath $Env:path, output: Add-PartitionAccessPath : The requested access path is already in use.
Activity ID: {2c1daab0-451c-48eb-8168-03bdd0b97b01}
At line:1 char:56
+ ... meID" | Get-Partition | Add-PartitionAccessPath -AccessPath $Env:path
+                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (StorageWMI:ROOT/Microsoft/.../MSFT_Partition) [Add-PartitionAccessPath 
   ], CimException
    + FullyQualifiedErrorId : StorageWMI 42002,Add-PartitionAccessPath
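The timeline above shows the race: the retried NodeStageVolume arrives while the first, slow format/mount is still running. A common way CSI node drivers guard against this is an in-flight lock keyed by volume ID, so a concurrent call for the same volume is rejected (typically with gRPC Aborted) instead of racing Add-PartitionAccessPath. A minimal sketch of that pattern, with illustrative names that are not the azuredisk driver's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks serializes concurrent NodeStageVolume calls per volume ID:
// a retry that arrives while the first staging operation is still running
// is turned away instead of racing the format/mount in progress.
type volumeLocks struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inFlight: make(map[string]bool)}
}

// TryAcquire returns false if an operation on volumeID is already in flight.
func (v *volumeLocks) TryAcquire(volumeID string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.inFlight[volumeID] {
		return false
	}
	v.inFlight[volumeID] = true
	return true
}

// Release marks the operation on volumeID as finished.
func (v *volumeLocks) Release(volumeID string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.inFlight, volumeID)
}

func main() {
	locks := newVolumeLocks()
	fmt.Println(locks.TryAcquire("pvc-1")) // true: first staging proceeds
	fmt.Println(locks.TryAcquire("pvc-1")) // false: retry rejected while in flight
	locks.Release("pvc-1")
	fmt.Println(locks.TryAcquire("pvc-1")) // true: allowed again after release
}
```

With this guard, the kubelet's retry simply fails fast and tries again later, by which time the first operation has either completed (and the staging call can succeed idempotently) or been released.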

@andyzhangx andyzhangx reopened this Dec 5, 2024
@lippertmarkus

Is that fixed by your PRs? Just asking because you reopened this issue here.

@andyzhangx
Member

Is that fixed by your PRs? Just asking because you reopened this issue here.

@lippertmarkus Not yet, but I have found how to fix it, stay tuned.

@sixeyed

sixeyed commented Dec 5, 2024

Adding that we have the same issue with Windows 2019 nodes on AKS 1.29. The pods are created from a KEDA ScaledJob and use an Azure Disk ephemeral volume. We are not re-using paths, but the node pool is using the deallocate scale-down mode.
@andyzhangx - happy to test a patched image, we can reproduce this easily.

@ps610
Author

ps610 commented Dec 9, 2024

Thank you, @andyzhangx.
We use managed AKS (currently on 1.30.5); to which version do we have to upgrade to receive your fix?

@andyzhangx
Member

Thank you, @andyzhangx. We use managed AKS (currently on 1.30.5); to which version do we have to upgrade to receive your fix?

@ps610 I will publish a new CSI driver version this week. Please email me your AKS cluster FQDN, and I will upgrade the CSI driver on your Windows nodes directly after the new version is released.
