Resizing of LVM after Host-Reboot not working #647

Closed
schmidax opened this issue Apr 17, 2024 · 11 comments · Fixed by #728

@schmidax

Hi,

I am using LINSTOR with Piraeus Operator (2.5.0) on my Kubernetes cluster. The OS is Ubuntu 22.04.4 with LVM installed, and Kubernetes is the Rancher distribution, v1.27.12+rke2r1.

My problem is nearly the same as the one described in this feature request: LINBIT/linstor-server#326.

As long as all nodes keep running, I can create and resize disks as often as I want. But as soon as I restart the nodes, I hit exactly the problem described there. What I see, however, is the following difference:

09:41:28 root@k8s-w3:~> ll /dev/mapper/
total 0
drwxr-xr-x  2 root root      480 Apr  5 12:14 ./
drwxr-xr-x 24 root root     5.0K Apr  8 07:49 ../
crw-------  1 root root  10, 236 Mar 21 15:42 control
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--0926060d--020e--460d--9ad4--a38b903cc22f_00000 -> ../dm-20
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--11ec7e16--8679--477a--9d09--ec1f8ddfeec2_00000 -> ../dm-15
brw-rw----  1 root disk 253,   3 Apr  5 11:42 datavg-pvc--3335b921--6fc8--4b30--add0--83d4b504e40b_00000 
brw-rw----  1 root disk 253,  10 Apr  5 11:42 datavg-pvc--4289386d--54b6--4a03--935e--a3d730e624a5_00000
brw-rw----  1 root disk 253,  22 Apr  5 11:42 datavg-pvc--44ade64f--674c--4245--ae72--c014e4f57f64_00000
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--4cd57ee3--8c7b--4c70--8588--326d4cce8329_00000 -> ../dm-18
lrwxrwxrwx  1 root root        7 Mar 21 15:43 datavg-pvc--6bf88a47--687e--4795--b9fc--b72709cc83d0_00000 -> ../dm-9
lrwxrwxrwx  1 root root        7 Mar 21 15:43 datavg-pvc--7483fc1f--22c1--450d--b9d5--46ddb8a9e81b_00000 -> ../dm-7
brw-rw----  1 root disk 253,  23 Apr  5 11:44 datavg-pvc--83e08ce3--5a82--4954--8d0b--e2652ed67917_00000
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--96cbc6fc--a7d8--44cf--aaaa--2db6eb4aca08_00000 -> ../dm-19
brw-rw----  1 root disk 253,  24 Apr  5 11:46 datavg-pvc--d095ac0d--24cc--4062--906b--58996fae538b_00000
brw-rw----  1 root disk 253,  25 Apr  5 11:46 datavg-pvc--d15a4935--fa5e--4dcf--b667--ac2029f0ed41_00000
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--dc26b6d6--ae6d--4df2--88ad--7f2030cfde68_00000 -> ../dm-13
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--e773cae1--b88c--454f--992a--3a4eefd92639_00000 -> ../dm-17
lrwxrwxrwx  1 root root        8 Mar 21 15:43 datavg-pvc--e953d039--b9eb--448e--8a4d--fbdd3e0ba3ce_00000 -> ../dm-14
brw-rw----  1 root disk 253,  26 Apr  5 11:47 datavg-pvc--e9e2cd7b--3abe--4a03--8904--b5508a1f9c67_00000
brw-rw----  1 root disk 253,  21 Apr  5 11:23 datavg-pvc--f19fc677--e4bf--4e23--bf69--920b26745d1f_00000
09:41:35 root@k8s-w3:~> ll /dev/datavg/
total 0
drwxr-xr-x  2 root root  380 Apr  5 13:27 ./
drwxr-xr-x 24 root root 5.0K Apr  8 07:49 ../
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-0926060d-020e-460d-9ad4-a38b903cc22f_00000 -> ../dm-20
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-11ec7e16-8679-477a-9d09-ec1f8ddfeec2_00000 -> ../dm-15
lrwxrwxrwx  1 root root   70 Apr  5 11:42 pvc-3335b921-6fc8-4b30-add0-83d4b504e40b_00000 -> /dev/mapper/datavg-pvc--3335b921--6fc8--4b30--add0--83d4b504e40b_00000
lrwxrwxrwx  1 root root   70 Apr  5 11:42 pvc-4289386d-54b6-4a03-935e-a3d730e624a5_00000 -> /dev/mapper/datavg-pvc--4289386d--54b6--4a03--935e--a3d730e624a5_00000
lrwxrwxrwx  1 root root   70 Apr  5 11:42 pvc-44ade64f-674c-4245-ae72-c014e4f57f64_00000 -> /dev/mapper/datavg-pvc--44ade64f--674c--4245--ae72--c014e4f57f64_00000
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-4cd57ee3-8c7b-4c70-8588-326d4cce8329_00000 -> ../dm-18
lrwxrwxrwx  1 root root    7 Mar 21 15:43 pvc-6bf88a47-687e-4795-b9fc-b72709cc83d0_00000 -> ../dm-9
lrwxrwxrwx  1 root root    7 Mar 21 15:43 pvc-7483fc1f-22c1-450d-b9d5-46ddb8a9e81b_00000 -> ../dm-7
lrwxrwxrwx  1 root root   70 Apr  5 11:44 pvc-83e08ce3-5a82-4954-8d0b-e2652ed67917_00000 -> /dev/mapper/datavg-pvc--83e08ce3--5a82--4954--8d0b--e2652ed67917_00000
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-96cbc6fc-a7d8-44cf-aaaa-2db6eb4aca08_00000 -> ../dm-19
lrwxrwxrwx  1 root root   70 Apr  5 11:46 pvc-d095ac0d-24cc-4062-906b-58996fae538b_00000 -> /dev/mapper/datavg-pvc--d095ac0d--24cc--4062--906b--58996fae538b_00000
lrwxrwxrwx  1 root root   70 Apr  5 11:46 pvc-d15a4935-fa5e-4dcf-b667-ac2029f0ed41_00000 -> /dev/mapper/datavg-pvc--d15a4935--fa5e--4dcf--b667--ac2029f0ed41_00000
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-dc26b6d6-ae6d-4df2-88ad-7f2030cfde68_00000 -> ../dm-13
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-e773cae1-b88c-454f-992a-3a4eefd92639_00000 -> ../dm-17
lrwxrwxrwx  1 root root    8 Mar 21 15:43 pvc-e953d039-b9eb-448e-8a4d-fbdd3e0ba3ce_00000 -> ../dm-14
lrwxrwxrwx  1 root root   70 Apr  5 11:47 pvc-e9e2cd7b-3abe-4a03-8904-b5508a1f9c67_00000 -> /dev/mapper/datavg-pvc--e9e2cd7b--3abe--4a03--8904--b5508a1f9c67_00000
lrwxrwxrwx  1 root root   70 Apr  5 13:27 pvc-f19fc677-e4bf-4e23-bf69-920b26745d1f_00000 -> /dev/mapper/datavg-pvc--f19fc677--e4bf--4e23--bf69--920b26745d1f_00000

After some research, I see that after a reboot udev sees the LVs and creates the symlinks to the dm devices. And that is the problem: afterwards, lvresize in the container can resize these volumes, but the symlinks get lost.
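
For anyone who wants to check a node for the same mixed state, here is a minimal Python sketch (not part of LINSTOR or Piraeus; the helper names `dm_name`, `entry_kind`, and `report` are made up for illustration). It reproduces the name mapping visible in the listings above, where hyphens in the VG and LV names are doubled in the /dev/mapper name, and reports whether each entry is a symlink or a real block device node.

```python
import os
import stat


def dm_name(vg: str, lv: str) -> str:
    """Build the device-mapper name for an LV: hyphens inside the VG and LV
    names are doubled, then both parts are joined with a single hyphen."""
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def entry_kind(path: str) -> str:
    """Classify a /dev entry without following symlinks."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISBLK(mode):
        return "block-node"
    return "other"  # e.g. the character device /dev/mapper/control


def report(mapper_dir: str = "/dev/mapper") -> dict[str, str]:
    """Map each entry name in mapper_dir to its kind."""
    return {
        name: entry_kind(os.path.join(mapper_dir, name))
        for name in sorted(os.listdir(mapper_dir))
    }
```

On the node shown above, `report()` would be expected to list the entries from Mar 21 as `symlink` and the ones from Apr 5 as `block-node`.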

Could it be that there is a general problem, or have I done something fundamentally wrong?
Even in manual tests with another distro (SUSE Linux Enterprise Micro 5.4, generated with Rancher Elemental) I encounter the same problem.

If more details are needed from me, please let me know.

@WanzenBug
Member

Hi! Thanks for reporting this issue.

We are currently investigating quite similar issues. Our best guess is that there is a race between the container lvmtools (where we completely disable udev, so lvcreate will manually create the symlinks) and udev running on the host.

However, we have not been able to trace the exact reason why a resize or taking a snapshot can cause udevd to remove the symlink from the list.

@WanzenBug
Member

Could you give the following configuration a try?

---
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: udev
spec:
  podTemplate:
    spec:
      containers:
      - name: linstor-satellite
        volumeMounts:
        - name: lvmconfig
          mountPath: /etc/lvm/lvm.conf
          subPath: lvm.conf
          readOnly: true
      volumes:
      - name: lvmconfig
        configMap:
          name: lvmconfig
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: lvmconfig
data:
  lvm.conf: |
    activation {
      udev_sync=1
      monitoring=0
      udev_rules=1
    }
    devices {
      global_filter="r|^/dev/drbd|"
      obtain_device_list_from_udev=1
    }

We already pass through the udev socket, so enabling the LVM tools to wait for udev should not be an issue. That way there should be no race between the container LVM tools and udevd.

@schmidax
Author

This configuration is working!

But in the error log I get the following:

LINSTOR ==> err l -s "1days"
╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id                    ┊ Datetime            ┊ Node     ┊ Exception                                     ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 661F8514-702D1-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w3 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-09613-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w1 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-0193A-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w2 ┊ StorageException: Failed to resize lvm volume ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯

LINSTOR ==> err s 661F8514-0193A-000000
ERROR REPORT 661F8514-0193A-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.27.0
Build ID:                           8250eddde5f533facba39b4d1f77f1ef85f8521d
Build time:                         2024-04-02T07:12:21+00:00
Error time:                         2024-04-17 08:18:36
Node:                               k8s-w2
Thread:                             DeviceManager

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69

Error message:                      Failed to resize lvm volume

Error context:
        An error occurred while processing resource 'Node: 'k8s-w2', Rsc: 'pvc-4b8fc030-9725-4d70-8394-dccd6460478b''
ErrorContext:
  Details:     Command 'lvresize --config 'devices { filter=['"'"'a|/dev/sda5|'"'"','"'"'r|.*|'"'"'] }' --size 18878464k datavg/pvc-4b8fc030-9725-4d70-8394-dccd6460478b_00000 -f' returned with exitcode 5. 

Standard out: 


Error message: 
  New size (4609 extents) matches existing size (4609 extents).




Call backtrace:

    Method                                   Native Class:Line number
    checkExitCode                            N      com.linbit.extproc.ExtCmdUtils:69
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:103
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:51
    resize                                   N      com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:230
    lambda$resizeLvImpl$2                    N      com.linbit.linstor.layer.storage.lvm.LvmProvider:448
    execWithRetry                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:505
    resizeLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmProvider:445
    resizeLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmProvider:67
    resizeVolumes                            N      com.linbit.linstor.layer.storage.AbsStorageProvider:717
    processVolumes                           N      com.linbit.linstor.layer.storage.AbsStorageProvider:361
    processResource                          N      com.linbit.linstor.layer.storage.StorageLayer:282
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:908
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:949
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:904
    processChild                             N      com.linbit.linstor.layer.drbd.DrbdLayer:323
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:447
    processResource                          N      com.linbit.linstor.layer.drbd.DrbdLayer:250
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:908
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:949
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:904
    processResources                         N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:370
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:217
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:331
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1204
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:672
    run                                      N      java.lang.Thread:840


END OF ERROR REPORT.
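
A quick sanity check on the numbers in this report, as a minimal Python sketch. It assumes the volume group uses LVM's default extent size of 4 MiB (an assumption; the report does not state the extent size) and that sizes are rounded up to whole extents. The helper name `kib_to_extents` is made up for illustration.

```python
def kib_to_extents(size_kib: int, extent_kib: int = 4096) -> int:
    """Number of extents needed to hold size_kib, rounded up to whole extents.

    extent_kib defaults to 4096 KiB (4 MiB), which is an assumption here.
    """
    return -(-size_kib // extent_kib)  # ceiling division


requested_kib = 18878464  # from `--size 18878464k` in the report above
print(kib_to_extents(requested_kib))  # 4609
```

Under that assumption the requested size is exactly the 4609 extents the LV already has, which matches the message printed by lvresize: the command has nothing left to do and returns exit code 5, which the satellite then reports as a failed resize.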

@WanzenBug
Member

Could that be remains from a previous attempt? I.e. can you try a new resize now? Do these errors still happen?

@schmidax
Author

No, this was a new attempt. And a new resize causes the same error:

LINSTOR ==> err l -s 1
╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id                    ┊ Datetime            ┊ Node     ┊ Exception                                     ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 661F8514-702D1-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w3 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-09613-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w1 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-0193A-000000 ┊ 2024-04-17 08:18:36 ┊ S|k8s-w2 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-702D1-000001 ┊ 2024-04-17 08:50:10 ┊ S|k8s-w3 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-0193A-000001 ┊ 2024-04-17 08:50:10 ┊ S|k8s-w2 ┊ StorageException: Failed to resize lvm volume ┊
┊ 661F8514-09613-000001 ┊ 2024-04-17 08:50:10 ┊ S|k8s-w1 ┊ StorageException: Failed to resize lvm volume ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯

LINSTOR ==> err s 661F8514-09613-000001
ERROR REPORT 661F8514-09613-000001

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.27.0
Build ID:                           8250eddde5f533facba39b4d1f77f1ef85f8521d
Build time:                         2024-04-02T07:12:21+00:00
Error time:                         2024-04-17 08:50:10
Node:                               k8s-w1
Thread:                             DeviceManager

============================================================

Reported error:
===============

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'checkExitCode', Source file 'ExtCmdUtils.java', Line #69

Error message:                      Failed to resize lvm volume

Error context:
        An error occurred while processing resource 'Node: 'k8s-w1', Rsc: 'pvc-f3936c57-5c13-4ed4-96ba-97be510bdcc2''
ErrorContext:
  Details:     Command 'lvresize --config 'devices { filter=['"'"'a|/dev/sda5|'"'"','"'"'r|.*|'"'"'] }' --size 20979712k datavg/pvc-f3936c57-5c13-4ed4-96ba-97be510bdcc2_00000 -f' returned with exitcode 5. 

Standard out: 


Error message: 
  New size (5122 extents) matches existing size (5122 extents).




Call backtrace:

    Method                                   Native Class:Line number
    checkExitCode                            N      com.linbit.extproc.ExtCmdUtils:69
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:103
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
    genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:51
    resize                                   N      com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:230
    lambda$resizeLvImpl$2                    N      com.linbit.linstor.layer.storage.lvm.LvmProvider:448
    execWithRetry                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:505
    resizeLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmProvider:445
    resizeLvImpl                             N      com.linbit.linstor.layer.storage.lvm.LvmProvider:67
    resizeVolumes                            N      com.linbit.linstor.layer.storage.AbsStorageProvider:717
    processVolumes                           N      com.linbit.linstor.layer.storage.AbsStorageProvider:361
    processResource                          N      com.linbit.linstor.layer.storage.StorageLayer:282
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:908
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:949
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:904
    processChild                             N      com.linbit.linstor.layer.drbd.DrbdLayer:323
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:447
    processResource                          N      com.linbit.linstor.layer.drbd.DrbdLayer:250
    lambda$processResource$4                 N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:908
    processGeneric                           N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:949
    processResource                          N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:904
    processResources                         N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:370
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:217
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:331
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1204
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:778
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:672
    run                                      N      java.lang.Thread:840


END OF ERROR REPORT.

@WanzenBug
Member

Have you rebooted the nodes in the meantime? I guess an sos-report would be good.

@schmidax
Author

I have now rebooted the nodes and get the same error.
Here is the sos-report: sos_2024-04-17_10-26-54.tar.gz

@WanzenBug
Member

WanzenBug commented Apr 17, 2024

Ok, so one small fix is to also set hostIPC: true:

spec:
  podTemplate:
    spec:
      hostIPC: true
      containers:
      ...

I haven't been able to find the source of your specific issue, but this fixes LVM commands hanging on lvcreate, etc.
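
A likely reason hostIPC matters here (my understanding of libdevmapper, not something confirmed in this thread): with udev_sync enabled, the LVM tools wait on a System V semaphore that the host's udev rules release, and such semaphores are only visible within one IPC namespace. The following minimal Python sketch, meant to be run as root on the host, compares the IPC namespace of two processes via /proc. The helper names and the `proc_root` parameter are made up for illustration.

```python
import os


def namespace_id(pid: str, ns: str = "ipc", proc_root: str = "/proc") -> str:
    """Return the namespace identifier of a process, e.g. 'ipc:[4026531839]'."""
    return os.readlink(os.path.join(proc_root, pid, "ns", ns))


def shares_namespace(pid_a: str, pid_b: str, ns: str = "ipc",
                     proc_root: str = "/proc") -> bool:
    """True if both processes live in the same namespace of the given type."""
    return namespace_id(pid_a, ns, proc_root) == namespace_id(pid_b, ns, proc_root)
```

For example, `shares_namespace("1", "<satellite pid>")` on the host should return True once the satellite pod runs with hostIPC: true, where `<satellite pid>` stands for the host PID of the satellite process.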

@schmidax
Copy link
Author

This small fix helps!

@maxpain

maxpain commented Nov 3, 2024

I have the same problem.
Why is this still not fixed in Piraeus-operator?

@WanzenBug
Member

> Why is this still not fixed in Piraeus-operator?

Because making LVM work consistently when running in a container is hard. Even harder if the solution should support Ubuntu, Debian, RHEL, Talos Linux, and whatever else might be running out there.

@WanzenBug WanzenBug linked a pull request Nov 5, 2024 that will close this issue