Node-agent high memory usage #8582

Open
RobKenis opened this issue Jan 6, 2025 · 23 comments

Comments

@RobKenis

RobKenis commented Jan 6, 2025

What steps did you take and what happened:

Velero is installed with nodeAgent enabled. Backup storage location is configured to use Azure Blob Storage.

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-that-uses-a-lot-of-memory
  namespace: velero
spec:
  backupName: external-backup-1
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  - policies.rabbitmq.com
  existingResourcePolicy: update
  hooks: {}
  includedNamespaces:
  - custom-namespace
  itemOperationTimeout: 48h0m0s
  uploaderConfig:
    parallelFilesDownload: 16

The memory request for node-agent is set to 5Gi. Normally, the limit is also set to 5Gi. To avoid the Pod getting OOMKilled, I have removed the limit for this restore.

What did you expect to happen:

Memory usage stays around, but below, 5Gi.

Anything else you would like to add:

image

Environment:

  • Velero version (use velero version): v1.15.0
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version): v1.28.2+k3s1
  • Kubernetes installer & version: v1.28.2+k3s1
  • Cloud provider or hardware configuration: Azure VM
  • OS (e.g. from /etc/os-release): AlmaLinux 9.5 (Teal Serval)

@Lyndon-Li
Contributor

Lyndon-Li commented Jan 7, 2025

The recommended configuration is to set no limit, i.e. "Best Effort".
Depending on the complexity and scale of the data being backed up, node-agent may use a lot of memory during backup/restore (fs-backup), consumed by the fs-uploader and the repository. That memory breaks down into process heap memory and system page cache.
After the backup/restore, the system page cache may not be reclaimed if your node has enough memory. This is system behavior and outside Velero's control.
After the backup/restore, most of the process heap memory will be released, but a small proportion will be retained for reuse by subsequent backups/restores, which is a performance optimization.
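
One way to see how a container's memory splits between heap and page cache (not something suggested in this thread) is to read the cgroup v2 `memory.stat` file, where `anon` covers anonymous memory such as the process heap and `file` covers page cache. A minimal Go sketch, assuming cgroup v2 and the conventional path `/sys/fs/cgroup/memory.stat` inside the container; the fallback sample values are made up:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemoryStat parses "key value" lines in the cgroup v2 memory.stat
// format into a map. Malformed lines are skipped.
func parseMemoryStat(text string) map[string]uint64 {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = v
	}
	return stats
}

func main() {
	// Assumed path: cgroup v2 mounted at /sys/fs/cgroup inside the container.
	data, err := os.ReadFile("/sys/fs/cgroup/memory.stat")
	text := string(data)
	if err != nil {
		// Fall back to made-up sample values so the sketch still runs.
		text = "anon 1073741824\nfile 3221225472\n"
	}
	stats := parseMemoryStat(text)
	fmt.Printf("anon (heap etc.):  %d bytes\n", stats["anon"])
	fmt.Printf("file (page cache): %d bytes\n", stats["file"])
}
```

A large `file` value next to a modest `anon` value would be consistent with the page-cache explanation above.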

@Lyndon-Li
Contributor

Another recommendation is to use data mover backup/restore over fs-backup:

  1. Data mover backup/restore allocates the memory in a dedicated pod, and that memory is fully released after the backup/restore (since 1.15)
  2. Data mover backup/restore is more consistent than fs-backup

@RobKenis
Author

RobKenis commented Jan 7, 2025

How would removing the memory limit impact the system as a whole? Could this mean that the node goes OOM when the Velero node agent uses too much memory?

The reason we would like to set memory limits is that we are running on a resource-constrained system and we would like to avoid impact on other services.
During restore, other services are scaled down, so this is less of an issue, but during backup it would be nice to set a reasonable limit so the node agent still gets OOMKilled if it goes over the limit instead of impacting other services.

@Lyndon-Li
Contributor

> How would removing the memory limit impact the system as a whole? Could this mean that the node goes OOM when the Velero node agent uses too much memory?

That depends on the complexity and scale of the data being backed up. Most probably, the node memory will not run out, but the system cache will be reclaimed when memory is tight. However, this would also affect other workloads running on the same node.

@Lyndon-Li
Contributor

> The reason we would like to set memory limits is that we are running on a resource-constrained system and we would like to avoid impact on other services

If so, data mover is also recommended, because you can customize which nodes the data mover should or should not run on; you cannot do this for fs-backup.

@RobKenis
Author

RobKenis commented Jan 7, 2025

> That depends on the complexity and scale of the data being backed up. Most probably, the node memory will not run out, but the system cache will be reclaimed when the memory is tight. However, this would also impact the running of other workload in the same node.

The volume that causes the most issues seems to be a volume that contains around 1TB of small files, between 500K and 5M in size.
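
As a rough sense of scale, the file count implied by these numbers can be estimated with back-of-the-envelope arithmetic. The sketch below assumes decimal units and that every file sits at one end of the size range, so it only gives bounds:

```go
package main

import "fmt"

// fileCount returns how many files of fileSize bytes fit in totalBytes.
func fileCount(totalBytes, fileSize int64) int64 {
	return totalBytes / fileSize
}

func main() {
	const total = int64(1_000_000_000_000) // 1 TB, decimal (assumption)
	fmt.Println("all files 5 MB:  ", fileCount(total, 5_000_000))
	fmt.Println("all files 500 KB:", fileCount(total, 500_000))
}
```

Under those assumptions the volume holds somewhere between roughly 200 thousand and 2 million files.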

@msfrucht
Contributor

msfrucht commented Jan 8, 2025

That isn't surprising. The worst-case scenario for deduplication-based backup and restore software, such as restic and kopia as used in Velero, is a large number of small files.

Red Hat's recommendation for a normal-sized config is a 16GB request and a 32GB limit for restic. Your usage requirements are in line with expectations.

Kopia usually uses fewer resources than restic in Velero 1.15. Switching will require new backups, as kopia and restic repositories are not compatible and have no migration path. You can check which one is in use by inspecting the BackupRepository object.

@RobKenis
Author

Out of curiosity, I have been playing around with GOMEMLIMIT and GOGC to see if I can lower the overall memory usage. I have been able to lower the peak usage from 25G to 15G. After a restore is done, the Go runtime keeps 8G, which is higher than expected.
I understand that this memory usage is to be expected for my use case of a lot of small files, but the systems we deploy on don't really allow this because it would impact other services running on them.
Do you have recommendations to lower the peak memory usage, even if they impact Velero performance?

@msfrucht
Contributor

If the repository is restic, then setting parallelFilesUpload might improve the memory usage at the cost of performance.

Kopia is set to do parallel uploads equal to the number of CPUs. Lowering the number of parallel data upload streams would lower the memory usage at the cost of performance. I don't know if setting a CPU limit would change that reporting.

@Lyndon-Li I don't suppose you've tested that. A brief check using nproc reported the same number of CPU cores regardless of CPU request and limit. That isn't necessarily the same as what Go reports.

@Lyndon-Li
Contributor

> I don't know if setting a cpu limit would change that reporting

No, the number of CPUs reported by Golang is always the number of CPU cores in the node; the cgroup CPU limit doesn't affect it. So always use the backup parameter --parallel-files-upload to change the number of uploads.

@Lyndon-Li
Contributor

@RobKenis
Could you also share the number of CPU cores in your nodes?

@RobKenis
Author

RobKenis commented Jan 13, 2025

@Lyndon-Li I am testing on a system with 16 cores and 128GB of memory

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD EPYC 7763 64-Core Processor
    CPU family:           25
    Model:                1
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             1
    BogoMIPS:             4890.85
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc
                          cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse
                          3dnowprefetch osvw topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru arat npt nrip_save
                          tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
               total        used        free      shared  buff/cache   available
Mem:           125Gi        51Gi       4.8Gi       2.0Gi        72Gi        74Gi
Swap:             0B          0B          0B

@msfrucht
Contributor

The option --parallel-files-upload doesn't affect Kopia, only Restic. https://github.com/vmware-tanzu/velero/blob/release-1.15/pkg/uploader/kopia/snapshot.go#L100

Should that be a separate issue to have this option apply to Kopia?

@Lyndon-Li
Contributor

> The option --parallel-files-upload doesn't affect Kopia, only Restic

No, on the contrary, it works for the Kopia path only:

curPolicy.UploadPolicy.MaxParallelFileReads = newOptionalInt(parallelUpload)

@Lyndon-Li
Contributor

@RobKenis

> I am testing on a system with 16 cores and 128GB of memory

Then the default concurrency in your environment is 16.
Also note that the pattern of your current data is the typical pattern that consumes a lot of memory during backup, restore, and repo maintenance.

So here are the recommendations overall:

  1. Data mover backup/restore should be used over fs-backup.
  2. You could reduce the memory usage of backup/restore by controlling the concurrency (--parallel-files-upload), but that will significantly slow down performance; moreover, you cannot control the memory usage during repo maintenance. So you could try this, but it is not what we recommend.
  3. We recommend preparing one or more dedicated nodes for data mover backup/restore and repo maintenance. Data mover pods and repo maintenance pods then run only on those nodes and are free to use memory. You could also block other workloads from running on those nodes, so that they are not affected by a lack of memory.

@RobKenis
Author

@Lyndon-Li I understand the need for Data mover, this would resolve a big part of the problem. From what I understand, this requires a CSI driver to create Volume Snapshots. Is this also a possible solution when using Local Volumes as we don't use a CSI Driver?

@msfrucht
Contributor

@Lyndon-Li Thanks for the correction; that is useful to know. @RobKenis It means you can set this value below the CPU count of your system, which should reduce memory usage with Kopia at the cost of performance.

RobKenis pushed a commit to RobKenis/velero that referenced this issue Jan 15, 2025
This allows us to enable the profiler endpoints on both the
server and the node agent.
This helps me in troubleshooting the high memory usage when
restoring lots of small files.

Refs: vmware-tanzu#8582

Signed-off-by: Rob Kenis <rob.kenis@hotmail.com>
@RobKenis
Author

@Lyndon-Li @msfrucht I lowered the number of parallel files using the following config in the Restore resource.

uploaderConfig:
    parallelFilesDownload: 1

This makes the restore a lot slower, but memory usage still climbs high.

I would like to get more insight into this, but it seems I cannot enable profiling endpoints on the node agent, only on the Velero server. Could you please have a look at PR #8618, which enables profiling on the node agent?

@Lyndon-Li
Contributor

See this comment: #8582 (comment). The system cache takes a lot of memory while the fs-uploader reads and writes files. The cache memory won't be aggressively reclaimed as long as there is enough memory on the node, even after the backup/restore completes (since you are using fs-backup and node-agent).

See these recommendations #8582 (comment) for the solution.

RobKenis pushed a commit to RobKenis/velero that referenced this issue Jan 20, 2025
This allows us to enable the profiler endpoints on both the
server and the node agent.
This helps me in troubleshooting the high memory usage when
restoring lots of small files.

Refs: vmware-tanzu#8582

Signed-off-by: Rob Kenis <rob.kenis@hotmail.com>
@RobKenis
Author

I have built a custom Velero version that inverts the queuing in Kopia, as proposed in this PR.
Image

Image

Is there a way I can implement this change in https://github.com/project-velero/kopia? The downside is that the total bytes will not be calculated at the beginning of the restore, which messes up the ETA of a PodVolumeRestore. The benefit is that memory consumption is halved in this scenario.

What is the purpose of https://github.com/project-velero/kopia? It looks like a normal fork, but I do not understand why the upstream https://github.com/kopia/kopia project is not used.

@Lyndon-Li
Contributor

You can ignore https://github.com/project-velero/kopia; any PR accepted by Kopia upstream will be pulled into the fork promptly and then ship with Velero.

As for the Kopia PR you mentioned, a substitution PR has already been merged. Velero may enable the new mechanism after sufficient testing, since the new mechanism has some side effects.

However, as mentioned above, your data pattern is typical of high memory consumption. Many aspects of the uploader and repository use memory, not only the enumeration part addressed by the PR you mentioned. We are continuously optimizing the memory usage; some work has been done, and some is still in progress.

@RobKenis
Author

Hi @Lyndon-Li

Do you have a link to the substitution PR you are referring to? I am willing to validate the performance and resource impact of this change.

I understand that my data layout introduces more memory usage, but I am still on the lookout for solutions that lower the memory consumption, since I am working on a system with limited resources.

@Lyndon-Li
Contributor

> Do you have a link to the substitution PR you are referring to?

FYI, kopia/kopia#4218.
This new estimation mechanism is not enabled in Velero, so you cannot verify it with the official code/release for now.
However, you can do it in your own branch by modifying some flags when calling the Kopia uploader.

> I am still on the lookout for solutions that lower the memory consumption, since I am working on a system with limited resources

What is your expectation of memory usage with your current data?
