
Restore fails due to insufficient ephemeral node storage #7499

Closed
Kamforka opened this issue Mar 6, 2024 · 14 comments

Comments

@Kamforka

Kamforka commented Mar 6, 2024

What steps did you take and what happened:

velero restore create --from-backup my-backup --restore-volumes=true --wait

then

velero restore describe my-backup-20240305145411 --details

Started: 2024-03-05 14:54:11 +0100 CET
Completed: 2024-03-05 16:31:27 +0100 CET
[...]
Operation for persistentvolumeclaims my-namespace/elasticsearch-data-pvc:
Restore Item Action Plugin: velero.io/csi-pvc-restorer
Operation ID: dd-43fdc92e-1ff2-4fba-9944-c322a5799647.cdcd4260-b15a-414fec1a0
Phase: Failed
Operation Error: DataDownload is canceled
Progress: 39348675984 of 45149812450 complete (Bytes)
Progress description: Canceled
Created: 2024-03-05 14:54:12 +0100 CET
Started: 2024-03-05 14:54:12 +0100 CET
Updated: 2024-03-05 16:31:19 +0100 CET

What did you expect to happen:

I expected to be able to back up a namespace (which was successful), delete the namespace, and then restore it.

When I run the restore, it eventually finishes as completed, but the restored namespace has a PV that will never be created because its data download was cancelled for an unknown reason (as you can see from the detailed describe above).

This breaks the recreation/binding of the restored PVC and thus the whole namespace.

The following information will help us better understand what's going on:

Unfortunately I cannot provide the logs due to confidentiality reasons.

Anything else you would like to add:

Environment:

  • Velero version (use velero version): v1.13.0
  • Velero features (use velero client config get features): features:
  • Kubernetes version (use kubectl version): client/v1.29.1; server/v1.24.14+vmware.1
  • Kubernetes installer & version: n/a
  • Cloud provider or hardware configuration: n/a
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Lyndon-Li
Contributor

The DataDownload had not actually completed at the time of cancellation:
Progress: 39348675984 of 45149812450 complete (Bytes)

Most probably, the node-agent was OOM killed, so you can try increasing the memory limit of the node-agent and retry.
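
A quick sketch of how to check for the OOM kill and bump the limit (the velero namespace, the name=node-agent label, and the 2Gi value are assumptions about a default install; adjust them to your setup):

# did the previous node-agent container terminate with OOMKilled?
kubectl -n velero get pods -l name=node-agent
kubectl -n velero get pod <node-agent-pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# raise the memory limit on the node-agent daemonset and let the pods restart
kubectl -n velero set resources daemonset/node-agent --limits=memory=2Gi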

If the DataDownload is not completed, the restore should not be marked as Completed. Could you double-check and confirm this?

@Kamforka
Author

Kamforka commented Mar 7, 2024

Hi @Lyndon-Li! Technically the status of the restore is PartiallyFailed; however, when I do a simple describe, I see the below in the warnings/errors section:

Warnings:
  Velero:     <none>
  Cluster:  could not restore, CustomResourceDefinition "sealedsecrets.bitnami.com" already exists. Warning: the in-cluster version is different than the backed-up version
  Namespaces:
    my-namespace:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces: <none>

Nothing about the PV failure, which is quite misleading.

My bad, there's actually a line at the end of the describe output like:

Restore Item Operations:  2 of 3 completed successfully, 1 failed (specify --details for more information)

However, I still think this should also be mentioned in the Errors: section, shouldn't it?

But you are probably right about the node-agent; how can I debug this further to make sure that's the root cause? Mind you, this restore worked the very first time, but never since: it keeps failing on the same PV while every other PV restores fine.

@Kamforka
Author

Kamforka commented Mar 7, 2024

Okay, I've checked the node agents, and indeed I could observe OOMKilled statuses. I've increased the memory limit to a much higher value and am rerunning the restore; I'll let you know how it works out.

@Kamforka
Author

Kamforka commented Mar 7, 2024

After increasing the resource limits, the node-agent was not OOM killed, but the restore failed again.

I checked the logs and I could only see these lines:

time="2024-03-07T14:10:13Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:10:13Z" level=warning msg="failed to add finalizer for velero/my-backup-20240307151012-cl925 and will requeue later" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:134"
time="2024-03-07T14:10:13Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:10:13Z" level=info msg="Data download starting" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:151"
time="2024-03-07T14:10:13Z" level=info msg="Accepting data download my-backup-20240307151012-cl925" controller=DataDownload logSource="pkg/controller/data_download_controller.go:659"
time="2024-03-07T14:10:13Z" level=info msg="This datadownload has been accepted by others" DataDownload=my-backup-20240307151012-cl925 controller=DataDownload logSource="pkg/controller/data_download_controller.go:689"
time="2024-03-07T14:10:13Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:10:30Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:10:31Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:10:31Z" level=info msg="Data download is prepared" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:227"
time="2024-03-07T14:20:33Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:20:33Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:20:45Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:20:45Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:20:59Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:20:59Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:21:09Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:21:09Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:21:38Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:21:38Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:21:50Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:21:50Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:22:00Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:22:00Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:22:34Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:22:34Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:22:52Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:22:52Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:23:06Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:23:06Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:23:39Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:23:39Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:24:33Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:24:33Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:24:50Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:24:50Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:25:03Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:25:03Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:25:48Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:25:48Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:26:51Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:26:51Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:27:10Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:27:10Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:36:02Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"
time="2024-03-07T14:36:02Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:286"
time="2024-03-07T14:36:02Z" level=info msg="Data download is being canceled" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:288"
time="2024-03-07T14:36:02Z" level=warning msg="Async fs backup data path canceled" controller=DataDownload datadownload=my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:399"
time="2024-03-07T14:36:02Z" level=warning msg="failed to delete pvc velero/my-backup-20240307151012-cl925 with err persistentvolumeclaims \"my-backup-20240307151012-cl925\" not found" logSource="pkg/util/kube/pvc_pv.go:73"
time="2024-03-07T14:36:02Z" level=info msg="Reconcile my-backup-20240307151012-cl925" controller=datadownload datadownload=velero/my-backup-20240307151012-cl925 logSource="pkg/controller/data_download_controller.go:102"

I cannot figure out why it is being cancelled. I'll increase resources for one last attempt, but according to the metrics the usage is within limits, so can you please suggest what could be going wrong here?

@Kamforka
Author

Kamforka commented Mar 7, 2024

Digging deeper into the related DataDownload resource:

$ k describe datadownloads.velero.io my-backup-20240307151012-cl925
Name:         my-backup-20240307151012-cl925
Namespace:    velero
Labels:       velero.io/accepted-by=d6c37003-a9f8-45db-ba7e-e965bd8136d8
              velero.io/async-operation-id=dd-36e7ca5e-3ff9-48f8-ac4c-42483370ec30.cdcd4260-b15a-4148daa0e
              velero.io/restore-name=my-backup-20240307151012
              velero.io/restore-uid=36e7ca5e-3ff9-48f8-ac4c-42483370ec30
Annotations:  <none>
API Version:  velero.io/v2alpha1
Kind:         DataDownload
Metadata:
  Creation Timestamp:  2024-03-07T14:10:13Z
  Generate Name:       my-backup-20240307151012-
  Generation:          22
  Owner References:
    API Version:     velero.io/v1
    Controller:      true
    Kind:            Restore
    Name:            my-backup-20240307151012
    UID:             36e7ca5e-3ff9-48f8-ac4c-42483370ec30
  Resource Version:  37706165
  UID:               dacd13c6-e0f2-4c53-bdb8-addbc1b28592
Spec:
  Backup Storage Location:  default
  Cancel:                   true
  Operation Timeout:        10m0s
  Snapshot ID:              6f9d4868fec1571c92b2f9c54a1e4562
  Source Namespace:         my-namespace
  Target Volume:
    Namespace:  my-namespace
    Pv:         
    Pvc:        elasticsearch-data-pvc
Status:
  Completion Timestamp:  2024-03-07T14:36:02Z
  Message:               found a datadownload with status "InProgress" during the node-agent starting, mark it as cancel
  Node:                  2cefba45-3cae-46c4-bbde-a737212e14d3
  Phase:                 Canceled
  Progress:
    Bytes Done:     42874466070
    Total Bytes:    45149812450
  Start Timestamp:  2024-03-07T14:10:13Z
Events:             <none>

The line Message: found a datadownload with status "InProgress" during the node-agent starting, mark it as cancel is a bit concerning.

Thoughts?

@sseago
Collaborator

sseago commented Mar 7, 2024

"found a datadownload with status "InProgress" during the node-agent starting, mark it as cancel"
This means the node agent pod got restarted before the datadownload was complete. Since in-progress uploads/downloads can't be resumed when the pod crashes, they're marked as canceled. This cancel should result in the Restore being PartiallyFailed, though, so if it's truly being marked as Completed, we may have a bug there.
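
To verify that, you can check whether the node-agent pods restarted or were evicted around the cancellation time, roughly like this (a sketch; the label selector assumes a default install):

kubectl -n velero get pods -l name=node-agent -o wide
kubectl -n velero get events --sort-by=.lastTimestamp | grep -iE 'node-agent|evict|oom'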

@Kamforka
Author

Kamforka commented Mar 7, 2024

Okay, I checked the node-agents again, and it seems like they get evicted one by one and then recreated.

So I checked the status of the evicted ones:

Status:           Failed
Reason:           Evicted
Message:          Pod The node had condition: [DiskPressure].

This made me think about the ephemeral storage size of my nodes, which is normally 50G, while the PV I'm trying to restore is close to 50G.

I'm going to double the ephemeral storage and retry the restore.

The weird thing is that the very first time the restore worked without an issue, yet it has kept failing consistently ever since for the very same PV, so could there be some cleanup issue associated with failed restores?

@Kamforka
Author

Kamforka commented Mar 8, 2024

Alright, so the root cause was that the node running the node-agent was running out of temporary storage, which is weird, as I wouldn't expect 50 GB to be insufficient for the restore operation.

Is it required to have at least as much temporary node storage as the size of the backup to be restored?

It's just a guess, but I think the restore process is leaking storage somehow. Can you please check?

@Kamforka changed the title from "Restore marked as complete however a PV restore was cancelled" to "Restore fails due to insufficient ephemeral node storage" on Mar 8, 2024
@reasonerjt
Contributor

reasonerjt commented Mar 11, 2024

Is it required to have at least as much temporary node storage as the size of the backup to be restored?

The answer is no.

It's just a guess but I think the restore process is leaking storage somehow. Can you please check?

It is known that kopia will persist some data locally as cache, so the disk usage can vary based on the number of namespaces the node agent handles.
@Lyndon-Li please follow up.

@Lyndon-Li
Contributor

Below is some information about the disk cache:

  1. During restore, which reads data from the repo, the kopia repo client allocates a cache in the node storage. The cache has a fixed upper bound -- 2G for data and 2G for metadata -- so you can assume it takes at most 4G of disk space per repo.
  2. The cached data is not released even after the restore completes, since it is still useful for the next restore.
  3. Velero creates one repo per namespace, so the maximum disk space required in the node storage is (number of namespaces) * 4G (see the example after this list).
  4. Velero at present doesn't have a mechanism to release the cache data after the backup repo/BSL is deleted.
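
As a sizing illustration (the namespace count is hypothetical): backing up 10 namespaces against a single BSL can require up to 10 * 4G = 40G of node storage for cache alone on a node whose node-agent serves all of those repos.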

@Kamforka As described above, the disk space required has nothing to do with the size of the volume being restored or the total size of the node storage. It only depends on the free space in the node storage: if the free space is not enough, the node storage runs out and pods on that node get evicted.

To troubleshoot further, please check which of the 4 points your case falls into, specifically:

  • How much free space is there in your node storage? (see the sketch after this list)
  • How many BSLs are you using?
  • How many namespaces are you backing up?
  • Have you ever deleted any BSL? Because of point 4, if you have, some stale cache data may be left behind. You can inspect the used space in your node storage, find any stale data, and delete it.
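
For the first point, a sketch of how to check the node side (the node name is a placeholder; the stats endpoint is the same one that also reports per-pod usage):

kubectl describe node <node-name> | grep -iE 'DiskPressure|ephemeral-storage'

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | jq '.node.fs'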

@Kamforka
Author

Hi everyone,

Thanks for the information. Sorry for not following up immediately, but I was doing a couple of test restores today, and I still think something is not quite right about the ephemeral storage consumption of the node agents.

I was running a single restore on a single namespace.

I was monitoring the node-agent which was restoring my main PV.

This is the historical ephemeral storage usage of the pod node-agent-pvx47:

Every 1.0s: kubectl get --raw "/api/v1/nodes/6bb767d7-05b0-4467-b2c7-dd61e2d6af47/proxy/stats/summary" | jq '.pods[] | select(.podRef.name == "node-agent-pvx47")."ephemeral-storage"'
 
{
  "time": "2024-03-11T10:33:44Z",
  "availableBytes": 95836561408,
  "capacityBytes": 105490616320,
  "usedBytes": 19115122688,
  "inodesFree": 6522280,
  "inodes": 6553600,
  "inodesUsed": 13894
}
 
Every 1.0s: kubectl get --raw "/api/v1/nodes/6bb767d7-05b0-4467-b2c7-dd61e2d6af47/proxy/stats/summary" | jq '.pods[] | select(.podRef.name == "node-agent-pvx47")."ephemeral-storage"'
{
  "time": "2024-03-11T10:37:33Z",
  "availableBytes": 95836184576,
  "capacityBytes": 105490616320,
  "usedBytes": 28821803008,
  "inodesFree": 6522280,
  "inodes": 6553600,
  "inodesUsed": 17565
}
 
 
Every 1.0s: kubectl get --raw "/api/v1/nodes/6bb767d7-05b0-4467-b2c7-dd61e2d6af47/proxy/stats/summary" | jq '.pods[] | select(.podRef.name == "node-agent-pvx47")."ephemeral-storage"'
 
{
  "time": "2024-03-11T10:38:29Z",
  "availableBytes": 95836176384,
  "capacityBytes": 105490616320,
  "usedBytes": 38970085376,
  "inodesFree": 6522278,
  "inodes": 6553600,
  "inodesUsed": 20513
}
 
Every 1.0s: kubectl get --raw "/api/v1/nodes/6bb767d7-05b0-4467-b2c7-dd61e2d6af47/proxy/stats/summary" | jq '.pods[] | select(.podRef.name == "node-agent-pvx47")."ephemeral-storage"'
 
{
  "time": "2024-03-11T10:39:54Z",
  "availableBytes": 95836143616,
  "capacityBytes": 105490616320,
  "usedBytes": 50268405760,
  "inodesFree": 6522277,
  "inodes": 6553600,
  "inodesUsed": 18073
}

As you can see, the consumption is skyrocketing minute by minute: 19G -> 28.8G -> 38.9G -> 50G.
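
For completeness, the node-level filesystem can be watched through the same stats endpoint (same node as above):

watch -n 10 'kubectl get --raw "/api/v1/nodes/6bb767d7-05b0-4467-b2c7-dd61e2d6af47/proxy/stats/summary" | jq ".node.fs"'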

This time it did not trigger a warning, but previously I could observe the events below on the corresponding nodes:

Events:
  Type     Reason                 Age                 From     Message
  ----     ------                 ----                ----     -------
  Warning  EvictionThresholdMet   42m                 kubelet  Attempting to reclaim ephemeral-storage
  Normal   NodeHasDiskPressure    42m                 kubelet  Node 6bb767d7-05b0-4467-b2c7-dd61e2d6af47 status is now: NodeHasDiskPressure
  Normal   NodeHasNoDiskPressure  37m (x3 over 3d2h)  kubelet  Node 6bb767d7-05b0-4467-b2c7-dd61e2d6af47 status is now: NodeHasNoDiskPressure

This doesn't seem right to me; any thoughts?

@Lyndon-Li
Contributor

There is an issue in Velero where the repo's data cache limit is not set correctly. As a result, the repo's cache control behavior is that cached data is only swept once it is older than 10 minutes, so within that 10-minute window the cache data may accumulate to a high value without any limit.

We will fix this problem.

@Lyndon-Li Lyndon-Li added this to the v1.14 milestone Mar 14, 2024
@Lyndon-Li
Contributor

Lyndon-Li commented Mar 25, 2024

Here is the conclusion after further discussion:

  • The disk cache heavily affects restore performance, so in most cases users don't want an overly tight cache policy, i.e., a very small cache or a very short cleanup interval.
  • If we set a hard limit that cleans the cache immediately once it reaches a specific size, we need to expose it to users, so that the majority of users who don't have a disk constraint can still get good restore performance.
  • On the other hand, in order to expose this parameter, we would need to change the spec of the BackupRepository CR.
  • Therefore, we will document this behavior -- cache cleanup is triggered when the cache reaches 2GB, but data won't be deleted until it is older than 10 minutes (see the rough numbers after this list). We will not expose the hard limit for now, until we collect enough requirements for it.
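
As a rough illustration of why this matters in your case (the throughput figure is only read off the measurements you posted above): if a restore writes cache data at roughly 10G per minute and nothing younger than 10 minutes is swept, the cache can transiently grow to about 10G/min * 10min = 100G before any cleanup happens, which easily exceeds a 50G ephemeral volume.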

Let us know your feedback. cc @Kamforka

@Lyndon-Li
Contributor

I opened another issue, #7620, to track the requirement to make the Kopia repo cache policies configurable.
I will close the current issue for the 1.14 milestone, since in 1.14 we only want to document the cache behavior, and we've done that in #7604.
