-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design for node-agent concurrency #6950
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Add the design for node-agent concurrency |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
# Node-agent Concurrency Design | ||
|
||
## Glossary & Abbreviation | ||
|
||
**Velero Generic Data Path (VGDP)**: VGDP is the collective of modules that is introduced in [Unified Repository design][1]. Velero uses these modules to finish data transfer for various purposes (i.e., PodVolume backup/restore, Volume Snapshot Data Movement). VGDP modules include uploaders and the backup repository. | ||
|
||
## Background | ||
|
||
Velero node-agent is a daemonset hosting controllers and VGDP modules to complete the concrete work of backups/restores, i.e., PodVolume backup/restore, Volume Snapshot Data Movement backup/restore. | ||
For example, node-agent runs DataUpload controllers to watch DataUpload CRs for Volume Snapshot Data Movement backups, so there is one controller instance in each node. One controller instance takes a DataUpload CR and then launches a VGDP instance, which initializes a uploader instance and the backup repository connection, to finish the data transfer. The VGDP instance runs inside the node-agent pod or in a pod associated to the node-agent pod in the same node. | ||
|
||
Varying from the data size, data complexity, resource availability, VGDP may take a long time and remarkable resources (CPU, memory, network bandwidth, etc.). | ||
Technically, VGDP instances are able to run in concurrent regardless of the requesters. For example, a VGDP instance for a PodVolume backup could run in parallel with another VGDP instance for a DataUpload. Then the two VGDP instances share the same resources if they are running in the same node. | ||
|
||
Therefore, in order to gain the optimized performance with the limited resources, it is worthy to configure the concurrent number of VGDP per node. When the resources are sufficient in nodes, users can set a large concurrent number, so as to reduce the backup/restore time; otherwise, the concurrency should be reduced, otherwise, the backup/restore may encounter problems, i.e., time lagging, hang or OOM kill. | ||
|
||
## Goals | ||
|
||
- Define the behaviors of concurrent VGDP instances in node-agent | ||
- Create a mechanism for users to specify the concurrent number of VGDP per node | ||
|
||
## Non-Goals | ||
- VGDP instances from different nodes always run in concurrent since in most common cases the resources are isolated. For special cases that some resources are shared across nodes, there is no support at present | ||
- In practice, restores run in prioritized scenarios, e.g., disaster recovery. However, the current design doesn't consider this difference, a VGDP instance for a restore is blocked if it reaches to the limit of the concurrency, even though the ones block it are for backups. If users do meet some problems here, they should consider to stop the backups first | ||
- Sometimes, users wants to totally block backups/restores from running in a specific node, this is out of the scope the current design. To archive this, more modules need to be considered (i.e., expoers of data movers), simply blocking the VGDP (e.g., by setting its concurrent number to 0) doesn't work. E.g., for a fs backup, VGDP instance must run in the node the source pod is running in, if we simply block from VGDP instance, the PodVolumeBackup CR is still submitted but never processed. | ||
|
||
## Solution | ||
|
||
We introduce a configMap named ```node-agent-configs``` for users to specify the node-agent related configurations. This configMap is not created by Velero, users should create it manually on demand. The configMap should be in the same namespace where Velero is installed. If multiple Velero instances are installed in different namespaces, there should be one configMap in each namespace which applies to node-agent in that namespace only. | ||
Node-agent server checks these configurations at startup time and use it to initiate the related VGDP modules. Therefore, users could edit this configMap any time, but in order to make the changes effective, node-agent server needs to be restarted. | ||
The ```node-agent-configs``` configMap may be used for other purpose of configuring node-agent in future, at present, there is only one kind of configuration as the data in the configMap, the name is ```data-path-concurrency```. | ||
|
||
The data structure for ```data-path-concurrency``` is as below: | ||
```go | ||
type DataPathConcurrency struct { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I worry that, this in conjunction with the changes that you are describing for the data upload change, may be confusing. I think that, at least now, i understand the DataPath name and what it means, but I don't think it is clear to the end users what this is managing. Can we call this external concurrency or node load concurrency or something, it feels like DataPath is very much an internal concept that does convey specific meaning. Thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, DataPath is more like a design/architecture level concept. Personally, I think "loadConcurrency" (we don't need to mention "node" because it is under the node-agent-configs) is close to what we are doing here and is not ambiguous. Waiting for others' opinions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For this naming question, we decide not to block the merge of this design. We can keep the discussion on this, and when we want to change the name, if the design has been merged, we come back to make slight change to this design. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Lyndon-Li "loadConcurrency" works for me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. loadConcurrency works for me as well. Docs on this guy will be necessary, but I now better understand what this is doing. Thank you for taking the time, and I agree if inspiration strikes and folks come up with a better name, awesome! |
||
// GlobalConfig specifies the concurrency number to all nodes for which per-node config is not specified | ||
GlobalConfig int `json:"globalConfig,omitempty"` | ||
|
||
// PerNodeConfig specifies the concurrency number to nodes matched by rules | ||
PerNodeConfig []RuledConfigs `json:"perNodeConfig,omitempty"` | ||
} | ||
|
||
type RuledConfigs struct { | ||
// NodeSelector specifies the label selector to match nodes | ||
NodeSelector metav1.LabelSelector `json:"nodeSelector"` | ||
|
||
// Number specifies the number value associated to the matched nodes | ||
Number int `json:"number"` | ||
} | ||
``` | ||
|
||
### Global concurrent number | ||
We allow users to specify a concurrent number that will be applied to all nodes if the per-node number is not specified. This number is set through ```globalConfig``` field in ```data-path-concurrency```. | ||
The number starts from 1 which means there is no concurrency, only one instance of VGDP is allowed. There is no roof limit. | ||
If this number is not specified or not valid, a hard-coded default value will be used, the value is set to 1. | ||
|
||
### Per-node concurrent number | ||
We allow users to specify different concurrent number per node, for example, users can set 3 concurrent instances in Node-1, 2 instances in Node-2 and 1 instance in Node-3. This is for below considerations: | ||
- The resources may be different among nodes. Then users could specify smaller concurrent number for nodes with less resources while larger number for the ones with more resources | ||
- Help users to isolate critical environments. Users may run some critical workloads in some specified nodes, since VGDP instances may take large resource consumption, users may want to run less number of instances in the nodes with critical workloads | ||
|
||
The range of Per-node concurrent number is the same with Global concurrent number. | ||
Per-node concurrent number is preferable to Global concurrent number, so it will overwrite the Global concurrent number for that node. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The term "Global concurrent number" is a little confusing to me. IMO this concurrent number is to control the concurrency within one node-agent server, and if this is true, the number is per-node. And the "Global current number" is just a shortcut to help user to set the currency for every node? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the "Global concurrent number" applies to every node by default and if a per-node number is also set for a node, the per-node number overwrites the number in that node. |
||
|
||
Per-node concurrent number is implemented through ```perNodeConfig``` field in ```data-path-concurrency```. | ||
|
||
```perNodeConfig``` is a list of ```RuledConfigs``` each item of which matches one or more nodes by label selectors and specify the concurrent number for the matched nodes. This means, the nodes are identified by labels. | ||
|
||
For example, the ```perNodeConfig`` could have below elements: | ||
``` | ||
"nodeSelector: kubernetes.io/hostname=node1; number: 3" | ||
"nodeSelector: beta.kubernetes.io/instance-type=Standard_B4ms; number: 5" | ||
``` | ||
The first element means the node with host name ```node1``` gets the Per-node concurrent number of 3. | ||
The second element means all the nodes with label ```beta.kubernetes.io/instance-type``` of value ```Standard_B4ms``` get the Per-node concurrent number of 5. | ||
At least one node is expected to have a label with the specified ```RuledConfigs``` element (rule). If no node is with this label, the Per-node rule makes no effect. | ||
If one node falls into more than one rules, e.g., if node1 also has the label ```beta.kubernetes.io/instance-type=Standard_B4ms```, the smallest number (3) will be used. | ||
|
||
### Sample | ||
A sample of the ```node-agent-configs``` configMap is as below: | ||
```json | ||
{ | ||
"globalConfig": 2, | ||
"perNodeConfig": [ | ||
{ | ||
"nodeSelector": { | ||
"matchLabels": { | ||
"kubernetes.io/hostname": "node1" | ||
} | ||
}, | ||
"number": 3 | ||
}, | ||
{ | ||
"nodeSelector": { | ||
"matchLabels": { | ||
"beta.kubernetes.io/instance-type": "Standard_B4ms" | ||
} | ||
}, | ||
"number": 5 | ||
} | ||
] | ||
} | ||
``` | ||
To create the configMap, users need to save something like the above sample to a json file and then run below command: | ||
``` | ||
kubectl create cm node-agent-configs -n velero --from-file=data-path-concurrency=<json file name> | ||
``` | ||
|
||
### Global data path manager | ||
As for the code implementation, data path manager is to maintain the total number of the running VGDP instances and ensure the limit is not excceeded. At present, there is one data path manager instance per controller, as a result, the concurrent numbers are calculated separately for each controller. This doesn't help to limit the concurrency among different requesters. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you help me understand. This is saying that the node agent controller will be responsible for the number of concurrent DataUploads/DataDownloads it can handle and ensure this is true? Can you help me understand why this is not the thread of the workers for the controller and needs something in a global manager? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, this is the current/existing behavior. We need to change the control to the node-agent server wide, as you mentioned in a global manager. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand. Can you please help me? Is this because we have two things (DataUploads and DataDownloads) with workers, and you want to split them? If this is the case, then isn't there a concern that a node agent will take ownership, because its worker is free just spinning because there are no available "dataManager" threads? Then other node agents that are in the system, that may have free "dataManager" threads, will be sitting idle? Did I miss us handling this in the design? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I might have mislead you. The thing is, in the existing code, there are one dataPathMgr in one controller, so in order to count the overall VGDP instances no matter they are from a DataUpload, DataDownload, PodVolumeBackup or PodVolumeRestore, we need to make a singleton dataPathMgr. See the existing code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I do think that, because a node takes ownwership of something and then can determine if it can do work on it, we may want to consider how to make sure this is something that can more adequately be spread out. |
||
Therefore, we need to create one global data path manager instance server-wide, and pass it to different controllers. The instance will be created at node-agent server startup. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. by server-wide, do you mean node-agent server or the whole velero setup? I assume it's node-agent server, if that's the node-agent server, I don't think it's global There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is node-agent server. Maybe we can give it a better name, what about "Ordinary Number"? |
||
The concurrent number is required to initiate a data path manager, the number comes from either Per-node concurrent number or Global concurrent number. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we control the limit of the running VGDP instances? Is this already implemented by Velero or Kopia? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it has been implemented, check the dataPath manager code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to also update the PVB, PVR, DataUpload, and DataDownload status to reflect the reason why they are still not running? fsBackup, err := r.dataPathMgr.CreateFileSystemBR(pvb.Name, pVBRRequestor, ctx, r.Client, pvb.Namespace, callbacks, log)
if err != nil {
if err == datapath.ConcurrentLimitExceed {
return ctrl.Result{Requeue: true, RequeueAfter: time.Minute}, nil
} else {
return r.errorOut(ctx, &pvb, err, "error to create data path", log)
}
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, will do this in the code change PR. |
||
Below are some prototypes related to data path manager: | ||
|
||
```go | ||
func NewManager(cocurrentNum int) *Manager | ||
func (m *Manager) CreateFileSystemBR(jobName string, requestorType string, ctx context.Context, client client.Client, namespace string, callbacks Callbacks, log logrus.FieldLogger) (AsyncBR, error) | ||
``` | ||
|
||
|
||
|
||
|
||
|
||
[1]: unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we support configuring the concurrency in the backup/restore level in the future, will this feature be impacted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not impacted. The two controls could work together, backups/restores are not probably all in the same node; while in the same node, the VGDP concurrency mechanism has higher priority over backup/restore level concurrency, as the former is for resources consideration, if there are not enough resources, the backups/restores level configurations are not be fulfilled.