Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve and refactor backup and restore #4917

Merged
merged 75 commits into from
Sep 28, 2023
Merged

Conversation

ldming
Copy link
Collaborator

@ldming ldming commented Aug 30, 2023

This PR has redesign the backup/restore and its related CRDs, the relevant controllers have also been refactored. So I'd like to apologize to the reviewers that this is too big of a PR. Below I will explain specifically what has been done and what still needs to be done in the future(Not in this PR).

Refactor backup/restore CRD

  • refactor backup CRD, using backupMethod to replace backupType, add other fileds
  • split backupSchedule out of backupPolicy, using backupPolicy to configure backup policy that can be used for both manual and automatic backups, and using backupSchedule to configure automatic scheduling strategies
  • for BackupPolicy, remove datafile and snapshot fileds, use more flexible backupMethod array to specify the backup methods, the backup type is only one attribute of backupMethod
  • using actionSet to replace backupTool, with actionSet defining the specific actions required for backup and restore, delete fields like logicalCommand and physicalCommand.
  • refactor backupPolicyTemplate
  • add restore CRD by @wangyelei that is used to restore a backup
  • modify all backupPolicyTempldate and backupTool to use new CRD

Refactor controller

  • split some code from controller to internal/dataprotection
  • backup controller build some actions to execute and check the actions status
  • split backupPolicy controller to backupPolicy controller and backupSchedule controller
  • refactor backupPolicyTemplate transformer build backupPolicy and backupSchedule
  • support restore controller that prepare the backed up data to a PV, move the restore logic from cluster controller to this controller, the cluster controller only need to build a restore CR and check its status. by @wangyelei

kbcli

  • kbcli use new CRD

needs to be done in the future PR

  • add more test cases to improve the coverage
  • add Error Code for returned error
  • add Conditions in CR status
  • use the builder to build the kubernetes object
  • kbcli supports to show all backup methods
  • support to backup and restore kubernetes resources
  • more kbcli work
  • and so on

In summary, this PR refactored backup and restore, including refactoring the CRD and controller code, but there is still much work remaining going forward.

local env test

TODO:

  • refactor CRDs, use backupMethods instead of datafile/logfile/snapshot
  • backup controller
  • backup schedule controller
  • backuppolicytemplate
  • kbcli use new CRD
  • test cases
  • cluster controller hscale based on backup
  • cluster controller restore from backup
  • modify all existed data protection CR

@github-actions github-actions bot added size/L Denotes a PR that changes 100-499 lines. size/XL Denotes a PR that changes 500-999 lines. size/XXL Denotes a PR that changes 1000+ lines. and removed size/L Denotes a PR that changes 100-499 lines. size/XL Denotes a PR that changes 500-999 lines. labels Aug 30, 2023
@sonarcloud
Copy link

sonarcloud bot commented Sep 13, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 17 Code Smells

No Coverage information No Coverage information
0.1% 0.1% Duplication

apis/apps/v1alpha1/clusterdefinition_webhook.go Outdated Show resolved Hide resolved
apis/dataprotection/v1alpha1/actionset_types.go Outdated Show resolved Hide resolved
apis/dataprotection/v1alpha1/actionset_types.go Outdated Show resolved Hide resolved
// Backup total size.
// A string with capacity units in the form of "1Gi", "1Mi", "1Ki".
// totalSize is the total size of backed up data size.
// A string with capacity units in the format of "1Gi", "1Mi", "1Ki".
// +optional
TotalSize string `json:"totalSize,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add another field to record the exact size in bytes, for example totalSizeBytes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use one filed to record the data size, maybe totalSizeBytes is a good choice. And other client can convert it to more human-readable format.

I plan to improve it in subsequent PRs.

apis/dataprotection/v1alpha1/backuppolicy_types.go Outdated Show resolved Hide resolved
controllers/dataprotection/cronjob_controller.go Outdated Show resolved Hide resolved
deploy/apecloud-mysql-cluster/Chart.yaml Outdated Show resolved Hide resolved
deploy/apecloud-mysql-cluster/extension.yaml Outdated Show resolved Hide resolved
deploy/apecloud-mysql/config/mysql8-config.tpl Outdated Show resolved Hide resolved
internal/cli/cmd/cluster/create.go Outdated Show resolved Hide resolved
@ldming
Copy link
Collaborator Author

ldming commented Sep 28, 2023

/approve to trigger test

@ldming ldming merged commit f7c18e0 into main Sep 28, 2023
94 of 102 checks passed
@ldming ldming deleted the feature/refactor-backup branch September 28, 2023 03:49
Copy link
Contributor

@zjx20 zjx20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many changes and I haven't done my review of the controller logic yet.

SourceCluster string `json:"sourceCluster,omitempty"`
RestoreTime *time.Time `json:"restoreTime,omitempty"`
RestoreTimeStr string `json:"restoreTimeStr,omitempty"`
RestoreManagementPolicy string `json:"volumeRestorePolicy,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RestoreManagementPolicy is hard to understand, maybe ParallelRestoreVolumes bool.

test/integration/controller_suite_test.go Show resolved Hide resolved
internal/testutil/apps/common_util.go Show resolved Hide resolved
internal/testutil/apps/constant.go Show resolved Hide resolved
internal/testutil/apps/constant.go Show resolved Hide resolved
internal/dataprotection/types/types.go Show resolved Hide resolved
internal/dataprotection/restore/builder.go Show resolved Hide resolved
@ldming
Copy link
Collaborator Author

ldming commented Sep 28, 2023

There are too many changes and I haven't done my review of the controller logic yet.

Ok, I will fix review comment in another PR.

Copy link
Contributor

@zjx20 zjx20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished my review.

type ErrorMode string

const (
ErrorModeContinue ErrorMode = "Continue"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused.

return intctrlutil.CheckedRequeueWithError(err, reqCtx.Log, "")
}
if len(targetPods) > 1 {
return nil, fmt.Errorf("do not support more than one target pods")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems possible to hit this error when PodSelectionStrategyAll is specified in the backup policy.

internal/dataprotection/backup/scheduler.go Show resolved Hide resolved
internal/dataprotection/restore/utils.go Show resolved Hide resolved
// if reaches full backup, sort the BackupActionSets and return
sortBackupSets := func(backupSets []BackupActionSet, reverse bool) []BackupActionSet {
sort.Slice(backupSets, func(i, j int) bool {
if reverse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse is always false.

var (
restoreActionCount int
finishedActionCount int
exitFailedAction bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existFailedAction? According to the Recalculation().

@@ -372,11 +372,9 @@ type BackupSpec struct {
// +optional
BackupPolicyName string `json:"backupPolicyName"`

// Backup Type. datafile or logfile or snapshot. If not set, datafile is the default type.
// +kubebuilder:default=datafile
// +kubeBuilder:validation:Enum={datafile,logfile,snapshot}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this an enum string type?

metadata:
name: xtrabackup-for-apecloud-mysql
labels:
clusterdefinition.kubeblocks.io/name: apecloud-mysql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using label isn't ideal for hard constraint/bounding with clusterdefinition, what is the reason for this design?

// +kubebuilder:validation:Enum={Full,Incremental,Differential,Continuous}
// +kubebuilder:default=Full
// +kubebuilder:validation:Required
BackupType BackupType `json:"backupType"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you have refactored to a generic name (ActionSet), and I would suggest have this attribute as "type" to accommodate future extensible design? Or otherwise this API would be named as "BackupRestoreActionSet".

ldming added a commit that referenced this pull request Nov 21, 2023
Signed-off-by: L.Dongming <l.dongming@apecloud.com>
Co-authored-by: wangyelei <yelei.wang@apecloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Features] refactor and improve backup
6 participants