
Document migration steps to CCM #42

Open
andrewsykim opened this issue Jan 22, 2020 · 12 comments
Assignees: andrewsykim, jiahuif
Labels: lifecycle/frozen (should not be auto-closed due to staleness) · P1 (Priority 1)
Milestone: v1.18

Comments

@andrewsykim (Member)

We should document how a user would manually migrate their clusters from using in-tree cloud providers to out-of-tree cloud provider. The documented steps can be manual or via a tool like kubeadm.

onitake commented Jan 22, 2020

To get started, a rough outline:

  1. Ensure a CCM for your cloud environment is available, with roughly the same feature set as the integrated KCM provider. Determine compatibility issues (missing features, different implementations, etc.).
  2. Prepare your cloud environment and workloads for the migration: disable unsupported features and build a list of manual actions to be done after migration (deleting unused cloud resources, renaming, etc.).
  3. Prepare the CCM for deployment: write configuration files, deploy credentials, etc. Do not deploy the CCM yet.
  4. Disable the integrated provider in KCM and the kubelet: remove the in-tree flags and replace them with --cloud-provider=external, then restart these services.
  5. Deploy the CCM.
  6. Ensure it synchronises with the running environment correctly, detects existing resources, and deploys new resources where appropriate. Apply manual fixes where necessary.
  7. Test-deploy new cloud resources such as LoadBalancers and Nodes.

There will be certain differences between cloud providers, as compatibility between the integrated and external implementations cannot always be guaranteed.
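As a sketch of step 4, the flag change on a kubeadm-style control plane might look like the following (the manifest path is the kubeadm default, and the in-tree flags shown are AWS-specific assumptions; the kubelet receives the same `--cloud-provider=external` change):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    # Before the migration, the in-tree provider was selected explicitly:
    #   --cloud-provider=aws
    #   --cloud-config=/etc/kubernetes/cloud.conf
    # After, cloud-specific control loops are delegated to the external CCM:
    - --cloud-provider=external
```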

@andrewsykim andrewsykim added this to the v1.18 milestone Jan 22, 2020
@andrewsykim andrewsykim added the P1 Priority 1 label Jan 22, 2020
@andrewsykim andrewsykim self-assigned this Jan 22, 2020
@andrewsykim (Member, Author)

I think this warrants a page in the official Kubernetes docs. @onitake, are you willing to put something together?

onitake commented Jan 22, 2020

Yes, I think I can do that.
But I will need more input, and possibly some insight on the situation with different providers.

  • Which features are likely to have compatibility issues? Load balancers, node labeling, launch parameters, credential injection come to mind. Others?
  • How to run the different cloud providers? Should there be an example deployment for each?
  • Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?
  • Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?
  • Should monitoring topics be addressed?
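For reference, the PVLabeler interface mentioned above is small. As defined in the k8s.io/cloud-provider package (signature reproduced from memory; verify against the source for your Kubernetes version), a provider implementing it supplies topology labels for PersistentVolumes, which the CCM's PV labeling controller can then apply:

```go
// From k8s.io/cloud-provider; check the vendored copy for your release.
// PVLabeler is an abstract, pluggable interface for fetching labels for volumes.
type PVLabeler interface {
	GetLabelsForVolume(ctx context.Context, pv *v1.PersistentVolume) (map[string]string, error)
}
```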

onitake commented Jan 22, 2020

And also, where should the documentation live?

  • on /docs/concepts/cluster-administration/cloud-providers.md ?
  • in a new page under /docs/concepts/cluster-administration/cloud-providers/ ?
  • in a new page under /docs/tasks/administer-cluster/ ?
  • in a new tutorial page under /docs/tutorials/clusters/ ?

andrewsykim (Member, Author) commented Jan 23, 2020

> How to run the different cloud providers? Should there be an example deployment for each?

I think we should stick to documenting one. AWS is probably the best example because of the number of users who manage it themselves. The steps should be mostly the same across all providers as well.

> Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?

I think we can assume control plane nodes are separate nodes.

> Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?

I think for the first pass we should ignore storage provisioners and add CSI migration documentation iteratively.

> Should monitoring topics be addressed?

No, I think just showing how to validate your CCM is working is fine.
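A minimal validation sketch might look like the following (the namespace and label selector are assumptions that vary with each provider's deployment manifests):

```shell
# Check that the CCM pod is running (label and namespace vary by provider):
kubectl -n kube-system get pods -l k8s-app=cloud-controller-manager

# With --cloud-provider=external, new nodes register with the taint
# node.cloudprovider.kubernetes.io/uninitialized; the CCM removes it after
# initializing the node, so no node should keep it for long:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

# Exercise the service controller by creating a LoadBalancer Service and
# waiting for an external address to be provisioned:
kubectl create deployment ccm-smoke --image=nginx
kubectl expose deployment ccm-smoke --port=80 --type=LoadBalancer
kubectl get svc ccm-smoke -w
```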

> where should the documentation live?

I think something like docs/tasks/administer-cluster/migrating-to-cloud-controller-manager is good.

onitake commented Jan 24, 2020

I opened a PR; please submit corrections and input on how to migrate on AWS. I'm slightly biased towards private-cloud CCM migrations, so public cloud users: please share the specifics of your cloud environment.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2020
@cheftako (Member)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2020
@cheftako (Member)

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 15, 2020
@cheftako (Member)

/cc @jiahuif

@cheftako (Member)

/assign @jiahuif

@cheftako (Member)

/cc @jpbetz

Projects: none yet
Development: no branches or pull requests
6 participants