From 2c143a3e9cf85d4e9afa0f29777c43d4a2738b8f Mon Sep 17 00:00:00 2001 From: Stephen Cahill Date: Thu, 22 Aug 2024 18:16:43 -0400 Subject: [PATCH] update CI team docs + release tracking issue --- .github/ISSUE_TEMPLATE/release_tracking.md | 70 +++++++++---------- .../role-handbooks/ci-signal/README.md | 41 ++++------- .../role-handbooks/release-lead/README.md | 5 +- 3 files changed, 53 insertions(+), 63 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/release_tracking.md b/.github/ISSUE_TEMPLATE/release_tracking.md index 87e0f2f608dd..f2275195a934 100644 --- a/.github/ISSUE_TEMPLATE/release_tracking.md +++ b/.github/ISSUE_TEMPLATE/release_tracking.md @@ -8,7 +8,7 @@ assignees: '' --- -Please see the corresponding section in [release-tasks.md](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md) for documentation of individual tasks. +Please see the corresponding sections of the [role-handbooks](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks) for documentation of individual tasks. ## Tasks @@ -17,34 +17,34 @@ Please see the corresponding section in [release-tasks.md](https://github.com/ku * The following is based on the v1.6 release cycle. Modify according to the tracked release cycle. 
Week 1: -* [ ] [Release Lead] [Finalize release schedule and team](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#finalize-release-schedule-and-team) -* [ ] [Release Lead] [Add/remove release team members](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#addremove-release-team-members) -* [ ] [Release Lead] [Prepare main branch for development of the new release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#prepare-main-branch-for-development-of-the-new-release) -* [ ] [Communications Manager] [Add docs to collect release notes for users and migration notes for provider implementers](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#add-docs-to-collect-release-notes-for-users-and-migration-notes-for-provider-implementers) -* [ ] [Communications Manager] [Update supported versions](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#update-supported-versions) +* [ ] [Release Lead] [Finalize release schedule and team](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#finalize-release-schedule-and-team) +* [ ] [Release Lead] [Add/remove release team members](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#addremove-release-team-members) +* [ ] [Release Lead] [Prepare main branch for development of the new release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#prepare-main-branch-for-development-of-the-new-release) +* [ ] [Communications Manager] [Add docs to collect release notes for users and migration notes for provider implementers](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#add-docs-to-collect-release-notes-for-users-and-migration-notes-for-provider-implementers) +* [ ] 
[Communications Manager] [Update supported versions](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#update-supported-versions) Week 1 to 4: -* [ ] [Release Lead] [Track] [Remove previously deprecated code](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#track-remove-previously-deprecated-code) +* [ ] [Release Lead] [Track] [Remove previously deprecated code](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#track-remove-previously-deprecated-code) Week 6: -* [ ] [Release Lead] [Cut the v1.5.1 & v1.4.6 releases](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.5.1 & v1.4.6 releases](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) Week 9: -* [ ] [Release Lead] [Cut the v1.5.2 & v1.4.7 releases](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.5.2 & v1.4.7 releases](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) Week 11 to 12: -* [ ] [Release Lead] [Track] [Bump dependencies](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#track-bump-dependencies) +* [ ] [Release Lead] [Track] [Bump dependencies](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#track-bump-dependencies) Week 13: -* [ ] [Release Lead] [Cut the v1.6.0-beta.0 release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) -* [ ] [Release Lead] [Cut the v1.5.3 & v1.4.8 
releases](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) -* [ ] [Release Lead] [Create a new GitHub milestone for the next release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#create-a-new-github-milestone-for-the-next-release) -* [ ] [Communications Manager] [Communicate beta to providers](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#communicate-beta-to-providers) +* [ ] [Release Lead] [Cut the v1.6.0-beta.0 release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.5.3 & v1.4.8 releases](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) +* [ ] [Release Lead] [Create a new GitHub milestone for the next release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#create-a-new-github-milestone-for-the-next-release) +* [ ] [Communications Manager] [Communicate beta to providers](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#communicate-beta-to-providers) Week 14: -* [ ] [Release Lead] [Cut the v1.6.0-beta.1 release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) -* [ ] [Release Lead] [Set a tentative release date for the next minor release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#set-a-tentative-release-date-for-the-next-minor-release) -* [ ] [Release Lead] [Assemble next release team](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#assemble-next-release-team) +* [ ] [Release Lead] [Cut the v1.6.0-beta.1 
release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) +* [ ] [Release Lead] [Set a tentative release date for the next minor release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#set-a-tentative-release-date-for-the-next-minor-release) +* [ ] [Release Lead] [Assemble next release team](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#assemble-next-release-team) * [ ] [Release Lead] Select release lead for the next release cycle Week 15: @@ -52,36 +52,34 @@ Week 15: * KubeCon idle week Week 16: -* [ ] [Release Lead] [Cut the v1.6.0-rc.0 release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) -* [ ] [Release Lead] [Update milestone applier and GitHub Actions](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#update-milestone-applier-and-github-actions) -* [ ] [CI Manager] [Setup jobs and dashboards for the release-1.6 release branch](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#setup-jobs-and-dashboards-for-a-new-release-branch) -* [ ] [Communications Manager] [Ensure the book for the new release is available](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#ensure-the-book-for-the-new-release-is-available) +* [ ] [Release Lead] [Cut the v1.6.0-rc.0 release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) +* [ ] [Release Lead] [Update milestone applier and GitHub Actions](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#update-milestone-applier-and-github-actions) +* [ ] [CI Manager] [Setup jobs and dashboards for the release-1.6 release 
branch](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/ci-signal#setup-jobs-and-dashboards-for-a-new-release-branch) +* [ ] [Communications Manager] [Ensure the book for the new release is available](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#ensure-the-book-for-the-new-release-is-available) Week 17: -* [ ] [Release Lead] [Cut the v1.6.0-rc.1 release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.6.0-rc.1 release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) Week 18: -* [ ] [Release Lead] [Cut the v1.6.0 release](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) -* [ ] [Release Lead] [Cut the v1.5.4 & v1.4.9 releases](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.6.0 release](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) +* [ ] [Release Lead] [Cut the v1.5.4 & v1.4.9 releases](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#repeatedly-cut-a-release) * [ ] [Release Lead] Organize release retrospective -* [ ] [Communications Manager] [Change production branch in Netlify to the new release branch](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#change-production-branch-in-netlify-to-the-new-release-branch) -* [ ] [Communications Manager] [Update clusterctl links in the quickstart](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#update-clusterctl-links-in-the-quickstart) +* [ ] [Communications Manager] [Change production branch in 
Netlify to the new release branch](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#change-production-branch-in-netlify-to-the-new-release-branch) +* [ ] [Communications Manager] [Update clusterctl links in the quickstart](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#update-clusterctl-links-in-the-quickstart) Continuously: -* [Release lead] [Maintain the GitHub release milestone](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-maintain-the-github-release-milestone) -* [Release lead] [Bump the Go version](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-bump-the-go-version) -* [Communications Manager] [Communicate key dates to the community](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-communicate-key-dates-to-the-community) +* [Release lead] [Maintain the GitHub release milestone](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#continuously-maintain-the-github-release-milestone) +* [Release lead] [Bump the Go version](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#continuously-bump-the-go-version) +* [Communications Manager] [Communicate key dates to the community](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/communications#continuously-communicate-key-dates-to-the-community) * [Communications Manager] Improve release process documentation * [Communications Manager] Maintain and improve user facing documentation about releases, release policy and release calendar -* [CI Manager] [Monitor CI signal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-monitor-ci-signal) -* [CI Manager] [Reduce the amount of flaky 
tests](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-reduce-the-amount-of-flaky-tests) -* [CI Manager] [Bug triage](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#continuously-bug-triage) -* [CI Manager] Maintain and improve release automation, tooling & related developer docs +* [CI Manager] [Monitor CI signal](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/ci-signal#continuously-monitor-ci-signal) +* [CI Manager] [Reduce the amount of flaky tests](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/ci-signal#continuously-reduce-the-amount-of-flaky-tests) If and when necessary: -* [ ] [Release Lead] [Track] [Bump the Cluster API apiVersion](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#optional-track-bump-the-cluster-api-apiversion) -* [ ] [Release Lead] [Track] [Bump the Kubernetes version](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#optional-track-bump-the-kubernetes-version) -* [ ] [Release Lead] [Track Release and Improvement tasks](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/release-tasks.md#optional-track-release-and-improvement-tasks) +* [ ] [Release Lead] [Track] [Bump the Cluster API apiVersion](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#optional-track-bump-the-cluster-api-apiversion) +* [ ] [Release Lead] [Track] [Bump the Kubernetes version](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#optional-track-bump-the-kubernetes-version) +* [ ] [Release Lead] [Track Release and Improvement tasks](https://github.com/kubernetes-sigs/cluster-api/tree/main/docs/release/role-handbooks/release-lead#optional-track-release-and-improvement-tasks) /priority critical-urgent /kind feature diff 
--git a/docs/release/role-handbooks/ci-signal/README.md b/docs/release/role-handbooks/ci-signal/README.md index 585a55d68635..700d3e6e5103 100644 --- a/docs/release/role-handbooks/ci-signal/README.md +++ b/docs/release/role-handbooks/ci-signal/README.md @@ -1,4 +1,4 @@ -# CI Signal/Bug Triage/Automation Manager +# CI Signal ## Overview @@ -12,7 +12,6 @@ - [Setup jobs and dashboards for a new release branch](#setup-jobs-and-dashboards-for-a-new-release-branch) - [[Continuously] Monitor CI signal](#continuously-monitor-ci-signal) - [[Continuously] Reduce the amount of flaky tests](#continuously-reduce-the-amount-of-flaky-tests) - - [[Continuously] Bug triage](#continuously-bug-triage) @@ -22,38 +21,34 @@ * Responsibility for the quality of the release * Continuously monitor CI signal, so a release can be cut at any time * Add CI signal for new release branches -* Bug Triage: - * Make sure blocking issues and bugs are triaged and dealt with in a timely fashion -* Automation: - * Maintain and improve release automation, tooling & related developer docs ## Tasks ### Setup jobs and dashboards for a new release branch The goal of this task is to have test coverage for the new release branch and results in testgrid. -While we add test coverage for the new release branch we will also drop the tests for old release branches if necessary. + +This task is performed after the new release branch is cut [by the release workflow](https://github.com/kubernetes-sigs/cluster-api/blob/defa62d5340f4b49f1acab80cc8cc10727b85291/.github/workflows/release.yaml#L61-L63) during the final weeks of the release cycle. + +While we add test coverage for the new release branch we will also drop the tests for old release branches if necessary. The examples that follow assume the new release branch is `release-1.8`. 1. Create new jobs based on the jobs running against our `main` branch: - 1. 
Copy the `main` branch entry as `release-1.6` in the `cluster-api-prowjob-gen.yaml` file in [test-infra](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/cluster-api/). - 2. Modify the following at the `release-1.6` branch entry: - * Change intervals (let's use the same as for `release-1.5`). + 1. Copy the `main` branch entry as `release-1.8` in the `cluster-api-prowjob-gen.yaml` file in [test-infra](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/cluster-api/). + 2. Modify the following at the `release-1.8` branch entry: + * Change intervals (let's use the same as for `release-1.7`). 2. Create a new dashboard for the new branch in: `test-infra/config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml` (`dashboard_groups` and `dashboards`). -3. Remove old release branches and unused versions from the `cluster-api-prowjob-gen.yaml` file in [test-infra](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/cluster-api/) according to our policy documented in [Support and guarantees](../../../../CONTRIBUTING.md#support-and-guarantees). For example, let's assume we just added `release-1.6`, then we can now drop test coverage for the `release-1.3` branch. +3. Remove old release branches and unused versions from the `cluster-api-prowjob-gen.yaml` file in [test-infra](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/cluster-api/) according to our policy documented in [Support and guarantees](../../../../CONTRIBUTING.md#support-and-guarantees). Because we just added `release-1.8`, we can now drop test coverage for the `release-1.5` branch. 4. Regenerate the prowjob configuration running `make generate-test-infra-prowjobs` command from cluster-api repository. 
Before running this command, ensure to export the `TEST_INFRA_DIR` variable, specifying the location of the [test-infra](https://github.com/kubernetes/test-infra/) repository in your environment. For further information, refer to this [link](https://github.com/kubernetes-sigs/cluster-api/pull/9937). ```sh TEST_INFRA_DIR=../../k8s.io/test-infra make generate-test-infra-prowjobs ``` -5. Verify the jobs and dashboards a day later by taking a look at: `https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.6` -6. Update `.github/workflows/weekly-security-scan.yaml` - to setup Trivy and govulncheck scanning - `.github/workflows/weekly-md-link-check.yaml` - to setup link checking in the CAPI book - and `.github/workflows/weekly-test-release.yaml` - to verify the release target is working - for the currently supported branches. -7. Update the [PR markdown link checker](https://github.com/kubernetes-sigs/cluster-api/blob/main/.github/workflows/pr-md-link-check.yaml) accordingly (e.g. `main` -> `release-1.6`). -
Prior art: [Update branch for link checker](https://github.com/kubernetes-sigs/cluster-api/pull/9206) - +5. Verify the jobs and dashboards a day later by taking a look at: `https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.8` +6. Update the [PR markdown link checker](https://github.com/kubernetes-sigs/cluster-api/blob/main/.github/workflows/pr-md-link-check.yaml) accordingly (e.g. `main` -> `release-1.8`). Prior art: -* [Add jobs for CAPI release 1.6](https://github.com/kubernetes/test-infra/pull/31208) +* [Add jobs for CAPI release 1.8](https://github.com/kubernetes/test-infra/pull/33156) ### [Continuously] Monitor CI signal @@ -76,6 +71,8 @@ The goal of this task is to keep our tests running in CI stable. Eventually open issues as described above. 7. Run periodic deep-dive sessions with the CI team to investigate failing and flaking tests. Example session recording: https://www.youtube.com/watch?v=YApWftmiDTg + **Note**: Maintaining the health of the project is a community effort. The CI team should use all of the tools available to them to keep the CI signal clean. In addition, the [#cluster-api](https://kubernetes.slack.com/archives/C8TSNPY4T) Slack channel should be used to increase visibility of release-blocking interruptions to the CI signal and to seek help from the community. This should be *additive* to the steps described above. When in doubt, err on the side of overcommunication to promote awareness and drive disruptions to resolution. + ### [Continuously] Reduce the amount of flaky tests The Cluster API tests are pretty stable, but there are still some flaky tests from time to time. @@ -85,13 +82,5 @@ To reduce the amount of flakes please periodically: 1. Take a look at recent CI failures via `k8s-triage`: * [main: e2e, e2e-mink8s, test, test-mink8s](https://storage.googleapis.com/k8s-triage/index.html?job=.*cluster-api.*(test%7Ce2e)-(mink8s-)*main&xjob=.*-provider-.*) 2. 
Open issues using an appropriate template (flaking-test) for occurring flakes and ideally fix them or find someone who can. - **Note**: Given resource limitations in the Prow cluster it might not be possible to fix all flakes. - Let's just try to pragmatically keep the amount of flakes pretty low. - -### [Continuously] Bug triage - -The goal of bug triage is to triage incoming issues and if necessary flag them with `release-blocking` -and add them to the milestone of the current release. - -We probably have to figure out some details about the overlap between the bug triage task here, release leads -and Cluster API maintainers. \ No newline at end of file + **Note**: Given resource limitations in the Prow cluster it might not be possible to fix all flakes. Let's just try to pragmatically keep the amount of flakes pretty low. diff --git a/docs/release/role-handbooks/release-lead/README.md b/docs/release/role-handbooks/release-lead/README.md index 40bb865b877e..e445309f6d46 100644 --- a/docs/release/role-handbooks/release-lead/README.md +++ b/docs/release/role-handbooks/release-lead/README.md @@ -133,6 +133,8 @@ We should take a look at the following dependencies: There is currently no formalized process to assemble the release team. As of now we ask for volunteers in Slack and office hours. +Assigning more members to the CI team than to the other teams is preferred, as maintaining a clean CI signal is crucial to the health of the project. + ### Update milestone applier and GitHub Actions Once release branch is created by GitHub Automation, the goal of this task would be to ensure we have the milestone @@ -264,4 +266,5 @@ Additional information: * At the beginning of the cycle, Release Team Lead should prepare the improvement tasks board for the ongoing release cycle. The following steps can be taken: - Edit improvement tasks board name for current cycle (e.g. 
`CAPI vX.Y release improvement tasks`) - - Add/move all individual missing issues to the board \ No newline at end of file + - Add/move all individual missing issues to the board + * Tasks that improve release automation, tooling & related developer docs are ideal candidates and should be prioritized. \ No newline at end of file