Adding new TSG for troubleshooting AKS node auto-repair errors #1629

Open
wants to merge 14 commits into
base: main

julia-yin
Contributor

PR Description

Adding a new TSG for troubleshooting common AKS node auto-repair errors. This TSG will live under the AKS "Node/node pool availability and performance" category.

Pull request guidance

Thank you for submitting your contribution to our support content! Our team works closely with subject matter experts in CSS and PMs in the product group to review all content requests to ensure technical accuracy and the best customer experience. This process can sometimes take one or more days, so we greatly appreciate your patience.

We also need your help in order to process your request as soon as possible:

  • We won't act on your pull request (PR) until you type "#sign-off" in a new comment in the PR to indicate that your changes are complete.

  • After you sign off in your PR, the article will be tech reviewed by the PM or SME if it has more than minor changes. Once the article is approved, it will undergo a final editing pass before being merged.

Creating a new TSG to help customers diagnose issues with node auto-repair. Node auto-repair is a feature that automatically reboots, reimages, and redeploys nodes when they have been NotReady for more than 5 minutes. Some of these repair actions may result in errors, and customers need guidance on determining next steps and fixing any issues.
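
To make the trigger condition concrete, here is a minimal client-side sketch of the "NotReady for more than 5 minutes" check. It is not how the AKS auto-repair service is implemented; it only reads the standard Kubernetes node `Ready` condition from data shaped like `kubectl get nodes -o json`. The function names, the sample node names, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold quoted in the PR description above.
NOT_READY_THRESHOLD = timedelta(minutes=5)


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2024-10-02T17:00:00Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def nodes_not_ready_too_long(node_list: dict, now: datetime) -> list[str]:
    """Return names of nodes whose Ready condition has not been True for longer than the threshold."""
    flagged = []
    for node in node_list.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") != "Ready":
                continue
            since = parse_k8s_time(cond["lastTransitionTime"])
            # A status of "False" or "Unknown" both mean the node is not Ready.
            if cond.get("status") != "True" and now - since > NOT_READY_THRESHOLD:
                flagged.append(node["metadata"]["name"])
    return flagged


# Hypothetical sample data; real input would come from `kubectl get nodes -o json`.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [
                {"type": "Ready", "status": "False", "lastTransitionTime": "2024-10-02T17:00:00Z"},
            ]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [
                {"type": "Ready", "status": "True", "lastTransitionTime": "2024-10-02T16:00:00Z"},
            ]},
        },
        {
            "metadata": {"name": "node-c"},
            "status": {"conditions": [
                {"type": "Ready", "status": "Unknown", "lastTransitionTime": "2024-10-02T17:07:00Z"},
            ]},
        },
    ]
}

now = datetime(2024, 10, 2, 17, 10, 0, tzinfo=timezone.utc)
print(nodes_not_ready_too_long(sample, now))
```

In this sample only `node-a` is flagged: `node-b` is Ready, and `node-c` has been in an Unknown state for only three minutes.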

@julia-yin : Thanks for your contribution! The author(s) have been notified to review your proposed change.

Contributor

Learn Build status updates of commit a6a6b91:

⚠️ Validation status: warnings

File Status Preview URL Details
support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md ⚠️Warning Details
support/azure/azure-kubernetes/welcome-azure-kubernetes.yml ⚠️Warning Details
support/azure/azure-kubernetes/toc.yml ✅Succeeded

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 12, Column 181: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 39, Column 134: [Warning: file-not-found - See documentation] Invalid file link: 'LINK'.
  • Line 40, Column 276: [Warning: file-not-found - See documentation] Invalid file link: 'LINK'.
  • Line 12, Column 181: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' will be broken in isolated environments. Replace with a relative link.
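
The hard-coded-locale warning above can be addressed by dropping the locale segment from the link. A minimal sketch of that rewrite follows; the `strip_locale` helper is hypothetical, and the pattern assumes simple two-part locale codes such as `en-us` placed directly after the `learn.microsoft.com` host.

```python
import re

# Assumption: the locale is a two-letter language plus two-letter region
# (for example en-us) and appears as the first path segment.
_LOCALE_RE = re.compile(
    r"^(https://learn\.microsoft\.com)/[a-z]{2}-[a-z]{2}(/.*)$",
    re.IGNORECASE,
)


def strip_locale(url: str) -> str:
    """Return the URL without its locale segment, or unchanged if none is found."""
    match = _LOCALE_RE.match(url)
    return f"{match.group(1)}{match.group(2)}" if match else url


print(strip_locale("https://learn.microsoft.com/en-us/azure/aks/node-auto-repair"))
```

Links that carry no locale segment pass through unchanged, so the helper can be applied to every link in a file without checking first.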

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

For more details, please refer to the build report.

Note: Your PR may contain errors or warnings or suggestions unrelated to the files you changed. This happens when external dependencies like GitHub alias, Microsoft alias, cross repo links are updated. Please use these instructions to resolve them.

For any questions, please:

| ARM ErrorCode: VMExtensionProvisioningError | | |
| ARM ErrorCode: InvalidParameter | | |
| scaleSetNameAndInstanceIDFromProviderID failed | | |
| ManagedIdentityCredential authentication failed | | |
Member

Looks good, as long as we have some common errors and next steps. This is good to go.


| Error code | Potential causes | Next steps |
|---|---|---|
| ClientSecretCredential authentication failed | | |

@shanalily shanalily Oct 2, 2024

As mentioned on Teams, I think we can remove this one and add some other top errors, since this is an issue with misclassifying some MSI clusters, and a fix for it is rolling out.

| ClientSecretCredential authentication failed | | |
| ARM ErrorCode: VMExtensionProvisioningError | | |
| ARM ErrorCode: InvalidParameter | | |
| scaleSetNameAndInstanceIDFromProviderID failed | | |

This seems to be caused by uninitialized nodes.

|---|---|---|
| ClientSecretCredential authentication failed | | |
| ARM ErrorCode: VMExtensionProvisioningError | | |
| ARM ErrorCode: InvalidParameter | | |

I think this is mostly an issue with spot VM node objects still existing for a little while after the VMs are preempted.

| Error code | Potential causes | Next steps |
|---|---|---|
| ClientSecretCredential authentication failed | | |
| ARM ErrorCode: VMExtensionProvisioningError | | |

@shanalily shanalily Oct 3, 2024

Contributor

Learn Build status updates of commit 2dbb198:

⚠️ Validation status: warnings

File Status Preview URL Details
support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md ⚠️Warning Details
support/azure/azure-kubernetes/welcome-azure-kubernetes.yml ⚠️Warning Details
support/azure/azure-kubernetes/toc.yml ✅Succeeded

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 12, Column 181: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 21, Column 257: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 28, Column 174: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 29, Column 165: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 32, Column 190: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 33, Column 408: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 34, Column 307: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 97: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 334: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 39, Column 276: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 12, Column 181: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' will be broken in isolated environments. Replace with a relative link.
  • Line 21, Column 257: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 174: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' will be broken in isolated environments. Replace with a relative link.
  • Line 29, Column 165: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' will be broken in isolated environments. Replace with a relative link.
  • Line 32, Column 190: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' will be broken in isolated environments. Replace with a relative link.
  • Line 33, Column 408: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 34, Column 307: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 97: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 334: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' will be broken in isolated environments. Replace with a relative link.
  • Line 39, Column 276: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

Contributor

Learn Build status updates of commit 884f448:

⚠️ Validation status: warnings

File Status Preview URL Details
support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md ⚠️Warning Details
support/azure/azure-kubernetes/welcome-azure-kubernetes.yml ⚠️Warning Details
support/azure/azure-kubernetes/toc.yml ✅Succeeded

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 12, Column 181: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 21, Column 257: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 28, Column 174: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 28, Column 453: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 29, Column 165: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 32, Column 190: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 33, Column 408: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 34, Column 307: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 97: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 334: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 39, Column 276: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 12, Column 181: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' will be broken in isolated environments. Replace with a relative link.
  • Line 21, Column 257: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 174: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 453: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors' will be broken in isolated environments. Replace with a relative link.
  • Line 29, Column 165: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' will be broken in isolated environments. Replace with a relative link.
  • Line 32, Column 190: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' will be broken in isolated environments. Replace with a relative link.
  • Line 33, Column 408: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 34, Column 307: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 97: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 334: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' will be broken in isolated environments. Replace with a relative link.
  • Line 39, Column 276: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

Contributor

Learn Build status updates of commit 90cce7b:

⚠️ Validation status: warnings

File Status Preview URL Details
support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md ⚠️Warning Details
support/azure/azure-kubernetes/welcome-azure-kubernetes.yml ⚠️Warning Details
support/azure/azure-kubernetes/toc.yml ✅Succeeded

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 12, Column 298: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 21, Column 239: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 28, Column 174: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 28, Column 453: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 29, Column 165: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 32, Column 190: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 33, Column 408: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 34, Column 307: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 97: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 38, Column 334: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 39, Column 276: [Warning: hard-coded-locale - See documentation] Link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' contains locale code 'en-us'. For localizability, remove 'en-us' from links to most Microsoft sites.
  • Line 12, Column 298: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/node-auto-repair' will be broken in isolated environments. Replace with a relative link.
  • Line 21, Column 239: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 174: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 453: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors' will be broken in isolated environments. Replace with a relative link.
  • Line 29, Column 165: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' will be broken in isolated environments. Replace with a relative link.
  • Line 32, Column 190: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' will be broken in isolated environments. Replace with a relative link.
  • Line 33, Column 408: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 34, Column 307: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 97: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 334: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/azure/aks/events?tabs=azure-cli#automating-event-notifications' will be broken in isolated environments. Replace with a relative link.
  • Line 39, Column 276: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

| Error code | Causes & Solution |
|---|---|
| ARM ErrorCode: VMExtensionProvisioningError | One or more VM extensions failed to be provisioned on the VM. Read more on possible error types and troubleshooting steps at [Troubleshoot the ERR_VHD_FILE_NOT_FOUND error code (124)](https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound). This may also occur as a result of custom script extension (CSE) errors on the node. Learn more at [Troubleshoot node not ready failures caused by CSE errors](https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors). |
| ARM ErrorCode: InvalidParameter | This error occurs when a parameter causes errors while new nodes are being created for an AKS cluster. Read more on causes and solutions at [Troubleshoot the InvalidParameter error code](https://learn.microsoft.com/en-us/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause). |

@shanalily shanalily Oct 10, 2024

This might be confusing since customers did not make the request.

Generally, the AKS auto-repair service requested a VM that no longer exists.

Contributor Author

Sounds good. Is there anything else we can say here about the customer impact? Is this typically due to quota or capacity issues? And will the remediator request a different VM, or will further redeployments just fail?

Contributor

Learn Build status updates of commit 1d1fc78:

⚠️ Validation status: warnings

File Status Preview URL Details
support/azure/azure-kubernetes/welcome-azure-kubernetes.yml ⚠️Warning Details
support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md 💡Suggestion Details
support/azure/azure-kubernetes/toc.yml ✅Succeeded

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 12, Column 298: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/azure/aks/node-auto-repair' will be broken in isolated environments. Replace with a relative link.
  • Line 21, Column 239: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 174: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-vhdfilenotfound' will be broken in isolated environments. Replace with a relative link.
  • Line 28, Column 447: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-custom-script-extension-errors' will be broken in isolated environments. Replace with a relative link.
  • Line 29, Column 165: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/create-upgrade-delete/error-code-invalidparameter#cause' will be broken in isolated environments. Replace with a relative link.
  • Line 32, Column 190: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state#scenario-3-node-pool-is-in-a-failed-state' will be broken in isolated environments. Replace with a relative link.
  • Line 33, Column 408: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
  • Line 34, Column 307: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled#solution' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 97: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-container-insights' will be broken in isolated environments. Replace with a relative link.
  • Line 38, Column 328: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/azure/aks/events?tabs=azure-cli#automating-event-notifications' will be broken in isolated environments. Replace with a relative link.
  • Line 39, Column 276: [Suggestion: docs-link-absolute - See documentation] Absolute link 'https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/availability-performance/node-not-ready-basic-troubleshooting' will be broken in isolated environments. Replace with a relative link.
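
The `docs-link-absolute` suggestions above all ask for the same mechanical change, which can be sketched in Python. This is a hypothetical helper, not part of the Learn build tooling. It assumes that published URLs under `/troubleshoot/` correspond to files under `support/` in this repository (inferred from the file paths and relative links shown in the build reports on this page) and that the target articles are `.md` files. Links outside `/troubleshoot/`, such as the `azure/aks` and `azure/azure-monitor` ones, live in a different docset and are deliberately rejected.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: /troubleshoot/... on learn.microsoft.com maps to support/... in the repo.
URL_PREFIX = "/troubleshoot/"
REPO_PREFIX = "support/"


def to_relative_link(absolute_url: str, source_file: str) -> str:
    """Return a link to absolute_url's article, relative to source_file."""
    parts = urlsplit(absolute_url)
    if not parts.path.startswith(URL_PREFIX):
        raise ValueError(f"not a troubleshoot article URL: {absolute_url}")

    # Assumption: the target article is a Markdown file named after the URL slug.
    target = REPO_PREFIX + parts.path[len(URL_PREFIX):] + ".md"
    rel = posixpath.relpath(target, posixpath.dirname(source_file))

    # Same-folder targets come back without a leading "./"; add it for clarity.
    if not rel.startswith("."):
        rel = "./" + rel

    # Preserve any bookmark such as "#cause".
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel


if __name__ == "__main__":
    article = (
        "support/azure/azure-kubernetes/availability-performance/"
        "node-auto-repair-errors.md"
    )
    print(to_relative_link(
        "https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/"
        "create-upgrade-delete/error-code-invalidparameter#cause",
        article,
    ))
```

Keeping the `.md` extension in the generated link is a choice made for this sketch; check the result against the build report before committing it.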

For more details, please refer to the build report.

Note: Your PR may contain errors or warnings or suggestions unrelated to the files you changed. This happens when external dependencies like GitHub alias, Microsoft alias, cross repo links are updated. Please use these instructions to resolve them.


Learn Build status updates of commit 68d79e2:

⚠️ Validation status: warnings

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md | ⚠️ Warning | | Details |
| support/azure/azure-kubernetes/welcome-azure-kubernetes.yml | ⚠️ Warning | | Details |
| support/azure/azure-kubernetes/toc.yml | ✅ Succeeded | | |

support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md

  • Line 29, Column 165: [Warning: file-not-found - See documentation] Invalid file link: '../create-upgrade-delete/error-code-invalidparameter'.
  • Line 32, Column 190: [Warning: file-not-found - See documentation] Invalid file link: './cluster-node-virtual-machine-failed-state'.
  • Line 34, Column 307: [Warning: file-not-found - See documentation] Invalid file link: '../../virtual-machine-scale-sets/deploy/vmss-outbound-connectivity-not-enabled'.

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

For more details, please refer to the build report.


Learn Build status updates of commit cfc0274:

⚠️ Validation status: warnings

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| support/azure/azure-kubernetes/welcome-azure-kubernetes.yml | ⚠️ Warning | | Details |
| support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md | ✅ Succeeded | | |
| support/azure/azure-kubernetes/toc.yml | ✅ Succeeded | | |

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

For more details, please refer to the build report.


Learn Build status updates of commit 3369d8d:

⚠️ Validation status: warnings

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| support/azure/azure-kubernetes/welcome-azure-kubernetes.yml | ⚠️ Warning | | Details |
| support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md | ✅ Succeeded | | |
| support/azure/azure-kubernetes/toc.yml | ✅ Succeeded | | |

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

For more details, please refer to the build report.


Learn Build status updates of commit 87efc83:

⚠️ Validation status: warnings

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| support/azure/azure-kubernetes/welcome-azure-kubernetes.yml | ⚠️ Warning | | Details |
| support/azure/azure-kubernetes/availability-performance/node-auto-repair-errors.md | ✅ Succeeded | | |
| support/azure/azure-kubernetes/toc.yml | ✅ Succeeded | | |

support/azure/azure-kubernetes/welcome-azure-kubernetes.yml

  • Line 128, Column 18: [Warning: file-not-found - See documentation] Invalid file link: './node-auto-repair-errors.md'.

For more details, please refer to the build report.
