Emergency Runbook
Tabatha D Zeitke edited this page Dec 21, 2022 · 11 revisions
This site is hosted on Gatsby Cloud and is maintained and supported by New Relic's Docs Team.
- Troubleshooting dashboard
- #help-documentation (for engineering and content requests)
- #doc_eng_bots (alert and deployment updates)
- Alert policy
- Architecture notes
Scenario | Severity | Resolution |
---|---|---|
Site is not loading | ❗ High | Roll back a release |
All localized pages are throwing 500s | ❗ High | Roll back a release |
Functionality is broken | | Roll back a release |
Alert has been triggered | | Respond to an incident |
Copy needs to be adjusted | 👀 Unknown | Ping @hero in #help-documentation, or leave a comment in the feedback form on the relevant doc page to generate a Jira ticket |
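A quick probe can help tell the first two scenarios in the table apart. The following is a minimal sketch, not part of the team's tooling: the base URL, the locale paths, and the function names `fetch_status` and `triage` are all assumptions made for illustration, so substitute real values before relying on it.

```python
from typing import Callable, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Assumed values for illustration only; replace with the real site and locales.
BASE_URL = "https://docs.example.com"
LOCALE_PATHS = ["/jp/", "/kr/"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if the request never completed."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (URLError, TimeoutError):
        return 0  # DNS failure, refused connection, or timeout


def triage(
    base_url: str,
    locale_paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> str:
    """Map probe results onto the high-severity scenarios in the table."""
    home = fetch(base_url + "/")
    if home == 0 or home >= 500:
        return "Site is not loading: roll back a release"
    statuses = [fetch(base_url + path) for path in locale_paths]
    if statuses and all(status >= 500 for status in statuses):
        return "All localized pages are throwing 500s: roll back a release"
    return "No high-severity scenario detected by this probe"


# Example call (performs real HTTP requests):
#   print(triage(BASE_URL, LOCALE_PATHS))
```

The `fetch` parameter is injectable so the triage logic can be exercised without network access. A clean result from this probe does not rule out the "Functionality is broken" scenario, which still needs a manual check.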
If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release. The first is through Gatsby Cloud:
- Log in to Gatsby Cloud with GitHub two-factor authentication.
- Select the `docs-website - main` site.
- Scroll down to Build history to see all the previous builds that have been published.
- Find the appropriate build to roll back to. Click `Publish` to deploy that build of the site.
If you do not have access to Gatsby Cloud, you can perform a rollback using GitHub:
- Find the pull request (into `main`) that you would like to roll back.
- Click `Revert` to create a new pull request that undoes this work.
- Have someone review the rollback and approve the pull request.
- Once the necessary checks have passed, merge into `main`.
- A build will be triggered in Gatsby Cloud. Once complete, the rollback will be released.
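For engineers who prefer a terminal, the same revert can be approximated with `git` and the GitHub CLI (`gh`). This is a sketch under assumptions, not the team's documented procedure: the branch name, the PR title and body, and the helper names `revert_commands` and `run_all` are hypothetical, and it assumes `gh` is installed and authenticated in a clone of the repository. The review, checks, and merge steps listed above still apply.

```python
import subprocess
from typing import List


def revert_commands(
    commit_sha: str,
    pr_number: int,
    is_merge_commit: bool = True,
    base: str = "main",
) -> List[List[str]]:
    """Build the git/gh commands that open a revert PR for one merged PR."""
    branch = f"revert-pr-{pr_number}"  # hypothetical branch naming scheme
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # A merge commit has two parents; -m 1 keeps the base-branch side.
        revert += ["-m", "1"]
    revert.append(commit_sha)
    return [
        ["git", "fetch", "origin", base],
        ["git", "switch", "-c", branch, f"origin/{base}"],
        revert,
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--base", base,
         "--title", f"Revert #{pr_number}",
         "--body", f"Rolls back #{pr_number}."],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure
```

Calling `run_all(revert_commands("<sha>", <number>))` only prints the commands, which makes it easy to review them during an incident before running anything. Pass `is_merge_commit=False` if the PR was squash-merged, since a squash commit has a single parent and `-m` would fail.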
The following steps are for on-call engineers working at New Relic:
- Don't panic, you've got this!
- Check to see if there is already an ongoing incident in #emergency-room (or in 2, 3, and 4).
- If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
- Refer to the troubleshooting dashboard to get an idea of what could be going on.
- Look at the recent deployments to production to identify a PR that can be reverted.
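Because merging into `main` triggers a Gatsby Cloud build, recently merged pull requests are a reasonable proxy for recent deployments. The sketch below orders them newest first so the likeliest revert candidates appear at the top. It assumes the input comes from the GitHub CLI command shown in the comment; the helper names `newest_merged_first` and `format_candidates` are hypothetical.

```python
import json
from typing import Dict, List

# Assumed source of the input, run inside a clone of the repository:
#   gh pr list --state merged --base main --limit 10 --json number,title,mergedAt


def newest_merged_first(gh_json: str) -> List[Dict]:
    """Parse the JSON output of `gh pr list` and sort by merge time, newest first."""
    prs = json.loads(gh_json)
    # mergedAt is an ISO 8601 UTC timestamp, so string order matches time order.
    return sorted(prs, key=lambda pr: pr["mergedAt"], reverse=True)


def format_candidates(prs: List[Dict], limit: int = 5) -> List[str]:
    """Render the first `limit` PRs as one line each for pasting into chat."""
    return [
        f"#{pr['number']} {pr['title']} (merged {pr['mergedAt']})"
        for pr in prs[:limit]
    ]
```

The most recent merge is not always the culprit, so compare the merge times against when the alert or breakage started before choosing what to revert.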