Implementing a recurring rollback practice

In Q2 FY22 we implemented tooling and process to support rollbacks on Staging and Production. Throughout Q3 we’re continuing to improve reliability of Assisted Rollbacks as a path towards Hands-off Deployments.

Running regular practice rollbacks on Staging will help build confidence in the tools and process. Suggested Staging Rollback sessions have been added to the release issue template. Release Managers should try to spread these rollbacks across the team over the course of a release issue.

Document

When performing the procedure below, make every attempt to document before and after snapshots of what staging looked like, and link to the various pipelines and results of chatops commands. Doing so lets us confirm that the procedure works as desired and gives us a point-in-time snapshot in case we see a failure.

  1. Document/Screenshot the running version noted at https://staging.gitlab.com/help
  2. Screenshot the results of all chatops commands run in the procedure
  3. Link to the pipeline created for the rollback
  4. Document/Screenshot the version after the rollback is complete
  5. Because this is a practice session, there are multiple options for bringing staging back into alignment or continuing forward. Document the chosen method and the reason why.
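The items above can be captured in a small record so that every practice session is written up the same way. The following is a minimal sketch only: the `RollbackRecord` class, its field names, and the sample values are hypothetical and are not part of the release tooling.

```python
from dataclasses import dataclass, field


@dataclass
class RollbackRecord:
    """Hypothetical write-up of one Staging practice rollback."""

    version_before: str   # version noted on the staging /help page before the rollback
    version_after: str    # version noted after the rollback is complete
    pipeline_url: str     # link to the pipeline created for the rollback
    follow_up: str        # chosen method for realigning staging, and the reason why
    chatops_screenshots: list[str] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """Return the names of documentation items that are still empty."""
        items = {
            "version_before": self.version_before,
            "chatops_screenshots": self.chatops_screenshots,
            "pipeline_url": self.pipeline_url,
            "version_after": self.version_after,
            "follow_up": self.follow_up,
        }
        return [name for name, value in items.items() if not value]


# Placeholder values for illustration only.
record = RollbackRecord(
    version_before="example-version-a",
    version_after="example-version-b",
    pipeline_url="https://example.com/rollback-pipeline",
    follow_up="",
)
print(record.missing_items())
```

A non-empty result from `missing_items()` is a prompt to finish the write-up before closing the session.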

Process

  1. On the suggested date Release Managers should plan a Staging rollback, or schedule for a suitable date if needed.
  2. A suitable time to run should be considered:
    • Choose a time that is in between deployment pipelines if able
    • Validate there is enough time between starting the session and when gstg-cny would receive a new package to prevent QA interference
  3. A suitable package should be identified using the runbook.
  4. Rollback timing is decided by the Release Manager. Rolling back a suitable package once it has started to deploy to Production can help minimise the impact a test rollback will have on deployments and MTTP.
  5. Follow the runbook for steps to complete the rollback.
    • Check the accuracy of the rollback pipeline. It should include:
      • Preparation and tracking
      • Kubernetes Deployment job
      • QA tests
      • Manual Gitaly and Praefect jobs
    • It should not include:
      • Post-deploy migration jobs
  6. Please open issues in the Delivery issue tracker for problems or improvements following the test.
  7. Roll forward to the package deployed prior to the rollback (essentially “undoing” the rollback) to keep consistency with Production.
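
The pipeline accuracy check in the process above can be sketched as a comparison of job names against an expected and a forbidden set. This is an illustration under assumptions: the keyword fragments and the sample job names below are hypothetical, and the real job names should be taken from the rollback pipeline itself.

```python
# Hypothetical keyword fragments standing in for the job groups listed above.
REQUIRED_KEYWORDS = ("prepare", "track", "kubernetes", "qa", "gitaly", "praefect")
FORBIDDEN_KEYWORDS = ("postdeploy",)


def check_rollback_pipeline(job_names):
    """Return (missing, unexpected) for a list of pipeline job names.

    missing:    required keywords that no job name contains
    unexpected: job names that contain a forbidden keyword
    """
    lowered = [name.lower() for name in job_names]
    missing = [
        keyword
        for keyword in REQUIRED_KEYWORDS
        if not any(keyword in name for name in lowered)
    ]
    unexpected = [
        name
        for name in job_names
        if any(keyword in name.lower() for keyword in FORBIDDEN_KEYWORDS)
    ]
    return missing, unexpected


# Sample job names, invented for illustration.
jobs = [
    "prepare",
    "track-deployment",
    "gstg-kubernetes",
    "qa-smoke",
    "gstg-gitaly",
    "gstg-praefect",
]
missing, unexpected = check_rollback_pipeline(jobs)
print("missing:", missing)
print("unexpected:", unexpected)
```

Substring matching keeps the sketch short but is deliberately loose. A stricter check would compare exact job names once they are known.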