Roll back a deployment

Overview

This document describes the procedure to roll back a GitLab.com deployment by re-deploying a previous package. Rollbacks can help recover from incidents caused by code changes.

Rolling back

1. Gather Package information

Whether it is safe to roll back is normally determined by a ChatOps command. A manual procedure is described later in this document.

To determine if it’s safe to roll back, run the following ChatOps command:

/chatops run rollback check <ENV>

Where ENV is either:

  • gstg - Staging
  • gprd - Production

The command validates whether the previous package is newer than the latest execution of the post-deploy migration pipeline. If it is, the rollback can proceed.

Example Commands

To check Production:

/chatops run rollback check gprd

To check Staging:

/chatops run rollback check gstg

The output shows the following information:

  • Whether it is safe to roll back
  • Whether a deployment is in progress
  • The names of, and links to, the new, current, and previous deployed commits, plus a comparison link

The package name we want to roll back to is the “previous” package name displayed in the output from the rollback check command above. Copy it to pass to the deploy command later in this procedure.

⚠️ If a deploy is considered in progress, the command will incorrectly state which package to roll back to. In that case, use the “current” package name and follow the manual procedure described later in this document. ⚠️

⚠️ If we roll back one environment (Staging or Production), we should ensure the other has a matching package deployed. See “Ensure consistency between Staging and Production”. ⚠️

2. Notify the EOC

Notify @sre-on-call in #production that a rollback is about to start. If the rollback is going to be performed on Production, make sure they know that Canary will also be drained.

3. Perform the rollback

Production rollbacks require us to first drain Canary:

/chatops run canary --disable --production

Initiate the rollback using ChatOps, passing the package name gathered in step 1 (without the production flag, deployer will operate against staging):

/chatops run deploy <PACKAGE NAME> <ENVIRONMENT> --rollback

Both the package name and the environment are required, but can be in any order.

An alternative manual procedure is described below in this document.

Example Commands

Roll back Staging to a previous package:

/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c gstg --rollback

Roll back Production to a previous package:

/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c gprd --rollback
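The command format above can be sketched as a small helper that assembles the ChatOps string and rejects swapped or malformed arguments before anything is pasted into Slack. This is only an illustration: `rollback_command` is a hypothetical name, and the package pattern is an assumption inferred from the example package names in this document, not an official specification.

```python
import re

# Environments named in this document.
ENVIRONMENTS = {"gstg", "gprd"}

# Assumed package layout, inferred from the examples in this document:
# <major>.<minor>.<12-digit timestamp>-<hash>.<hash>
PACKAGE_RE = re.compile(r"^\d+\.\d+\.\d{12}-[0-9a-f]+\.[0-9a-f]+$")


def rollback_command(package, environment):
    """Return the ChatOps rollback command for a package and environment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if not PACKAGE_RE.match(package):
        raise ValueError(f"unrecognized package name: {package!r}")
    return f"/chatops run deploy {package} {environment} --rollback"


print(rollback_command("13.9.202102091820-985b57c4ca9.7ad6df8e35c", "gstg"))
```

The helper only builds the text of the command; it does not talk to Slack or to ChatOps.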

4. Monitor the <environment>-prepare CI Job

During the rollback, monitor the <environment>-prepare CI job for a potential failure. If it becomes stuck in a loop while waiting for the Omnibus lock, perform the following steps:

  1. Re-verify that no other deployment job is in progress

    • If a new deployment is in progress, evaluate the situation to determine if it should be cancelled, or if it is safe to move forward.
  2. If no new deployment is in progress, unlock the environment via ChatOps:

    /chatops run deploy unlock <ENVIRONMENT>
    

5. (Optional) - Gitaly and Praefect

By default, the rollback pipeline will not downgrade Gitaly and Praefect.

If a Gitaly and Praefect rollback is desired, a final stage with manual jobs is provided as part of the rollback pipeline. This optional stage can run as soon as the rest of the web-fleet stage is completed. Always roll back Praefect before Gitaly.

6. Ensure consistency between Staging and Production

With the introduction of multiple staging environments, Staging and Production versions are meant to be kept in sync.

This may involve canceling an ongoing deployment to the other environment, or rolling it back to the matching version.

Example Commands

In the following example, we have already rolled back Production, so we also need to roll back Staging:

## This was done previously
## /chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback --production

/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback

In the following example, we have already rolled back Staging, so we also need to roll back Production:

## This was done previously
## /chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback

/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback --production

After a rollback

Once the rollback pipeline completes, a “finished a rollback to” message will be posted in the #announcements Slack channel. You can check the rollback’s success by running:

/chatops run auto_deploy status

The release management dashboard will also update. Note that it may take several minutes for the new version to be displayed.

Following a rollback, the cause of the problem will still need to be fixed. The exact process may differ based on the reason for the rollback, but in most cases it will involve asking the appropriate developers to open a merge request to revert or fix the problem.

The rollback is now complete

(Optional) - In case of ChatOps failure

If ChatOps fails, follow the manual steps below to run a rollback. Their purpose is to find the auto-deploy package recorded at the moment of the post-deploy migration (PDM) pipeline execution and compare it against the package we need to roll back to.

1. Determine the auto-deploy packages.

Determine the auto-deploy package you want to roll back to and the auto-deploy package that was recorded by the post-deploy migration (PDM) pipeline.

Determine the auto-deploy package to roll back to

First, open the environment page of the environment you wish to roll back:

Next, determine which deploy you want to roll back. In almost all (if not all) cases this will simply be the latest successful deploy recorded. Example: the current deploy is deploy #10472, which deployed commit 0d272cb0.

[Figure: deployment example]

Then determine the auto-deploy package associated with the GitLab commit. In the #announcements Slack channel, search for the associated auto-deploy package:

[Figure: Slack search example]

In the above, the auto-deploy package is 15.3.202207271120-0d272cb0dfd.cda64a8223f.

2. Determine the auto-deploy package recorded by the PDM

Determine the auto-deploy package recorded at the moment of the post-deploy migration pipeline execution.

First, open the db environment page of the environment you wish to roll back:

Then, follow the same steps as in the previous section: find the commit, then search Slack for the associated auto-deploy package.

3. Compare both auto-deploy packages

Compare the auto-deploy package to roll back to against the auto-deploy package recorded by the PDM. If the former is newer than or the same as the latter, the rollback can proceed.

Example:

  • auto-deploy package to roll back to: 15.3.202207271120-0d272cb0dfd.cda64a8223f
  • auto-deploy package recorded by the PDM: 15.3.202207261620-d20b7aa07ac.626ce2bed94

In this case, the auto-deploy package to roll back to (15.3.202207271120) is newer than the one recorded by the PDM (15.3.202207261620), so the rollback can proceed.
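The comparison can also be done mechanically rather than by eye. The sketch below assumes that auto-deploy package names follow the layout seen in the examples in this document (major version, minor version, a 12-digit `YYYYMMDDHHMM` timestamp, then two hashes). The function names are hypothetical, and this is not the logic the ChatOps command itself uses.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the examples in this document:
# <major>.<minor>.<YYYYMMDDHHMM>-<hash>.<hash>
PACKAGE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d{12})-[0-9a-f]+\.[0-9a-f]+$")


def package_sort_key(package):
    """Return a (major, minor, timestamp) tuple that orders packages by age."""
    match = PACKAGE_RE.match(package)
    if match is None:
        raise ValueError(f"unrecognized auto-deploy package: {package!r}")
    major, minor, stamp = match.groups()
    return (int(major), int(minor), datetime.strptime(stamp, "%Y%m%d%H%M"))


def rollback_can_proceed(rollback_target, pdm_package):
    """True if the rollback target is newer than or the same as the PDM package."""
    return package_sort_key(rollback_target) >= package_sort_key(pdm_package)


target = "15.3.202207271120-0d272cb0dfd.cda64a8223f"
pdm = "15.3.202207261620-d20b7aa07ac.626ce2bed94"
print(rollback_can_proceed(target, pdm))  # True
```

Comparing parsed values rather than raw strings avoids ordering mistakes when version numbers differ in digit count, for example 15.9 and 15.10.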

Manual rollback

To create a rollback pipeline without ChatOps available, create a new pipeline on the deployer project with the following variables:

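| Key                | Value            | Example                                   |
|--------------------|------------------|-------------------------------------------|
| DEPLOY_ROLLBACK    | true             | N/A                                       |
| DEPLOY_ENVIRONMENT | Environment name | gprd                                      |
| DEPLOY_VERSION     | Package string   | 14.7.202201200320-bbf52b48f4d.3b4ebcefa97 |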
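If the pipeline is created through the GitLab API rather than the web UI, the three variables above map onto the `variables` array that the create-pipeline endpoint accepts. The sketch below only builds that request body and sends nothing. The helper name is hypothetical, and the `master` default for `ref` is an assumption: check which branch the deployer project actually uses.

```python
import json


def rollback_pipeline_payload(environment, package, ref="master"):
    """Build the body for a create-pipeline request that triggers a rollback.

    `ref` is an assumed default branch name, not taken from this document.
    """
    return {
        "ref": ref,
        "variables": [
            {"key": "DEPLOY_ROLLBACK", "value": "true"},
            {"key": "DEPLOY_ENVIRONMENT", "value": environment},
            {"key": "DEPLOY_VERSION", "value": package},
        ],
    }


payload = rollback_pipeline_payload(
    "gprd", "14.7.202201200320-bbf52b48f4d.3b4ebcefa97"
)
print(json.dumps(payload, indent=2))
```

Note that `DEPLOY_ROLLBACK` is passed as the string `true`, because CI/CD variable values are strings.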