Rollback a deployment
Overview
This document describes the procedure to roll back a GitLab.com deployment by re-deploying a previous package. Rollbacks can help recover from incidents caused by code changes.
Rolling back
1. Gather Package information
Whether it is safe to roll back is currently determined by a ChatOps command, but a manual procedure is described below.
To determine if it’s safe to roll back, run the following ChatOps command:
/chatops run rollback check <ENV>
Where ENV is either:
- gstg - Staging
- gprd - Production
The command validates whether the previous package is newer than the latest execution of the post-deploy migration pipeline. If it is, the rollback can proceed.
Example Commands
To check Production:
/chatops run rollback check gprd
To check Staging:
/chatops run rollback check gstg
The output shows the following information:
- Whether the command considers it safe to roll back
- Whether a deployment is in progress
- The name and link to the new, current, and previous deployed commits and a comparison link.
The package name we want to roll back to is the “previous” package name displayed in the output from the rollback check command above. Copy it to pass to the deploy command later in this procedure.
⚠️ If a deploy is considered in progress, the command will incorrectly state which package to roll back to. In that case, use the “current” package name and follow the manual procedure noted below. ⚠️
⚠️ If we roll back one environment (staging or production), we should ensure the other has a matching package deployed; see “Ensure consistency between Staging and Production”. ⚠️
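The rule for choosing the package name can be summarized in a minimal Python sketch. The function name and its parameters are hypothetical; they only model the fields that the rollback check output is described as reporting, and are not part of ChatOps itself.

```python
def package_to_roll_back_to(previous: str, current: str, deploy_in_progress: bool) -> str:
    """Pick the package name to pass to the deploy command.

    Hypothetical helper: `previous` and `current` stand for the package
    names reported in the rollback check output.
    """
    if deploy_in_progress:
        # While a deploy is considered in progress, the check misreports the
        # target, so the "current" package is the one to roll back to.
        return current
    return previous
```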
2. Notify the EOC
Notify @sre-on-call in #production that a rollback is about to be started.
If the rollback is going to be performed on Production, make sure they know that Canary will also be drained.
3. Perform the rollback
Production rollbacks require us to first drain Canary:
/chatops run canary --disable --production
Initiate a rollback using ChatOps, passing the package name found in step 1 above (without the production flag, deployer will operate against staging):
/chatops run deploy <PACKAGE NAME> <ENVIRONMENT> --rollback
Both the package name and the environment are required, but can be in any order.
An alternative manual procedure is described below in this document.
Example Commands
Roll back Staging to a previous package:
/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c gstg --rollback
Roll back Production to a previous package:
/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c gprd --rollback
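As a rough illustration of the sequence above, the following Python sketch assembles the ChatOps commands for a rollback in the order they should be run. The function name is hypothetical; the command strings are the ones shown in this procedure, and the sketch only builds text to paste into Slack rather than calling ChatOps.

```python
from typing import List


def rollback_commands(package: str, environment: str) -> List[str]:
    """Return the ChatOps commands to run, in order, for a rollback."""
    if environment not in ("gstg", "gprd"):
        raise ValueError(f"unknown environment: {environment!r}")

    commands = []
    if environment == "gprd":
        # Production rollbacks require Canary to be drained first.
        commands.append("/chatops run canary --disable --production")
    commands.append(f"/chatops run deploy {package} {environment} --rollback")
    return commands


if __name__ == "__main__":
    for command in rollback_commands(
        "13.9.202102091820-985b57c4ca9.7ad6df8e35c", "gprd"
    ):
        print(command)
```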
4. Monitor the <environment>-prepare CI Job
During the rollback, monitor the <environment>-prepare CI job for a potential failure. If it becomes stuck in a loop while waiting for the Omnibus lock, perform the following steps:
Re-verify that no other deployment job is in progress
- If a new deployment is in progress, evaluate the situation to determine if it should be cancelled, or if it is safe to move forward.
If no new deployment is in progress, unlock the environment via ChatOps:
/chatops run deploy unlock <ENVIRONMENT>
5. (Optional) - gitaly and praefect
By default the rollback pipeline will not downgrade gitaly and praefect.
If a rollback of these components is desired, a final stage with manual jobs is provided as part of the rollback pipeline. This optional stage can run as soon as the rest of the web-fleet stage is completed. Always roll back Praefect before Gitaly.
6. Ensure consistency between Staging and Production
With the introduction of multiple staging environments, Staging and Production versions are meant to be kept in sync.
This may involve canceling an ongoing deployment to the other environment, or rolling it back to the matching version.
Example Commands
In the following example, we have already rolled back production, so we also need to roll back staging:
## This was done previously
## /chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback --production
/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback
In the following example, we have already rolled back staging, so we also need to roll back production:
## This was done previously
## /chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback
/chatops run deploy 13.9.202102091820-985b57c4ca9.7ad6df8e35c --rollback --production
After a rollback
Once the rollback pipeline completes, the deployment status can be checked with:
/chatops run auto_deploy status
The release management dashboard will also update. Note that it may take several minutes for the new version to be displayed.
Following a rollback, the cause of the problem will still need to be fixed. The exact process may differ based on the reason for the rollback, but in most cases it will involve asking the appropriate developers to open a merge request to revert or fix the problem.
The rollback is now complete
(Optional) - In case of ChatOps failure
If ChatOps fails, follow the manual steps below to run a rollback. Their purpose is to find the auto-deploy package recorded at the moment of the post-deploy migration (PDM) pipeline execution and compare it against the package we need to roll back to.
1. Determine the auto-deploy packages.
Determine the auto-deploy package you want to roll back to and the auto-deploy package that was recorded by the post-deploy migration (PDM) pipeline.
Determine the auto-deploy package to roll back to
First, open the environment page of the environment you wish to roll back:
Next, determine which deploy you want to roll back. In almost all (if not all) cases this will simply be the latest successful deploy recorded. Example: the current deploy is deploy #10472, which deployed commit 0d272cb0.
Then determine the auto-deploy package associated with the GitLab commit. On the #announcements Slack channel, search for the associated auto-deploy package:
In the above, the auto-deploy package is 15.3.202207271120-0d272cb0dfd.cda64a8223f.
2. Determine the auto-deploy package recorded by the PDM
Determine the auto-deploy package recorded at the moment of the post-deploy migration pipeline execution.
First, open the db environment page of the environment you wish to roll back:
Then, follow the same steps as in the previous section: find the commit and then search Slack for the auto-deploy package.
3. Compare both auto-deploy packages
Compare the auto-deploy package to roll back to against the auto-deploy package recorded by the PDM. If the former is newer than or the same as the latter, the rollback can proceed.
Example:
- auto-deploy package to roll back to: 15.3.202207271120-0d272cb0dfd.cda64a8223f
- auto-deploy package recorded by the PDM: 15.3.202207261620-d20b7aa07ac.626ce2bed94
In this case, the auto-deploy package to roll back to (15.3.202207271120) is newer than the one recorded by the PDM (15.3.202207261620), so the rollback can proceed.
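The comparison can be sketched in Python as follows. This is a minimal illustration, not the check that ChatOps performs: it assumes the third dot-separated component of the package name is a UTC-style timestamp in YYYYMMDDHHMM form, which is inferred from the examples in this document, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: <major>.<minor>.<YYYYMMDDHHMM>-<sha>.<sha>
PACKAGE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d{12})-([0-9a-f]+)\.([0-9a-f]+)$")


def package_timestamp(package: str) -> datetime:
    """Extract the timestamp embedded in an auto-deploy package name."""
    match = PACKAGE_RE.match(package)
    if match is None:
        raise ValueError(f"unrecognized package name: {package!r}")
    return datetime.strptime(match.group(3), "%Y%m%d%H%M")


def rollback_can_proceed(rollback_package: str, pdm_package: str) -> bool:
    """True if the rollback target is newer than or the same as the PDM package."""
    return package_timestamp(rollback_package) >= package_timestamp(pdm_package)


if __name__ == "__main__":
    target = "15.3.202207271120-0d272cb0dfd.cda64a8223f"
    pdm = "15.3.202207261620-d20b7aa07ac.626ce2bed94"
    print(rollback_can_proceed(target, pdm))
```

Comparing parsed timestamps rather than raw strings avoids surprises if the major or minor version differs between the two packages.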
Manual rollback
To create a rollback pipeline without ChatOps available, create a new pipeline on the deployer project with the following variables:
key | value | example |
---|---|---|
DEPLOY_ROLLBACK | true | N/A |
DEPLOY_ENVIRONMENT | Environment name | gprd |
DEPLOY_VERSION | Package string | 14.7.202201200320-bbf52b48f4d.3b4ebcefa97 |
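
For reference, the variables in the table can be assembled programmatically. The sketch below only builds the key/value list and prints it as JSON; the function name is hypothetical. The list shape matches the `variables` parameter of GitLab's create-pipeline API, but this document does not say whether the pipeline should be created through the API or the web UI, and the branch to run against is not specified here.

```python
import json
from typing import Dict, List


def rollback_pipeline_variables(environment: str, package: str) -> List[Dict[str, str]]:
    """Build the variables for a manual rollback pipeline on the deployer project."""
    return [
        {"key": "DEPLOY_ROLLBACK", "value": "true"},
        {"key": "DEPLOY_ENVIRONMENT", "value": environment},
        {"key": "DEPLOY_VERSION", "value": package},
    ]


if __name__ == "__main__":
    variables = rollback_pipeline_variables(
        "gprd", "14.7.202201200320-bbf52b48f4d.3b4ebcefa97"
    )
    print(json.dumps(variables, indent=2))
```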