Monthly Release
The Release Manager should always be working on the release. Most of the work consists of monitoring the auto-deploy process and watching the appropriate communication channels for the next steps. Always be aware that we may need to pause the auto-deploy process if a P1 issue is announced by an Engineer performing testing.
Process
Create an issue to track the release
To keep track of the various tasks that need to happen each day leading up to the final release, an issue is automatically created at the beginning of every release cycle on the release task tracker. The auto_deploy:prepare pipeline schedule will create the monthly issue if it doesn’t find one.
If, for some reason, the issue does not exist, you can create one manually and update it as the release progresses, as follows:
- In the #releases Slack channel, execute: /chatops run release prepare <VERSION>
  - Example: /chatops run release prepare 11.8.0
- ChatOps will respond with the job that gets executed as well as a link to the various issues that are created automatically.
This meta issue will serve as the main place where everyone can find issues related to the release you will be working on.
Every time you create a new issue for one of the upcoming tasks, you should link it to this meta issue.
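As an illustration, one way to link a newly created task issue back to the meta issue is to leave a comment on the new issue using GitLab's /relate quick action. The issue reference below is a placeholder, not a real issue:

```
/relate gitlab-org/release/tasks#9999
```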
Auto-Deploy Process
Auto-Deploy will create a package each day if necessary. Announcements are posted in the Slack #announcements channel when that package has been deployed to a particular environment.
The job of the Release Manager is to monitor the auto-deploy branch and validate that there are no pipeline failures. The Release Manager should also ensure that any deploy to any of the environments is both successful and error free. The Release Manager should strive to deploy to production at least once per day, but only after it has been deemed safe to do so.
The auto-deploy process is documented in much greater detail in the auto-deploy document.
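Much of this monitoring can start from Slack. The command below is an assumption about the available ChatOps tooling rather than a confirmed reference; the auto-deploy document is the authoritative source for the current commands:

```
/chatops run auto_deploy status
```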
Guidelines on Deploy
Deploys to the Staging environment and the Canary stage are automatic after the completion of a successful package build. Deploying to each of these also runs a full set of QA jobs, and a subset of these jobs will halt the deployment pipeline from progressing to the next environment if they fail. This means that deployment to Canary will not continue if such a QA job failed after the Staging deploy. See QA Failures for how to address situations where QA jobs fail.
A QA issue is opened upon successful deploy to the Staging environment. When it is opened, it lists the issues that made it into the package so that Engineers can check them off and manually validate that features are working as intended. The QA issue closes automatically after 24 hours; Release Managers should use it only as a guideline and a communication point with the Engineers doing the QA testing. Completing the QA issue is not a blocker to progressing deployment to other environments, unless an Engineer reports an S1/S2 regression.
As auto-deploy moves along, the initial build will always contain the most changes, with later builds tapering off through the day/week. This creates a few notable situations that require decisions:
QA tasks may not be created. This can mean one of three things:
- Nothing was picked into the auto-deploy branch that round, which means no GitLab package was created and therefore nothing was deployed.
- There may have been a problem during the deploy to the Staging environment. This should be investigated.
- Some items are excluded from showing up in QA. The version of the product will still change, but GitLab functionality may not. In these circumstances, it is safe to skip deploying to Production.
We should strive to deploy to production as often as possible. Periodically check our metrics for the Canary environment and look at Sentry for previously unreported errors. If nothing is raising any flags, it may be safe to proceed with a deploy.
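One way to look for previously unreported errors is to filter Sentry's issue stream by the deployed release (the #announcements post for the Canary deployment also links to the Sentry release, as noted below). The query is only a sketch using generic Sentry search syntax, with a placeholder for the version:

```
is:unresolved release:<deployed version>
```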
Promotion to production is a manual step done by the Release Manager. As a guideline, wait for 1 hour after the deploy to Canary before promoting.
It is not necessary to ask for permission to deploy to production when:
- No new errors in Sentry for that release
  - Check the link to the Sentry release for the Canary deployment in the #announcements channel.
- Ensure that there are no active S1, S2, or S3 incidents ongoing.
- Ensure that there are no ongoing change issues with C1 or C2 criticality.
If all of that appears ok, leave a message in the open release issue with the following content:
Promoting to production because there are:
1. No new exceptions reported in Sentry.
1. No active S1, S2, or S3 incidents.
1. No ongoing C1 or C2 change issues.
Once the comment is added, proceed with production deployment.
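The mechanics of the promotion itself are covered by the deployer documentation linked further below, which is the authoritative reference. The sketch below is hypothetical: the command name, the package version placeholder, and the --production flag are assumptions, not confirmed syntax:

```
/chatops run deploy <package version> --production
```

Whichever mechanism is current, follow the deployer documentation rather than this sketch.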
If any new exceptions are reported in Sentry, create an issue from the exception and escalate in the #dev-escalation Slack channel to determine the severity of the problem. If there is an ongoing active incident, confirm in the #incident-management Slack channel whether the incident is being worked on. If there are no updates there, ping @sre-oncall for an update. If there is an ongoing high-criticality change issue, enquire whether it is OK to also deploy at the same time by asking @sre-oncall to leave a comment on the incident or change issue that is blocking the deploy, with a reason as to why a promotion is safe.
The Release Manager should link the monthly release issue to this comment.
Once this information is gathered, we have met compliance and it is safe to proceed with the deploy.
Be wary of times when we hold a deploy to production. The next day we run into a situation where new code that made it to Canary has not yet made it to production, which is questionable because the next production deploy will contain more changes than originally intended. We should try to avoid pushing an older version of GitLab into production, since code changes can behave badly in combination with data migrations.
Use your best judgement to determine whether production should move forward. Ask questions to ensure you are comfortable, and over-communicate your decision whether or not to move forward. In general, we want to avoid creating situations where production is behind Canary for lengthy periods of time.
Our documentation for how our deployer mechanism works can be found here: gitlab.com/gitlab-org/release/docs/…/general/deploy/gitlab-com-deployer.md#creating-a-new-deployment-for-upgrading-gitlab
Complete the release tasks
Once the release schedule begins, each work day has something that needs to be done. Perform the tasks and mark them as complete in the issue as you progress.
If you’re not sure what to do for any task, check the guides.
Getting Help
Completing release tasks on time is very important. If you experience problems with any of the release tasks and you don’t know who to ask, contact someone from this list:
The earlier we identify a problem or delay in the release, the easier it is to fix.
Priorities
Keep up with the release schedule. It’s better to ship less but on time. Revert code that delays the release.