Resolving QA failures

Overview

This document describes the process to follow if QA tests are failing.

QA smoke tests run as part of the auto-deploy pipeline. Because they run regularly, they can be assumed to be stable.

Process Overview

Failing QA tests are always tracked using an issue in the release tracker, which gives a record of the failure. This moves tracking away from incidents, as part of https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/752

Quality are here to help us investigate and resolve any failures, and they maintain a schedule showing who to ping. All test results are posted in the #qa-<env> Slack channels.

Process steps

  1. Follow the steps in the [Handling Deploy Failures runbook](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/deploy/failures.md#handling-deployment-failures) to create an issue in the release issue tracker.
  2. Check the #qa-<env> Slack channel. The failure may already be known and someone may already be working on it.
  3. To escalate, ping the engineer listed in the Quality on call schedule on the issue and ask for assistance. The #quality Slack channel can also be used.
  4. If the failure appears to be code or environment-related, declare an incident with the correct [availability severity](https://about.gitlab.com/handbook/engineering/quality/issue-triage/#availability) and [Delivery impact](https://about.gitlab.com/handbook/engineering/releases/#delivery-impact-labels) label. Link to the release issue created in step 1.
  5. Once tests are passing again, update the issue with a summary of the failure. Apply the deploys-blocked-gprd::* and deploys-blocked-gstg::* labels, then close the issue.
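
Step 5 is normally done by hand in the GitLab UI. If you would rather script it, the following is a minimal sketch, not an official tool. It assumes the public GitLab REST API v4 issue endpoints (a POST to the issue's notes and a PUT to the issue with `add_labels` and `state_event`). The function name, the project path, the issue number, and the token are hypothetical placeholders. The concrete scoped label values (the part written as `*` above) are not specified here and must be supplied by the caller. The function only builds the requests; it does not send them.

```python
import json
import urllib.parse
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"


def build_close_requests(project_path, issue_iid, summary, labels, token):
    """Build (but do not send) the two requests for step 5.

    project_path, issue_iid, labels and token are supplied by the caller;
    nothing here knows the real release tracker path or label values.
    """
    # The API accepts a URL-encoded project path in place of a numeric ID.
    project = urllib.parse.quote(project_path, safe="")
    issue_url = f"{GITLAB_API}/projects/{project}/issues/{issue_iid}"
    headers = {"PRIVATE-TOKEN": token, "Content-Type": "application/json"}

    # 1. Post the failure summary as a comment on the issue.
    note = urllib.request.Request(
        f"{issue_url}/notes",
        data=json.dumps({"body": summary}).encode(),
        headers=headers,
        method="POST",
    )

    # 2. Add the labels and close the issue in one update.
    update = urllib.request.Request(
        issue_url,
        data=json.dumps(
            {"add_labels": ",".join(labels), "state_event": "close"}
        ).encode(),
        headers=headers,
        method="PUT",
    )
    return note, update
```

To actually perform the update, pass each returned request to `urllib.request.urlopen`. Keeping the build step separate from sending lets you inspect the URL and payload before anything changes on the issue.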

Quality on call