Prioritization guidelines and Triage process

Context

As engineers, we strive to build bug-free products, but bugs are unavoidable given the complexity of handling every user situation and condition. Even when the product functions as intended, users may have expectations and intentions that differ from those of the product designer or owner. Fixing bugs is a priority, but the team must also manage feature requests, operational support, and platform support with limited resources. It is important to strike the right balance between these needs.
Defect triage (aka bug triage) is the process where reported issues are reviewed and prioritized. During triaging, issues are reviewed to ensure they are valid, reproducible, and have accurate information that allows the issue to be resolved and tested. Proper expectations must be established for all reported issues.

Prioritization Principles

Prioritization means finding what is most important and focusing on it, which also reduces the cost of context switching. Below are some metrics to factor into bug prioritization:
Frequency - Bugs that occur more frequently than others should have higher priority. After all, they are probably the ones that affect more users in the most common situations.
Severity - This helps evaluate how annoying the bug is. Some of the questions to ask the team are: is there a workaround? If so, how simple is it for the users to execute the workaround? And if not, how much does the bug negatively affect the user experience?
Opportunity Cost - The true value of a product is the summation of all of its features; however, some features are more valuable/critical than others. When push comes to shove, we are more likely to focus on features that truly define the product instead of fixing a nice-to-have.

Standardized priorities and triage SLOs

P0
  Criteria:
    Critical business contract agreements are violated (such as SLAs, RTO, RPO, etc.)
    Core business features are completely broken for all users and/or select critical impact users
    Core business feature breakage that is blocking big marketing pushes/launches
    Security defects involving:
      known attacks that impact the availability or confidentiality of customer data, or that compromise our platform
      accidental leakage of information or of customer data
  SLO:
    Immediate response, even after hours. Fix ASAP. If the fix takes multiple days of work, post an update every day.
    These bugs are release blockers.
    If these bugs are already in Prod, their fix warrants an emergency hotfix push.
  Timeline: Immediate

P1
  Criteria:
    Core business features are significantly impacted
    Core business features are degrading and might soon impact all users and/or select critical impact users
    Core business features are degraded for a significant number of users
    Security defects that can potentially:
      impact the availability of our entire service (DoS vulnerabilities)
      disclose user data to unauthorized users (Information Disclosure across users/tenants)
      compromise our service (Elevation of Privilege)
  SLO:
    Should be investigated and mitigated within 1 week.
    If these bugs are already in Prod, they might not warrant an emergency hotfix, but the fix should go into the next available release.
  Timeline: 1 week

P2
  Criteria:
    Business features are impacted but not hindered for a large number of users
  SLO:
    Get it fixed within 6 weeks, without interrupting current work.
  Timeline: 6 weeks

P3
  Criteria:
    Functionality issues that are not working as intended (such as cosmetic errors)
  SLO:
    We seriously aim to get it fixed within 6 months.
    OR
    This is a real bug but we will not be able to prioritize it. Immediately mark it as Will not Fix (Closed) during triage.
  Timeline: 6 months
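
For teams that track these priorities in tooling, the timelines above can be encoded as a lookup. The following is only a minimal sketch: the names `Priority`, `RESOLUTION_WINDOW`, `resolution_due`, and `is_release_blocker` are hypothetical, "Immediate" is modeled as a zero-length window, and "6 months" is approximated as 26 weeks.

```python
from datetime import datetime, timedelta
from enum import Enum


class Priority(Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


# Timelines from the priority definitions. Assumptions of this sketch:
# "Immediate" is a zero-length window and "6 months" is rounded to 26 weeks.
RESOLUTION_WINDOW = {
    Priority.P0: timedelta(0),
    Priority.P1: timedelta(weeks=1),
    Priority.P2: timedelta(weeks=6),
    Priority.P3: timedelta(weeks=26),
}


def resolution_due(priority: Priority, reported_at: datetime) -> datetime:
    """Return the target time for resolving a bug of the given priority."""
    return reported_at + RESOLUTION_WINDOW[priority]


def is_release_blocker(priority: Priority) -> bool:
    """Only P0 bugs are described as release blockers."""
    return priority is Priority.P0
```

For example, a P1 bug reported on a Monday morning would be due the following Monday at the same time. A P3 bug that is marked Will not Fix during triage would simply never be given a due date.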

Triage process and SLAs

High-level guidance for the triage process.

Triage SLA

Daily triage: new bugs are triaged before the end of the next business day. Bugs marked as P0 by the QA team and/or the bug creator get an immediate reaction.
A bug is considered triaged when the correct team has been identified, the team has accepted the bug, P0 and P1 bugs have been identified, and the status is set to “Not Started”.
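
The triage SLA and the definition of "triaged" can be checked mechanically. The sketch below is illustrative only and rests on several assumptions: business days are Monday to Friday with no holiday calendar, "end of day" is taken as 23:59:59, the `Bug` fields are hypothetical, and "P0 and P1 bugs are identified" is read as "a priority has been assessed".

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Optional


def next_business_day(d: date) -> date:
    """Return the next Monday-to-Friday date after d (holidays are ignored)."""
    nxt = d + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def triage_due(created_at: datetime) -> datetime:
    """End of the next business day after the bug was created."""
    return datetime.combine(next_business_day(created_at.date()), time(23, 59, 59))


@dataclass
class Bug:
    # Hypothetical fields; real trackers will name these differently.
    team: Optional[str] = None
    accepted_by_team: bool = False
    priority: Optional[str] = None
    status: str = "New"


def is_triaged(bug: Bug) -> bool:
    """Team identified, bug accepted, priority assessed, status 'Not Started'."""
    return (
        bug.team is not None
        and bug.accepted_by_team
        and bug.priority is not None
        and bug.status == "Not Started"
    )
```

Under these assumptions, a bug filed on a Friday afternoon is due for triage by the end of the following Monday.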

Overall triage process alignment

All new issues are triaged daily to filter out release-blocking issues and other critical issues that should have immediate engineering attention.
First-level bug triage is done by the QA team. It filters out release-blocking issues (P0 bugs) so that they get actioned immediately and unblock the release, as well as critical issues that should have immediate engineering attention (P1 bugs). The QA team also ensures that all other bugs are assigned to a team for further triage.
Bugs that are not release blockers are left to be triaged by the assigned teams.
This should be done by the product dev teams (some combination of PMs and Eng). All teams are responsible for triaging bugs assigned to their team at a frequency that upholds the triage SLA. They should use the as guidance.
During triage, if the team does not feel that they are the right team to address the bug, they should immediately re-assign the bug to the team they believe can resolve the problem.
During triage, if the team discovers a release blocker that was not previously identified during first level bug triage, they should follow the escalation process to ensure the release team is aware of the new release blocker.
The assumption is that the likelihood of release blockers found at this stage is slim since triage at this level won’t be done on a daily basis. Therefore, I added an escalation path in case there is a serious bug that warrants an emergency push.
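
The two triage levels described above can be summarized as a pair of routing decisions. This is a rough sketch rather than a prescribed workflow: the `Action` names and both function signatures are hypothetical, and checking for a release blocker before checking team ownership is an assumption about ordering.

```python
from enum import Enum


class Action(Enum):
    UNBLOCK_RELEASE = "action immediately as a release blocker"
    ENGINEERING_ATTENTION = "route for immediate engineering attention"
    TEAM_TRIAGE = "assign to a team for further triage"
    ESCALATE = "follow the escalation process"
    REASSIGN = "re-assign to the team believed able to resolve it"
    KEEP = "prioritize within the current team"


def first_level_triage(priority: str) -> Action:
    """QA pass: pull out P0 and P1 bugs and hand everything else to a team."""
    if priority == "P0":
        return Action.UNBLOCK_RELEASE
    if priority == "P1":
        return Action.ENGINEERING_ATTENTION
    return Action.TEAM_TRIAGE


def team_triage(is_right_team: bool, is_release_blocker: bool) -> Action:
    """Team pass: escalate a newly found release blocker, else re-assign or keep."""
    if is_release_blocker:
        return Action.ESCALATE
    if not is_right_team:
        return Action.REASSIGN
    return Action.KEEP
```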
