Context
As engineers, we strive to build bug-free products, but bugs are unavoidable given the complexity of addressing every user situation and condition. Even when the product functions as intended, users may have expectations and intentions that differ from those of the product designer or owner. Fixing bugs is a priority, but the team must also manage feature requests, operational support, and platform support with limited resources. It is important to strike the right balance between these needs.
Defect triage (aka bug triage) is the process in which reported issues are reviewed and prioritized. During triage, each issue is checked to ensure it is valid, reproducible, and accompanied by accurate information that allows it to be resolved and tested. Proper expectations must be established for all reported issues.
Prioritization Principles
Prioritization means finding what’s most important and focusing on it, which also reduces the cost of context switching. Consider the following factors when prioritizing bugs:
Frequency - Bugs that occur more frequently than others should have higher priority. After all, they are probably the ones that affect more users in the most common situations.
Severity - This helps evaluate how annoying the bug is. Some of the questions to ask the team are: is there a workaround? If so, how simple is it for the users to execute the workaround? And if not, how much does the bug negatively affect the user experience?
Opportunity Cost - The true value of a product is the summation of all of its features; however, some features are more valuable/critical than others. When push comes to shove, we are more likely to focus on features that truly define the product instead of fixing a nice-to-have.
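One way to make these three factors comparable across bugs is to rate each one and combine the ratings into a single score. The sketch below is only an illustration: the 1-to-5 rating scale, the `BugSignal` fields, and the weights are all assumptions, not part of any agreed process, and a team would need to tune them to its own product.

```python
from dataclasses import dataclass


@dataclass
class BugSignal:
    """Hypothetical ratings, each from 1 (low) to 5 (high), agreed on by the team."""
    frequency: int      # how often the bug occurs
    severity: int       # how badly it hurts users; an easy workaround lowers it
    feature_value: int  # how central the affected feature is to the product


# Assumed weights, for illustration only.
WEIGHTS = {"frequency": 0.4, "severity": 0.4, "feature_value": 0.2}


def priority_score(bug: BugSignal) -> float:
    """Weighted sum of the three ratings; a higher score means fix sooner."""
    return (
        WEIGHTS["frequency"] * bug.frequency
        + WEIGHTS["severity"] * bug.severity
        + WEIGHTS["feature_value"] * bug.feature_value
    )


def rank(bugs: dict[str, BugSignal]) -> list[str]:
    """Return bug names ordered from highest to lowest score."""
    return sorted(bugs, key=lambda name: priority_score(bugs[name]), reverse=True)
```

A score like this is a conversation starter for triage rather than a replacement for judgment; two bugs with similar scores still need a human decision.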
Standardized priorities and triage SLOs
Triage process and SLAs
High-level guidance for the triage process.
Triage SLA
Daily triage (defined as before the end of the next business day), with immediate reaction to new bugs that are marked as P0 by the QA team and/or the bug creator.
Triaged is defined as: the correct team has been identified, the team has accepted the bug, P0 and P1 bugs are identified, and the status is set to “Not Started”.
Overall triage process alignment
All new issues are triaged daily to filter out release-blocking issues and other critical issues that should have immediate engineering attention.
First-level bug triage is done by the QA team. They filter out release-blocking issues so that these get actioned immediately and unblock the release (P0 bugs), as well as critical issues that should have immediate engineering attention (P1 bugs). They will also ensure all other bugs are assigned to a team for further triage.
Bugs that are not release blockers are left to be triaged by the assigned teams. This should be done by the product dev teams (some combination of PMs and Eng).
All teams are responsible for triaging bugs assigned to their team at a frequency that upholds the triage SLA. They should use the as guidance.
During triage, if the team does not feel that they are the right team to address the bug, they should immediately re-assign the bug to the team they believe can resolve the problem.
During triage, if the team discovers a release blocker that was not previously identified during first-level bug triage, they should follow the escalation process to ensure the release team is aware of the new release blocker. The assumption is that release blockers are unlikely to be found at this stage, since triage at this level won’t be done on a daily basis. Therefore, I added an escalation path in case there is a serious bug that warrants an emergency push.
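The triage SLA and the definition of "triaged" given earlier can be checked mechanically. The following is a minimal sketch under several assumptions: business days are Monday to Friday with no holiday calendar, "end of day" means 23:59:59 local time, and "P0 and P1 bugs are identified" is interpreted as "a priority has been assigned". The `Bug` fields and function names are hypothetical and do not correspond to any particular bug tracker.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


def triage_deadline(created: datetime) -> datetime:
    """End of the next business day after the bug was created.

    Assumes Monday-Friday business days and ignores public holidays.
    """
    day = created.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, time(23, 59, 59))


@dataclass
class Bug:
    title: str
    created: datetime
    team: Optional[str] = None      # team identified as the owner
    accepted: bool = False          # the team has accepted the bug
    priority: Optional[str] = None  # "P0", "P1", or a lower priority
    status: str = "New"


def is_triaged(bug: Bug) -> bool:
    """Owner identified, bug accepted, priority decided, status "Not Started"."""
    return (
        bug.team is not None
        and bug.accepted
        and bug.priority is not None
        and bug.status == "Not Started"
    )


def breaches_sla(bug: Bug, now: datetime) -> bool:
    """True when the bug is still untriaged after its deadline."""
    return not is_triaged(bug) and now > triage_deadline(bug.created)
```

A check like `breaches_sla` could feed a daily report of untriaged bugs per team. It covers only the standard SLA; the immediate reaction expected for bugs marked P0 would need a separate, tighter rule.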