When an incident hits, the first question every team asks is: "How bad is this?" The answer to that question determines everything that follows -- who gets paged, how fast you respond, what gets communicated externally, and how many resources you throw at the problem.
Without a clearly defined severity classification system, teams default to gut instinct. That leads to two equally dangerous outcomes: under-escalating a critical issue (and discovering the damage hours later) or treating every anomaly as a five-alarm fire (and burning out your on-call engineers in weeks).
This guide walks through how to define incident severity levels from SEV-1 through SEV-5, set response time targets for each tier, build escalation policies that actually work, and avoid the most common classification mistakes.
Why Incident Severity Levels Matter
Incident severity levels are a shared language. They let an on-call engineer in one time zone communicate the urgency of an issue to a VP of Engineering in another -- without a five-minute explanation. When everyone agrees on what "SEV-1" means, the right people mobilize at the right speed.
Without that shared language, organizations suffer from:
- Inconsistent response times. One team's "critical" is another team's "we'll look at it Monday."
- Escalation delays. Nobody wants to be the person who wakes up the CTO for a non-issue, so genuine emergencies get under-reported.
- Alert fatigue. When everything is urgent, nothing is urgent.
- Poor post-incident analysis. You cannot measure mean time to resolution (MTTR) by severity if severity is not consistently applied.
A well-defined severity framework gives your team the confidence to act decisively when it matters and to stay focused when it does not.
Incident Severity Levels Defined: SEV-1 Through SEV-5
The following framework defines five severity tiers. Your organization may use P1-P5 (priority levels) or a different naming convention -- the labels matter less than the criteria behind them.
SEV-1 -- Critical
Definition: Complete service outage, active security breach, or data loss affecting all or most users. The business is losing revenue, customers cannot perform core functions, or sensitive data is actively being exfiltrated.
Examples:
- Production database is down and cannot be restored from a replica
- Ransomware is actively encrypting production systems
- Authentication service failure preventing all user logins
- Payment processing is completely non-functional
- Confirmed exfiltration of customer PII or financial data
Response target: Acknowledge within 5 minutes. All hands on deck. Incident commander assigned immediately.
Who gets notified: On-call engineer, backup on-call, engineering leadership, VP/CTO, communications lead, and (for security incidents) the security team and legal counsel.
SEV-2 -- High
Definition: Major degradation of a core service or a confirmed security incident with limited blast radius. A significant portion of users are affected, or a critical business function is severely impaired but not fully down.
Examples:
- API response times have increased 10x, causing widespread timeouts
- A vulnerability is confirmed exploitable, with no evidence of active exploitation yet
- Email delivery system is down for all outbound messages
- Single-region outage affecting 30% of users
- Unauthorized access to an internal system with no evidence of data exfiltration
Response target: Acknowledge within 15 minutes. Dedicated incident response team assembled.
Who gets notified: On-call engineer, team lead, engineering manager, and security team (for security-related incidents).
SEV-3 -- Moderate
Definition: Partial degradation of a non-critical service, or a potential security concern requiring investigation. Some users are affected, but workarounds exist.
Examples:
- Search functionality is returning incomplete results
- Suspicious login attempts from an unusual geography (not yet confirmed as a breach)
- A non-critical internal tool is down
- Intermittent errors affecting a subset of API requests
- SSL certificate expiring within 72 hours
Response target: Acknowledge within 1 hour. Assigned to an owner during business hours.
Who gets notified: On-call engineer via non-intrusive channel (Slack, email). Team lead informed.
SEV-4 -- Low
Definition: Minor issue with negligible user impact. No immediate business risk, but should be tracked and resolved to prevent escalation.
Examples:
- Cosmetic UI bug on a low-traffic page
- Log volume spike from a misconfigured service (no user impact)
- Minor configuration drift detected in a non-production environment
- A deprecated API endpoint is still receiving occasional traffic
Response target: Acknowledge within 1 business day. Prioritized in the next sprint or maintenance window.
Who gets notified: Ticket created and assigned to the relevant team. No paging.
SEV-5 -- Informational
Definition: No current impact. An observation, improvement opportunity, or risk that should be documented for future planning.
Examples:
- Dependency nearing end-of-life but still supported for 6 months
- Performance benchmark shows gradual degradation trend over 90 days
- Security scan flagged a low-risk finding with no exploit path
- Capacity planning threshold will be reached in the next quarter
Response target: Triaged within 1 week. Added to backlog.
Who gets notified: Logged in the ticketing system. Reviewed during regular planning.
Severity Classification Table
| Level | Label | User Impact | Business Impact | Response Target | Notification |
|---|---|---|---|---|---|
| SEV-1 | Critical | All/most users affected | Revenue loss, data breach, total outage | 5 minutes | Phone call + page all responders |
| SEV-2 | High | Large subset affected | Major degradation, confirmed vulnerability | 15 minutes | Phone call + page on-call |
| SEV-3 | Moderate | Some users, workarounds exist | Partial degradation, potential security concern | 1 hour | Slack/chat notification |
| SEV-4 | Low | Minimal or no user impact | No business risk | 1 business day | Ticket created |
| SEV-5 | Informational | None | Future risk or improvement | 1 week | Backlog item |
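If you want this matrix to drive tooling rather than live only in a wiki, it helps to encode it as data. Here is a minimal sketch in Python -- the level names and targets mirror the table above, but the field names are our own and should be adapted to your conventions:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class SeverityLevel:
    """One row of the severity classification table."""
    label: str
    ack_target: timedelta   # acknowledge time, not resolution time
    notification: str       # primary notification path

SEVERITIES = {
    "SEV-1": SeverityLevel("Critical", timedelta(minutes=5), "phone + page all responders"),
    "SEV-2": SeverityLevel("High", timedelta(minutes=15), "phone + page on-call"),
    "SEV-3": SeverityLevel("Moderate", timedelta(hours=1), "Slack/chat notification"),
    "SEV-4": SeverityLevel("Low", timedelta(days=1), "ticket created"),   # 1 business day
    "SEV-5": SeverityLevel("Informational", timedelta(weeks=1), "backlog item"),
}
```

Keeping one source of truth like this prevents routing code, dashboards, and MTTR-by-severity reports from drifting apart.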
How to Build an Escalation Policy by Severity
Defining severity levels is only half the equation. The other half is ensuring the right people are reached through the right channels at the right speed. An escalation policy maps each severity level to a specific notification workflow.
Escalation Tiers
Tier 1 -- Immediate responder. The on-call engineer who receives the initial alert and performs triage. For SEV-1 and SEV-2, this person should be reachable by phone call or SMS within minutes -- not just a Slack message that might go unread.
Tier 2 -- Backup and specialist. If Tier 1 does not acknowledge within a defined window (e.g., 5 minutes for SEV-1), the alert escalates to a backup on-call or a subject matter expert. For security incidents, this tier includes the security team lead.
Tier 3 -- Leadership. For SEV-1 incidents that are not resolved within 30 minutes, or any incident with customer data exposure, escalation reaches engineering leadership and executive stakeholders. This tier also triggers external communication workflows.
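In code, the time-based part of this policy reduces to a simple loop: page a tier, wait for acknowledgment, move on. Here is a minimal sketch, assuming your paging stack exposes a page call and an acknowledgment check (injected as callables, since the real ones depend on your tooling):

```python
import time
from typing import Callable, Sequence

def escalate(
    incident_id: str,
    chain: Sequence[str],                 # e.g. ["primary", "backup", "lead"]
    page: Callable[[str, str], None],     # page(responder, incident_id)
    is_acked: Callable[[str], bool],      # has anyone acknowledged yet?
    ack_window_secs: int = 300,           # 5 minutes per tier for a SEV-1
    poll_secs: int = 10,
) -> bool:
    """Page each tier in order; stop as soon as someone acknowledges.

    Returns False if the whole chain is exhausted, which is the signal
    to trigger Tier 3 leadership notification.
    """
    for responder in chain:
        page(responder, incident_id)
        deadline = time.monotonic() + ack_window_secs
        while time.monotonic() < deadline:
            if is_acked(incident_id):
                return True
            time.sleep(poll_secs)
    return False
```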
Mapping Severity to Channels
The notification channel should match the urgency:
- SEV-1: Automated phone call to primary on-call, followed by SMS. If no acknowledgment within 5 minutes, phone call to backup on-call. Simultaneously notify the incident channel in Slack/Teams.
- SEV-2: Automated phone call to on-call engineer. Slack/Teams notification to the incident channel. If no acknowledgment within 15 minutes, escalate to Tier 2.
- SEV-3: Slack or Teams notification to the relevant team channel. Email to on-call. No phone call unless the issue escalates.
- SEV-4 and SEV-5: Ticket creation. Slack notification to the team channel. No paging.
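Expressed as configuration, that routing table is just a mapping. A sketch with illustrative channel names:

```python
# Channel names are illustrative placeholders for your own integrations.
CHANNELS_BY_SEVERITY = {
    "SEV-1": ["phone:primary", "sms:backup", "slack:#incidents", "email:leadership"],
    "SEV-2": ["phone:primary", "slack:#incidents"],
    "SEV-3": ["slack:#team", "email:oncall"],
    "SEV-4": ["ticket", "slack:#team"],
    "SEV-5": ["ticket"],
}

def channels_for(severity: str) -> list[str]:
    # Unclassified alerts get a human look without a page; some teams
    # prefer to fail loud here instead.
    return CHANNELS_BY_SEVERITY.get(severity, ["slack:#team", "ticket"])
```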
Automating Escalation with Alert24
Manually managing these escalation workflows -- maintaining on-call schedules, configuring multi-channel routing, and handling time-based escalation -- is error-prone and does not scale. This is where a purpose-built escalation platform becomes essential.
Alert24 lets you define escalation tiers that automatically route alerts by severity. When a monitoring tool fires a SEV-1 alert, Alert24 can simultaneously place a phone call to the primary on-call engineer, send an SMS to the backup, push a notification to the team's Slack or Microsoft Teams channel, and send an email to leadership -- all without manual intervention.
Key capabilities that map directly to the escalation tiers above:
- Multi-channel alerting: Phone calls, SMS, push notifications, Slack, Microsoft Teams, and email. You define which channels fire for which severity level.
- On-call rotation management: Automatically route alerts to whoever is currently on-call, with configurable rotation schedules and override rules.
- Time-based escalation: If the primary responder does not acknowledge a SEV-1 within your defined window, Alert24 automatically escalates to the next tier.
- Integration with monitoring tools: Connect your existing alerting infrastructure (Datadog, PagerDuty, Grafana, custom webhooks) so severity-based routing happens without changing your monitoring stack.
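If you are normalizing raw monitoring webhooks before they reach your escalation platform, the classification step is usually a small pure function. A sketch -- the payload fields here are illustrative, since every tool (Datadog, Grafana, custom webhooks) ships a different schema:

```python
def classify_alert(payload: dict) -> str:
    """Map a monitoring webhook payload to a severity level.

    Field names ('alert_type', 'priority', 'tags') are placeholders --
    adapt the lookups to whatever your monitoring tool actually sends.
    """
    tags = set(payload.get("tags", []))
    if payload.get("alert_type") == "outage" or "security-breach" in tags:
        return "SEV-1"
    if payload.get("priority") == "high" or "customer-facing" in tags:
        return "SEV-2"
    if payload.get("priority") == "normal":
        return "SEV-3"
    return "SEV-4"
```

Keeping this mapping in one place means a monitoring tool swap changes one function, not your whole escalation policy.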
The goal is to ensure that a SEV-1 at 3 AM reaches the right person within minutes, while a SEV-4 quietly creates a ticket for Monday morning review.
Severity Levels and Status Page Communication
Your internal severity classification should drive what gets communicated externally on your public status page. Not every incident warrants a status page update, but your customers should never learn about a major outage from Twitter before they see it on your status page.
When to Update Your Status Page
| Severity | Status Page Action |
|---|---|
| SEV-1 | Immediate update. Post within 10 minutes of confirmation. Regular updates every 30 minutes until resolved. |
| SEV-2 | Update within 30 minutes if customer-facing services are affected. |
| SEV-3 | Update only if users have reported the issue or it affects a visible service. |
| SEV-4/5 | No status page update. |
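The table translates directly into a routing decision. A minimal sketch:

```python
def status_page_action(severity: str, customer_facing: bool,
                       user_reported: bool) -> str:
    """Decide the status page action from the table above."""
    if severity == "SEV-1":
        return "post within 10 minutes; update every 30 minutes"
    if severity == "SEV-2" and customer_facing:
        return "post within 30 minutes"
    if severity == "SEV-3" and (user_reported or customer_facing):
        return "post an update"
    return "no status page update"
```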
What to Communicate
For SEV-1 and SEV-2 incidents, your status page updates should include:
- What is affected. Be specific: "Payment processing for US customers" is better than "Some services are experiencing issues."
- Current status. Investigating, identified, monitoring, or resolved.
- Expected next update time. Even if you have no new information, commit to a cadence. Silence during an outage erodes trust faster than bad news.
- Workarounds. If users can take an alternate path, tell them.
Updating a status page by hand during a high-stress incident is an easy step to forget. Alert24 can automatically post and update your public status page when an incident is created or its severity changes, ensuring customers stay informed without adding another task to your incident commander's plate.
Post-Incident Communication
After resolution, post a brief summary on your status page: what happened, what the impact was, how long it lasted, and what you are doing to prevent recurrence. For SEV-1 incidents, a full post-mortem should follow within 48 hours.
Common Mistakes in Severity Classification
Even organizations with well-documented severity frameworks make these mistakes. Awareness is the first step toward avoiding them.
1. Too Many SEV-1s
When the bar for SEV-1 is too low -- or when there is social pressure to escalate everything -- the category loses its meaning. If your team averages more than one or two SEV-1 incidents per month (outside of genuinely unstable periods), your classification criteria are probably too broad.
Fix: Review all SEV-1 incidents quarterly. If more than 30% were downgraded during or after the incident, tighten the criteria.
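This review is easy to automate if your tracker records both the opening and final severity. A sketch with illustrative field names:

```python
def sev1_downgrade_rate(incidents: list[dict]) -> float:
    """Fraction of incidents opened as SEV-1 that closed at a lower level.

    Expects records with 'opened_as' and 'final' severity fields
    (illustrative names -- adapt to your tracker's export format).
    """
    opened = [i for i in incidents if i["opened_as"] == "SEV-1"]
    if not opened:
        return 0.0
    downgraded = sum(i["final"] != "SEV-1" for i in opened)
    return downgraded / len(opened)

# A rate above 0.30 suggests the SEV-1 bar is set too low.
```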
2. Severity Inflation
Related to the above, but driven by a different cause: teams inflate severity to get faster responses from other teams. If the only way to get the database team to look at a query performance issue is to file a SEV-2, your process has a prioritization problem, not a severity problem.
Fix: Separate severity (impact) from priority (urgency of response). A SEV-3 issue can still be high priority if it is trending toward SEV-1.
3. No Clear Ownership at Each Level
Defining severity levels without defining who owns the response at each level creates a bystander effect. Everyone assumes someone else is handling it.
Fix: Every severity level should have a named role (not just a team) responsible for initial triage and communication. The on-call schedule should make ownership unambiguous at all times.
4. Static Severity That Never Changes
An incident that starts as SEV-3 can become SEV-1 as its scope expands. Teams that treat severity as a one-time classification miss the escalation window.
Fix: Build re-assessment into your incident process. At every 30-minute mark (for SEV-2 and above), the incident commander should explicitly confirm or adjust the severity level.
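A lightweight way to enforce the cadence is to precompute the checkpoints when the incident opens and have a bot prompt the incident commander at each one. A minimal sketch:

```python
from datetime import datetime, timedelta

def reassessment_checkpoints(opened_at: datetime, max_hours: int = 6):
    """Yield the 30-minute severity re-assessment marks (SEV-2 and above).

    At each mark, prompt the incident commander to confirm or adjust
    the severity level.
    """
    for i in range(1, max_hours * 2 + 1):
        yield opened_at + timedelta(minutes=30 * i)
```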
5. Ignoring the Human Factor
On-call engineers who get woken up for SEV-3 issues that could have waited until morning will eventually start ignoring alerts. Severity levels exist to protect your team's attention and well-being as much as your systems.
Fix: Audit your after-hours pages monthly. If more than 20% were SEV-3 or below, your routing rules need adjustment -- and a platform like Alert24 can enforce time-of-day routing so low-severity issues only page during business hours.
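The monthly audit itself is a few lines if your paging platform can export page records. A sketch, assuming business hours of 09:00-18:00 Monday through Friday and records with 'sent_at' (a datetime) and 'severity' fields (both illustrative):

```python
def after_hours_low_sev_rate(pages: list[dict]) -> float:
    """Fraction of after-hours pages that were SEV-3 or below."""
    LOW_SEV = {"SEV-3", "SEV-4", "SEV-5"}
    after_hours = [
        p for p in pages
        if p["sent_at"].weekday() >= 5           # weekend
        or not (9 <= p["sent_at"].hour < 18)     # outside 09:00-18:00
    ]
    if not after_hours:
        return 0.0
    low = sum(p["severity"] in LOW_SEV for p in after_hours)
    return low / len(after_hours)

# Above 0.20, tighten your time-of-day routing rules.
```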
Putting It All Together: A Severity Framework Checklist
Before you consider your incident severity framework complete, verify that you have addressed each of these items:
- Documented definitions for each severity level with concrete examples, not just abstract criteria.
- Response time targets that are realistic and measurable (acknowledge time, not resolution time).
- Escalation policies that map severity to specific notification channels, roles, and time-based escalation rules.
- On-call schedules that ensure someone is always accountable for SEV-1 and SEV-2 incidents.
- Status page guidelines that define when and what to communicate externally based on severity.
- Regular calibration through quarterly reviews of severity accuracy and classification consistency.
- Automation to eliminate manual routing, ensure escalation rules fire reliably, and keep status pages updated.
A severity classification system is not a document you write once and forget. It is a living framework that evolves as your systems, team, and customer expectations change.
Start With the Basics
If you do not have a severity framework today, do not try to build the perfect one. Start with three levels (Critical, Moderate, Informational) and expand as your team develops the operational maturity to manage more granularity. The most important step is agreeing on what "critical" means -- and ensuring that when a critical incident happens, the right people know about it within minutes.
For organizations building out their incident response capabilities, severity classification is the foundation. Without it, even the best-designed incident response plans break down at the point of execution. And for teams that have experienced a breach firsthand, a clear severity framework is often the first thing they implement afterward.
Define your levels. Automate your escalation. Communicate clearly. Everything else follows from there.