Incidents

An incident can be thought of as a problem or an issue that needs to be addressed and resolved. Incidents trigger on a service, which prompts notifications to go out to on-call responders per the service's escalation policy.

Incident States

  • Triggered - Upon receiving an event, an active service (with someone on-call at that moment, and not disabled or in maintenance mode) will create a new triggered incident. This incident will then escalate as defined by the service's escalation policy. By default, notifications are sent when an incident is triggered, but not when an incident is acknowledged or resolved (to be notified of those state changes, individual users can configure status change notifications, or you can use webhooks).
  • Acknowledged (abbreviated as Ack'd) - When an incident is being worked on but has not yet been resolved. The acknowledging user claims ownership of the issue, halting the escalation process. Once an incident is acknowledged, notifications will not be sent to the on-call user until the Incident Ack Timeout is reached (with the exception of acknowledged incidents open for 24 hours or longer; these will generate a daily "reminder" notification). Once the Incident Ack Timeout is reached, the incident will go from the acknowledged state back to the triggered state and notifications will be sent again; the escalation process is "un-paused" and continues as defined beyond that point.
  • Resolved - When you have fixed the issue and want the incident to be closed out. Once an incident is resolved, no additional notifications will be sent and the incident cannot be triggered again. If "triggered" marks the beginning of an incident, "resolved" marks the end.
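
The three states and the moves between them can be sketched as a small state machine. This is only an illustration of the list above, not PagerDuty's implementation: the names State, TRANSITIONS, and next_state, and the action strings, are hypothetical, and only the transitions named in the text are modeled.

```python
from enum import Enum


class State(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Hypothetical action names; only transitions described in the text are listed.
TRANSITIONS = {
    (State.TRIGGERED, "acknowledge"): State.ACKNOWLEDGED,
    (State.TRIGGERED, "resolve"): State.RESOLVED,
    # The Incident Ack Timeout returns the incident to the triggered state.
    (State.ACKNOWLEDGED, "ack_timeout"): State.TRIGGERED,
    (State.ACKNOWLEDGED, "resolve"): State.RESOLVED,
}


def next_state(current: State, action: str) -> State:
    """Return the state after `action`, or raise if the move is not modeled."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(
            f"cannot apply {action!r} to an incident that is {current.value}"
        ) from None
```

Because no entry starts from State.RESOLVED, any action on a resolved incident raises ValueError, which mirrors the rule that a resolved incident cannot be triggered again.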

Incident Lifecycle

1. Received through Services

PagerDuty receives events from monitoring systems through integrations. An integrated monitoring tool's event creates an alert and a correlating incident in PagerDuty. There are several integration types that can accept different inputs, are capable of some de-duplication and, in the case of a new incident, will contact an escalation policy.

Not all incoming events need to result in a notification or be assigned to a user. With alerts, incidents, and suppression, data can be sent to your PagerDuty account and collected without being assigned to users or notifying anyone.

2. Assignment via Escalation Policies and Schedules

Unlike an alert or a suppressed event, an incident must be assigned to a user. Assignment is done via an escalation policy associated with the integration. Escalation policies are levels of schedules and/or users that an incident will escalate through if it isn't acknowledged or resolved quickly. Schedules are customizable calendars of who is on-call, and when. An incident will escalate through the layers of an escalation policy until it finds someone who is on-call. This user will be notified and the incident will be assigned to them. If the user fails to acknowledge the incident before the time limit set on the escalation policy, the incident will continue to escalate.
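
The walk through an escalation policy can be sketched as a search over ordered levels. This is a simplified model under stated assumptions: each level is a plain list of user names, "on-call" is a set lookup, and the function name first_on_call and its start parameter are hypothetical. Real escalation timing and schedule resolution are not modeled.

```python
from typing import Optional, Sequence, Set, Tuple


def first_on_call(
    levels: Sequence[Sequence[str]],
    on_call: Set[str],
    start: int = 0,
) -> Optional[Tuple[int, str]]:
    """Return (level_index, user) for the first on-call user at or after `start`.

    Returns None when nobody on the remaining levels is on call.
    """
    for index in range(start, len(levels)):
        for user in levels[index]:
            if user in on_call:
                return index, user
    return None
```

For example, with levels [["alice"], ["bob", "carol"], ["dave"]] and only carol and dave on call, the first call returns level 1 and carol. If carol does not acknowledge within the time limit, calling the function again with start=2 models the incident escalating to dave.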

3. Notifications via Phone, SMS, Email, or Push Notification

Each user configures how they would like to be notified in their user profile. PagerDuty will contact the user by the indicated notification rules until the incident is acknowledged, resolved, or escalated, either manually or due to escalation timeout.

4. Acknowledging and Resolving

Notifications let the responder acknowledge (ack) an incident, signaling that it is being addressed, or resolve it. Depending on their user role permissions, users who are not currently assigned to an incident may acknowledge or resolve an incident via the incidents dashboard in the web application.

If a service is using alerts triage, alerts cannot be acknowledged, only triggered or resolved. If all alerts in an incident are resolved, the incident will be resolved. Conversely, when the incident is resolved, all alerts under that incident are also resolved.
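
The two-way relationship between alerts and their incident can be sketched with a pair of small classes. The class and method names here (Alert, Incident, resolve_alert, resolve) are hypothetical and exist only to illustrate the rule in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    summary: str
    resolved: bool = False  # alerts are only ever triggered or resolved


@dataclass
class Incident:
    alerts: List[Alert] = field(default_factory=list)
    resolved: bool = False

    def resolve_alert(self, index: int) -> None:
        """Resolve one alert; the incident resolves once every alert is resolved."""
        self.alerts[index].resolved = True
        if all(alert.resolved for alert in self.alerts):
            self.resolved = True

    def resolve(self) -> None:
        """Resolve the incident, which also resolves every alert under it."""
        self.resolved = True
        for alert in self.alerts:
            alert.resolved = True
```

Resolving the alerts one at a time leaves the incident open until the last one is resolved, while calling resolve on the incident closes all of its alerts in one step.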

Resolving an incident closes the incident, whereas acknowledging only halts the escalation process. If the incident is not resolved by the service's acknowledgement timeout period, it re-enters the escalation chain.

Services can also be configured to automatically resolve incidents through the Auto-resolution option.

Incident Redaction

This action is only available to Account Owners. Redaction cannot be undone, not even by PagerDuty Support.

In the event that an incident contains sensitive information, the Account Owner can hide the incident's details by using the Redact Incident button.

After you confirm that you would like to redact an incident’s name and details, the incident is permanently updated to show who redacted the data and when.

Triggering an Incident with Web UI, Email, or API

There are multiple ways to trigger an incident within PagerDuty. From the PagerDuty web UI, you can trigger an incident from inside a service of any type. Alternatively, because PagerDuty has two basic types of services, email and API, you can trigger an incident according to the rules of each type.

Please note that for an incident to be triggered, there must be a user on-call for the service's escalation policy. If there is no one on call for an escalation policy, the incident will not be triggered.

Manually Opening an Incident

Manually opening an incident in a PagerDuty service triggers an incident and assigns it to the on-call person, who is then notified based on their configured notification rules. Typically, you would use this function if you were testing your notification rules, or if you wanted to contact the on-call person to let them know that there is an issue with a particular service.

There are a few places in PagerDuty where you can trigger an incident:

  1. On the Incidents page, click Create new Incident.
  2. On the Services page you can trigger an incident by clicking New Incident.

When the Create Incident dialog is shown, you can optionally choose a specific escalation policy or individual user. This overrides who the incident will be assigned to. Instead of notifying the escalation policy that is set for the PagerDuty service, the selected escalation policy or individual user will be notified.

When you create a manual incident and assign it to yourself, PagerDuty does not notify you. The incident is also automatically moved to the Acknowledged state since it is implied that you are aware of the incident and working to resolve it.

Send an Event to an Email Integration

If you have a Generic Email integration set up in your account, you can trigger an incident in PagerDuty by sending an email to the specified integration email.

The email address is displayed in the Integration key field under the Integrations tab on the service's page. To view this, navigate to Configuration and click on Services, then click on the service you'd like to view and the Integrations tab within that service.

When you send an email to the integration email address, an incident will be triggered in your service. You can view the trigger in the Incidents tab.
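
As a rough sketch, the triggering email could be composed and sent with Python's standard library. Everything specific here is an assumption: the function names, the addresses you pass in, and the SMTP host are placeholders, and how the subject and body map onto the incident depends on how your email integration is configured.

```python
import smtplib
from email.message import EmailMessage


def build_trigger_email(
    integration_address: str, sender: str, subject: str, body: str
) -> EmailMessage:
    """Compose a plain-text email addressed to the integration email address."""
    message = EmailMessage()
    message["To"] = integration_address
    message["From"] = sender
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_trigger_email(message: EmailMessage, smtp_host: str) -> None:
    """Send the message through an SMTP relay (smtp_host is a placeholder)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(message)
```

Keeping the composing step separate from the sending step makes it possible to inspect the message before anything leaves your machine.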

Send an Event through the API

By sending a POST request to a generic events API integration with your integration key and the necessary parameters, you can trigger an incident on a service containing an API integration.

If you have configured an API integration, you can see the relevant API documentation for detailed instructions on creating an incident via the Trigger Events endpoint.

There are also code samples in Ruby, Python, and PHP available, which include examples to trigger incidents after substituting your details.
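
A minimal sketch of such a POST request follows, assuming the integration uses the Events API v2 format. The text above does not say which API version your integration expects, so treat the endpoint URL and field names as assumptions to confirm against the API documentation. The function names and the default severity are illustrative.

```python
import json
import urllib.request

# Assumption: Events API v2 endpoint; confirm against the API documentation.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(
    integration_key: str, summary: str, source: str, severity: str = "critical"
) -> dict:
    """Build the JSON body for a trigger event (v2 field names assumed)."""
    return {
        "routing_key": integration_key,  # the integration key from the service
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def build_request(event: dict) -> urllib.request.Request:
    """Wrap the event in a POST request without sending it."""
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send the event, pass the result of build_request to urllib.request.urlopen after substituting your own integration key. The request is built separately so the payload can be checked first.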

Where is incident number ___?

As PagerDuty and our customers grow, we've had to make some adjustments to keep things running smoothly. Historically, we made sure that incidents in your account started numbering at incident #1, from day 1, and never skipped a number…ever.

Unfortunately, guaranteeing chronological incident numbers means that sometimes, we can't create new incidents quickly enough. To address this, we've made a small change:

Going forward, you might notice some "missing" incident numbers in your incidents list. We don't delete your incident numbers, so if you do see a skipped number, you'll know that the number was skipped at the time of incident creation.

This shouldn't happen often, and it doesn't indicate a problem; in fact, it means our system is doing its job!
