Scenario: You have two or more people (or teams) responsible for responding to a service, and you want to make sure that Person B is notified about an incident if Person A does not respond. How do you set this up in PagerDuty?
The key is to create appropriate on-call schedules and escalation policies.
In many cases, you will have one group of people on-call during business hours and a different group on-call outside business hours, or different groups of people on-call at various times of the day.
To notify different users at the same time during various times of the day (or days of the week), you will want to:
- Create on-call schedules that reflect each person's on-call rotations
- Add each of these schedules to the appropriate layer of an escalation policy
To demonstrate this, we'll use the following example:
- George, Emily, and Jennifer are on-call during business hours (0800-1700) on weekdays and should all be notified at the same time.
- Naomi, Liam, and Max are on-call outside business hours (1700-0800) on weekdays and should all be notified at the same time.
- On weekends, both George and Naomi are on-call and should be notified at the same time.
The first step is to create an on-call schedule for each user that reflects that user's on-call rotation and times.
For example, below are George's and Max's schedules. Notice that each schedule restricts on-call time to certain times of the week. Max is only on-call between 1700 and 0800 on weekdays:
George is only on-call between 0800 and 1700 on weekdays and all day on weekends:
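If you prefer scripting to clicking through the UI, these two schedules could be expressed as request bodies for PagerDuty's REST API v2 `POST /schedules` endpoint. The sketch below only builds the JSON payloads and sends nothing. The field names (`schedule_layers`, `restrictions`, `weekly_restriction`, `start_day_of_week`, `duration_seconds`) reflect our understanding of the API and should be checked against the API reference; the user IDs, time zone, and start dates are placeholders. We also assume "weekdays" means Monday through Friday, with Max covering 0000-0800 and 1700-2400 on each of those days.

```python
import json

# Placeholder user IDs (hypothetical); substitute real PagerDuty user IDs.
GEORGE_ID = "PGEORGE"
MAX_ID = "PMAX000"

HOUR = 3600
MONDAY, FRIDAY, SATURDAY = 1, 5, 6  # ISO 8601 day numbers (1 = Monday)


def weekly_restriction(day: int, start: str, hours: int) -> dict:
    """One on-call window that starts on `day` at `start` and lasts `hours`."""
    return {
        "type": "weekly_restriction",
        "start_day_of_week": day,
        "start_time_of_day": start,
        "duration_seconds": hours * HOUR,
    }


def single_user_schedule(name: str, user_id: str, restrictions: list) -> dict:
    """A schedule with one layer and one user, on call only inside `restrictions`."""
    return {
        "schedule": {
            "type": "schedule",
            "name": name,
            "time_zone": "America/New_York",  # assumption: use your team's zone
            "schedule_layers": [
                {
                    "start": "2024-01-01T00:00:00-05:00",
                    "rotation_virtual_start": "2024-01-01T00:00:00-05:00",
                    "rotation_turn_length_seconds": 7 * 24 * HOUR,
                    "users": [{"user": {"id": user_id, "type": "user_reference"}}],
                    "restrictions": restrictions,
                }
            ],
        }
    }


weekdays = range(MONDAY, FRIDAY + 1)

# George: 0800-1700 Monday to Friday, plus all of Saturday and Sunday.
george = single_user_schedule(
    "George - business hours",
    GEORGE_ID,
    [weekly_restriction(d, "08:00:00", 9) for d in weekdays]
    + [weekly_restriction(SATURDAY, "00:00:00", 48)],
)

# Max: before 0800 and after 1700 on each weekday, no weekend coverage.
max_schedule = single_user_schedule(
    "Max - after hours",
    MAX_ID,
    [weekly_restriction(d, "00:00:00", 8) for d in weekdays]
    + [weekly_restriction(d, "17:00:00", 7) for d in weekdays],
)

print(json.dumps(george, indent=2))
```

Emily and Jennifer would reuse George's weekday restrictions without the weekend window, and Naomi would combine Max's restrictions with the weekend one.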
Once we have created each user's on-call schedule, we can then add each schedule to an escalation policy.
In the screenshot below, we have added each user's schedule (6 in total) to the first layer of an escalation policy. This means that if an incident is triggered between 0800-1700, then George, Emily, and Jennifer will be assigned and notified at the same time, since they are the only users on-call within that escalation layer.
When an incident is triggered between 1700-0800, then Naomi, Liam, and Max will be assigned and notified at the same time, since they are the only users on-call within that escalation layer.
On the weekends, only George and Naomi will be assigned and notified at the same time, since they are the only ones on-call over the weekend in that escalation layer.
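To make the layer behavior concrete, here is a small local model (plain Python, not PagerDuty code) in which each user's schedule is a predicate on the time, and the escalation layer notifies everyone whose predicate is true. It assumes business hours run from 08:00 up to but not including 17:00, Monday through Friday, and treats Saturday and Sunday as whole weekend days.

```python
from datetime import datetime


def is_weekend(t: datetime) -> bool:
    return t.weekday() >= 5  # Python numbers Monday as 0 and Sunday as 6


def is_business_hours(t: datetime) -> bool:
    return not is_weekend(t) and 8 <= t.hour < 17


def is_after_hours(t: datetime) -> bool:
    return not is_weekend(t) and not is_business_hours(t)


# Each user's schedule answers: "is this user on call at time t?"
SCHEDULES = {
    "George": lambda t: is_business_hours(t) or is_weekend(t),
    "Emily": is_business_hours,
    "Jennifer": is_business_hours,
    "Naomi": lambda t: is_after_hours(t) or is_weekend(t),
    "Liam": is_after_hours,
    "Max": is_after_hours,
}


def notified_in_layer(schedules: dict, t: datetime) -> list:
    """Everyone on call in any schedule of the layer is notified at once."""
    return sorted(name for name, on_call in schedules.items() if on_call(t))


print(notified_in_layer(SCHEDULES, datetime(2024, 1, 3, 10, 0)))  # ['Emily', 'George', 'Jennifer']
```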
Notice that we have set up our escalation policy so that the same group of users is re-notified after 10 minutes if they have not acknowledged or resolved the incident. If the group still does not respond 10 minutes after that, the incident is escalated to their manager, Tony Wagner.
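The timing of that policy can be sketched as a list of rules, each with an escalation delay. This is a hypothetical local model: we assume the 10-minute re-notification is configured as a second rule pointing at the same six schedules, and `notification_timeline` and `acked_at` are names invented for illustration.

```python
from typing import List, Optional, Tuple

# (targets, escalation delay in minutes before the next rule fires).
# Assumption: the re-notification is a second rule with the same six schedules.
RULES: List[Tuple[str, int]] = [
    ("on-call group (6 schedules)", 10),
    ("on-call group (6 schedules)", 10),
    ("Tony Wagner", 10),  # last rule; its delay only matters if the policy repeats
]


def notification_timeline(
    rules: List[Tuple[str, int]], acked_at: Optional[int] = None
) -> List[Tuple[int, str]]:
    """Return (minute, targets) for each rule reached before acknowledgment.

    acked_at is minutes after the trigger (assumed > 0); None means nobody responds.
    """
    events: List[Tuple[int, str]] = []
    minute = 0
    for targets, delay in rules:
        if acked_at is not None and acked_at <= minute:
            break  # acknowledged before this rule was reached
        events.append((minute, targets))
        minute += delay
    return events


for minute, targets in notification_timeline(RULES):
    print(f"minute {minute:>2}: notify {targets}")
```

If someone acknowledges at minute 15, for example, only the first two rules fire and Tony is never paged.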
- Create on-call schedules
The first thing you will want to do is create on-call schedules that reflect your current on-call rotations.
In the example below, we have two on-call schedules: a Primary schedule and a Secondary schedule. Both schedules use the same rotation (weekly handoffs at 09:00) but with different people on-call. The person on the Primary schedule is our first responder, and the person on the Secondary schedule is our backup in case the first responder does not take action on an incident.
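A single-layer weekly rotation resolves to exactly one user at a time: count the completed rotation turns since the rotation start and index into the roster. The sketch below models that for the Primary and Secondary schedules. The rosters (Alice, Bob, and so on) and the start date are hypothetical, and the model ignores time zones and overrides.

```python
from datetime import datetime, timedelta
from typing import List

WEEK = timedelta(weeks=1)


def on_call(users: List[str], rotation_start: datetime, at: datetime) -> str:
    """User on call in a single-layer rotation that hands off every week."""
    turns = (at - rotation_start) // WEEK  # number of completed handoffs
    return users[turns % len(users)]


# Hypothetical rosters; both rotations hand off every Monday at 09:00.
START = datetime(2024, 1, 1, 9, 0)  # Monday 1 January 2024, 09:00
PRIMARY = ["Alice", "Bob", "Carol"]
SECONDARY = ["Dave", "Erin", "Frank"]

when = datetime(2024, 1, 10, 14, 30)  # a Wednesday afternoon in week two
print(on_call(PRIMARY, START, when), on_call(SECONDARY, START, when))  # Bob Erin
```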
Why create two separate on-call schedules? If I have two people on-call at the same time, wouldn't it make more sense to create one schedule with 2 schedule layers?
No. Schedules are intended to define when a particular user is on call. Only one person can be on-call at any given time within a single on-call schedule.
If you have multiple schedule layers, the bottom layer always takes precedence over the layers above it, so only one person is on-call at any given time in the schedule. Here is an example demonstrating how this schedule layering looks:
This means that if you want more than one person on-call at the same time, you will need to create separate on-call schedules to reflect those on-call times.
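The layering rule can be modeled in a few lines: within one schedule, walk the layers from the bottom up and return the first one that covers the time, so a schedule always yields at most one person. Covering two people at once therefore takes two schedules. The layer predicates and names below are hypothetical, and this is a simplified local model rather than PagerDuty's actual resolver.

```python
from datetime import datetime
from typing import Callable, List, Optional, Tuple

# A layer is (user, covers), where covers(t) says whether the layer is active at t.
Layer = Tuple[str, Callable[[datetime], bool]]


def resolve(layers: List[Layer], t: datetime) -> Optional[str]:
    """Resolve one schedule: the bottom-most (last-listed) covering layer wins."""
    for user, covers in reversed(layers):
        if covers(t):
            return user
    return None


def always(t: datetime) -> bool:
    return True


def daytime(t: datetime) -> bool:
    return 8 <= t.hour < 17


# One schedule, two layers: Bob's daytime layer hides Alice instead of joining her.
one_schedule = [("Alice", always), ("Bob", daytime)]

# Two separate schedules: each resolves to its own person, so both are on call.
two_schedules = [[("Alice", always)], [("Bob", daytime)]]

noon = datetime(2024, 1, 3, 12, 0)
print(resolve(one_schedule, noon))                      # Bob
print(sorted(resolve(s, noon) for s in two_schedules))  # ['Alice', 'Bob']
```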
- Create the escalation policy
The escalation policy is where you will establish:
- to whom an incident is assigned immediately after it is triggered.
- how much time you want to give your first responder to take action on the incident.
- when and to whom you want the incident escalated if it is not acted upon.
In the screenshot below, the Primary schedule is the first escalation rule and the Secondary schedule is the second. This escalation policy behaves as follows:
- When an incident is triggered, it will be immediately assigned to whomever is on-call in the Primary schedule.
- The user on the Primary schedule has 30 minutes to take action on the incident (acknowledge, resolve, or reassign it).
- If the user on the Primary schedule does not take action on the incident within 30 minutes, the incident is escalated and assigned to the user on-call in the Secondary schedule.
- If the person on the Secondary schedule does not take action on the incident within 30 minutes, the incident is escalated back and reassigned to the person on the Primary schedule. This is accomplished by ticking the box labeled "If no one acknowledges, repeat this policy" and setting the number of policy repeats.
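For an incident that nobody acts on, this policy produces a predictable sequence of assignments. The sketch below models it locally, assuming the repeat setting corresponds to a `num_loops` count of how many times the whole policy runs again after the first pass (our understanding of the setting; check the escalation policy documentation). `escalation_timeline` is a hypothetical helper.

```python
from typing import List, Tuple

# (target, escalation delay in minutes) for the Primary/Secondary policy above.
RULES: List[Tuple[str, int]] = [("Primary", 30), ("Secondary", 30)]


def escalation_timeline(rules: List[Tuple[str, int]], num_loops: int) -> List[Tuple[int, str]]:
    """Minutes after the trigger at which each rule assigns an untouched incident.

    Assumption: num_loops is the number of extra passes after the first one.
    """
    events: List[Tuple[int, str]] = []
    minute = 0
    for _ in range(num_loops + 1):
        for target, delay in rules:
            events.append((minute, target))
            minute += delay
    return events


for minute, target in escalation_timeline(RULES, num_loops=1):
    print(f"minute {minute:>2}: assigned to {target}")
```

With one repeat, the incident goes to Primary at minute 0, Secondary at 30, Primary again at 60, and Secondary again at 90.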
Alternatively, if you would like BOTH your Primary and Secondary on-call staff to be notified immediately after an incident is triggered, you can use multi-user alerting by adding both schedules to the same escalation rule.
You must have a minimum of 5 minutes between escalation rules if you have more than one person/schedule in an escalation rule.
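Both variants can also be expressed as escalation policy payloads, and the 5-minute requirement can be checked locally before submitting one. The payload shape (`escalation_rules`, `escalation_delay_in_minutes`, `targets` with `schedule_reference`, `num_loops`) follows our understanding of PagerDuty's REST API v2 and should be verified against the API reference. The schedule IDs are placeholders, and `check_min_delay` is a hypothetical helper, not part of any PagerDuty library; we read the requirement as applying to the escalation delay of any rule with more than one target. PagerDuty enforces its own rules, so this is only an early warning.

```python
import json
from typing import List, Tuple

# Placeholder schedule IDs (hypothetical); substitute your real schedule IDs.
PRIMARY_ID = "PPRIMAR"
SECONDARY_ID = "PSECOND"


def escalation_policy(name: str, rules: List[Tuple[List[str], int]], num_loops: int = 0) -> dict:
    """Build a request body; each rule is (schedule IDs, escalation delay in minutes)."""
    return {
        "escalation_policy": {
            "type": "escalation_policy",
            "name": name,
            "num_loops": num_loops,
            "escalation_rules": [
                {
                    "escalation_delay_in_minutes": delay,
                    "targets": [{"id": sid, "type": "schedule_reference"} for sid in ids],
                }
                for ids, delay in rules
            ],
        }
    }


def check_min_delay(policy: dict, minimum: int = 5) -> List[str]:
    """Hypothetical pre-check: rules with several targets need delay >= minimum."""
    problems = []
    rules = policy["escalation_policy"]["escalation_rules"]
    for number, rule in enumerate(rules, start=1):
        delay = rule["escalation_delay_in_minutes"]
        if len(rule["targets"]) > 1 and delay < minimum:
            problems.append(f"rule {number}: {delay} min < {minimum} min")
    return problems


# Sequential: Primary first, Secondary after 30 minutes, policy repeated once.
sequential = escalation_policy(
    "Primary then Secondary", [([PRIMARY_ID], 30), ([SECONDARY_ID], 30)], num_loops=1
)
# Multi-user alerting: both schedules in the same rule.
together = escalation_policy("Primary and Secondary together", [([PRIMARY_ID, SECONDARY_ID], 30)])
too_fast = escalation_policy("Too fast", [([PRIMARY_ID, SECONDARY_ID], 2)])

print(json.dumps(together, indent=2))
print(check_min_delay(too_fast))  # ['rule 1: 2 min < 5 min']
```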