Scenario: You have two or more people (or teams) who are responsible for responding to incidents on a service, and you want to make sure that Person B is notified about an incident if Person A does not respond. How do you set this up in PagerDuty?
The key to setting this up in PagerDuty is creating appropriate on-call schedules and escalation policies.
In many cases, you will have a group of people on-call during business hours and a different group of people on-call off business hours, or different groups of people on-call at various times of the day.
To notify different users at various times of the day (or days of the week), you will want to:
- Create on-call schedules that reflect each person's on-call rotations
- Add each of these schedules to the appropriate rule of an escalation policy
Create on-call schedules
The first thing you will want to do is create on-call schedules that reflect your current on-call rotations.
In the example below, we have 2 on-call schedules: a Primary schedule and a Secondary schedule. Both schedules have the same rotation (i.e. weekly at 09:00) but with different people on-call. The person from the Primary schedule is our first responder and the person from the Secondary schedule is our back-up in case the first responder does not take action on an incident.
Why create two separate on-call schedules? If I have two people on-call at the same time, wouldn't it make more sense to create one schedule with 2 schedule layers?
No. Schedules are intended to define when a particular user is on call. Only one person can be on-call at any given time within a single on-call schedule.
If you have multiple schedule layers, the bottom layer always takes precedence over the layer(s) above it, so that only one person is on-call at any given time in the on-call schedule. Here is an example demonstrating how this schedule layering looks:
This means that if you want more than one person on-call at the same time, you will need to create separate on-call schedules to reflect these on-call times.
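The layering rule can be illustrated with a small sketch. This is not PagerDuty code; the `Layer` class, the `final_on_call` function, and the user names are hypothetical, and layers are simplified to fixed daily hour ranges. It only shows why a single schedule always resolves to one person: where layers overlap, the lowest layer wins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """Hypothetical schedule layer: one user covering [start_hour, end_hour) each day."""
    user: str
    start_hour: int
    end_hour: int

    def on_call(self, hour: int) -> Optional[str]:
        # Return the user if this layer covers the given hour, else None.
        return self.user if self.start_hour <= hour < self.end_hour else None


def final_on_call(layers: List[Layer], hour: int) -> Optional[str]:
    """Resolve the single on-call user for a schedule.

    Layers are listed top to bottom; a lower layer overrides any layer above it.
    """
    winner = None
    for layer in layers:
        user = layer.on_call(hour)
        if user is not None:
            winner = user  # later (lower) layers overwrite earlier ones
    return winner


layers = [
    Layer("alice", 0, 24),  # top layer: covers the whole day
    Layer("bob", 9, 17),    # bottom layer: covers business hours
]
print(final_on_call(layers, 10))  # bob
print(final_on_call(layers, 20))  # alice
```

Because the function can only ever return one user, having Alice and Bob on-call together at 10:00 would require two separate schedules.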
Create the escalation policy
The escalation policy is where you will establish:
- to whom an incident is assigned immediately after it is triggered.
- how much time you want to give your first responder to take action on the incident.
- when and to whom you want the incident escalated if it is not acted upon.
In the screenshot below, the Primary schedule is the first escalation rule and the Secondary schedule is the second escalation rule. Expected behavior for this escalation policy is as follows:
- When an incident is triggered, it will be immediately assigned to whoever is on-call in the Primary schedule.
- The Primary schedule user has 30 minutes to take action on the incident (i.e. acknowledge, resolve, or reassign it).
- If the user on the Primary schedule does not take action on the incident within 30 minutes, the incident will be escalated and assigned to the user on-call on the Secondary schedule.
- If the person on the Secondary schedule does not take action on the incident in 30 minutes, then the incident is reassigned and escalated to the person on the Primary schedule. This is accomplished by ticking the box that says "If no one acknowledges, repeat this policy" and setting the number of policy repeats.
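The timeline described above can be sketched as a small simulation. This is an illustrative model only, not PagerDuty's implementation: the function name `assigned_rule` and its parameters are hypothetical, every rule is assumed to share the same 30-minute timeout, and the sketch returns `None` once all passes are used up rather than modeling what PagerDuty does after that point.

```python
from typing import List, Optional


def assigned_rule(minutes_unacknowledged: int,
                  rules: List[str],
                  delay_minutes: int = 30,
                  repeats: int = 1) -> Optional[str]:
    """Return which escalation rule holds the incident after the given
    number of minutes with no action taken.

    rules   -- escalation rules in order, e.g. ["Primary", "Secondary"]
    repeats -- how many extra times the whole policy is repeated
    Returns None once every pass through the policy has timed out
    (behavior past that point is not modeled here).
    """
    step = minutes_unacknowledged // delay_minutes   # timeouts elapsed so far
    total_steps = len(rules) * (1 + repeats)         # first pass plus repeats
    if step >= total_steps:
        return None
    return rules[step % len(rules)]


rules = ["Primary", "Secondary"]
for minute in (0, 29, 30, 60, 90):
    print(minute, assigned_rule(minute, rules))
```

With one repeat, the incident moves Primary, Secondary, Primary, Secondary at minutes 0, 30, 60, and 90.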
Alternatively, if you would like both your Primary and Secondary on-call staff to be notified immediately after an incident is triggered, you can use multi-user alerting by adding both schedules to the same escalation rule.
If an escalation rule contains more than one person or schedule, you must allow a minimum of 5 minutes before the incident escalates to the next rule.
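For readers who manage policies through scripts, here is a hedged sketch of how a multi-user escalation rule might be expressed as a request body. The field names (`escalation_rules`, `escalation_delay_in_minutes`, `targets`, `schedule_reference`, `num_loops`) reflect our understanding of the PagerDuty REST API's escalation policy object and should be checked against the current API reference. The helper `build_policy`, the policy name, and the schedule IDs are hypothetical placeholders, and nothing is sent over the network.

```python
import json
from typing import Dict, List, Tuple

MIN_MULTI_TARGET_DELAY = 5  # minutes; the minimum stated above for multi-target rules


def build_policy(name: str,
                 rules: List[Tuple[int, List[str]]],
                 num_loops: int = 0) -> Dict:
    """Build an escalation policy body as a plain dict.

    rules -- list of (delay_minutes, [schedule_id, ...]) in escalation order
    Raises ValueError if a rule with several targets has a delay under 5 minutes.
    """
    escalation_rules = []
    for delay, schedule_ids in rules:
        if len(schedule_ids) > 1 and delay < MIN_MULTI_TARGET_DELAY:
            raise ValueError(
                f"rule with {len(schedule_ids)} targets needs a delay of at "
                f"least {MIN_MULTI_TARGET_DELAY} minutes, got {delay}"
            )
        escalation_rules.append({
            "escalation_delay_in_minutes": delay,
            "targets": [
                {"id": sid, "type": "schedule_reference"} for sid in schedule_ids
            ],
        })
    return {
        "escalation_policy": {
            "type": "escalation_policy",
            "name": name,
            "escalation_rules": escalation_rules,
            "num_loops": num_loops,  # how many times the policy repeats
        }
    }


# Placeholder schedule IDs; both schedules are notified by the same rule.
body = build_policy("Notify Primary and Secondary together",
                    [(30, ["PRIMARY_SCHEDULE_ID", "SECONDARY_SCHEDULE_ID"])],
                    num_loops=1)
print(json.dumps(body, indent=2))
```

Validating the delay locally is a design convenience: it surfaces the 5-minute constraint before any request is attempted.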