A coordinated response is one made by a cross-functional team of incident responders whose primary goal is to efficiently restore service for those impacted. Preparing for an effective coordinated response is essential for efficient operations management, whether your organization is handling a major incident that requires multiple responders, or navigating a complex, lower-impact situation that needs input from multiple parties.
Many of the PagerDuty capabilities referenced in this article are only available to customers on Business, Digital Operations, and Team (legacy) plans. Please contact our Sales Team if you would like to upgrade to a plan with these capabilities.
A major incident is any high-priority incident that requires a coordinated response, often across multiple teams. Major incidents are typically highly visible to customers, so resolving them is of the greatest importance. Many organizations refer to them as P1 or P2, or as SEV-1 or SEV-2. In PagerDuty Intelligent Dashboards, an incident counts as major if it is assigned one of the top two levels of your priority settings, or if multiple responders are added and acknowledge it.
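The dashboard rule described above can be sketched as a small predicate. This is only an illustration, not PagerDuty's implementation: the `Incident` class, its field names, and the reading of "multiple" as "two or more acknowledging responders" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    # Hypothetical model: 1 is the highest configured priority level,
    # None means no priority was assigned.
    priority_rank: Optional[int] = None
    # Hypothetical count of added responders who have acknowledged.
    acknowledged_responders: int = 0


def is_major_incident(incident: Incident) -> bool:
    """Return True if the incident matches either condition in the text."""
    # Condition 1: one of the top two priority levels.
    in_top_two = incident.priority_rank is not None and incident.priority_rank <= 2
    # Condition 2 (assumed reading): at least two responders acknowledged.
    multiple_acked = incident.acknowledged_responders >= 2
    return in_top_two or multiple_acked
```

Under these assumptions, a P3 incident with a single responder is not major, but it becomes major once a second added responder acknowledges.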
The following steps and PagerDuty features are recommended for an effective coordinated response:
- Identify the responders necessary for the incident: The scope of impact, and who is affected, will usually determine who is necessary for response. Most organizations have a set of response teams that are used repeatedly.
- Establish a conference bridge: This will be one or more conference bridge channels in the form of a phone number, a video call, a chat room, or a physical war room where responders will gather to collaborate. It’s important to know whether you will be reusing the same bridge/call/chat/room for all responses, or will be using a new channel for each response.
- Add responders to the incident: Quickly and accurately engage responders to fill the needed roles.
A coordinated response team for a major incident will typically involve the following core roles:
- Incident Commander: Every coordinated response benefits from someone whose entire role is driving the response team’s efforts towards successful incident resolution. In some organizations, this role is referred to as an incident manager.
- Subject Matter Experts (SMEs): SMEs are the responders who are knowledgeable about the systems involved in the incident, and they focus their entire effort toward resolving the underlying issue. They are sometimes referred to as “resolvers,” to distinguish their role from others involved in the response. There are often multiple SMEs for a given coordinated response, and it’s typical that they would be drawn from different technical teams, providing the skills and knowledge necessary for the specific incident at hand.
- Non-Resolver Responders: These are responders with a specialized function outside of the domain of the incident itself, such as an external communications liaison or an internal communications officer. The specific non-resolver responders needed depend entirely on the incident at hand. For example, an internal-facing incident has no need of an external communications liaison, while a major site degradation may require a full complement.
For more background on these and other roles, refer to PagerDuty’s Incident Response Guide.
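As a rough illustration of how the role list above might translate into a staffing decision, the sketch below picks roles based on whether an incident is customer-facing. The `Role` enum and `required_roles` function are hypothetical names, and always including an internal communications officer is an assumption of this example rather than a rule from the guide.

```python
from enum import Enum
from typing import List


class Role(Enum):
    INCIDENT_COMMANDER = "Incident Commander"
    SME = "Subject Matter Expert"
    INTERNAL_COMMS = "Internal Communications Officer"
    EXTERNAL_COMMS = "External Communications Liaison"


def required_roles(customer_facing: bool) -> List[Role]:
    """Return the roles to fill for a coordinated response (illustrative only)."""
    # Core roles, plus internal comms (assumed here to always be needed).
    roles = [Role.INCIDENT_COMMANDER, Role.SME, Role.INTERNAL_COMMS]
    # An internal-facing incident has no need of an external liaison.
    if customer_facing:
        roles.append(Role.EXTERNAL_COMMS)
    return roles
```

In practice the SME role is usually filled by several people from different technical teams, so a real roster would map each role to one or more responders.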
For an efficient coordinated response, we recommend establishing a channel where all responders know to gather for collaboration. Some organizations have a persistent conference bridge or chat room that is reused for all major incidents, while others have multiple channels available.
There are two ways to add a conference bridge during a coordinated response:
- Manually Add a Conference Bridge to an Incident via Add Responders
- Automatically Add a Conference Bridge to an Incident with a Response Play
With either method, responders will receive the corresponding notification on their mobile device. Both iOS and Android recognize common phone number formats, so responders can simply tap to dial the conference bridge from their SMS notification. If the conference bridge is a meeting URL for a video conference or chat channel, it is also tappable from SMS.
Adding responders lets you bring in additional users to assist with an incident response. Typical reasons for adding responders include SEV-1/P1 responses, critical incident responses, and mobilizing teams.
There are two ways to add responders to incidents: manually, or with a response play.
Adding responders manually gives you the flexibility to choose the exact responders needed for a given situation. If you prefer a push-button means of mobilizing a response, pre-formulated response plays offer that efficiency.
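To make the "push-button" idea concrete, the sketch below models a response play as a predefined bundle of responders and conference bridge details that expands into one request message per responder. It is a tool-agnostic illustration: `ResponsePlay`, `build_requests`, and the message wording are hypothetical and do not reflect PagerDuty's data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResponsePlay:
    # Hypothetical structure: a named, reusable set of responders plus bridge details.
    name: str
    responders: List[str]
    conference_number: Optional[str] = None
    conference_url: Optional[str] = None


def build_requests(play: ResponsePlay, incident_title: str) -> List[str]:
    """Expand a play into one request message per responder."""
    bridge_parts = []
    if play.conference_number:
        bridge_parts.append(f"dial {play.conference_number}")
    if play.conference_url:
        bridge_parts.append(f"join {play.conference_url}")
    bridge = " or ".join(bridge_parts) if bridge_parts else "no bridge configured"
    return [
        f"{responder}: please help with '{incident_title}' ({bridge})"
        for responder in play.responders
    ]
```

Adding responders manually corresponds to building the responder list by hand for each incident, while a play fixes that list and the bridge details ahead of time.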