Global Alert Grouping
Combine similar alerts into a single incident to reduce notification noise and provide more context across multiple services
Global Alert Grouping reduces noise by grouping alerts across more than one technical service. You select which services are in scope when alerts are evaluated for grouping. Global Alert Grouping supports the following methods:
- Intelligent
- Alert Content
- Intelligent + Alert Content
Availability
Global Alert Grouping is available with our PagerDuty AIOps add-on. The feature is also available for the duration of an AIOps Trial. Please contact our Sales team to upgrade to a pricing plan with this feature.
Required User Permissions
Users with the following roles can edit a service’s Alert Grouping settings:
- Account Owner
- Admin and Global Admin
- User
- Manager base role and Team roles
- Manager Team roles can only configure the services for which they are assigned a Manager role.
Enable Global Alert Grouping
Global Alert Grouping can group alerts across multiple technical services.
- In the PagerDuty web app, navigate to Services > Service Directory and select the name of your desired service.
- Select the Settings tab and click Edit next to Reduce Noise.
- Select one of the following options and make your desired configurations: Intelligent, Alert Content, or Intelligent + Alert Content.
- Click Save.
Intelligent
Selecting Intelligent will enable Intelligent Alert Grouping on your service. There are some differences in functionality depending on whether you select more than one service. The information in this section is relevant when you select more than one service. Please read Intelligent Alert Grouping if you select a single service.
Advanced Options
Advanced Options allow users to customize which alert fields the grouping model analyzes when determining textual similarities, as well as reset the grouping model's learner cache.
- Configure Grouping Fields: Choose up to five fields, including standard Common Event Format (PD-CEF) fields and custom details, for a comprehensive alert grouping analysis.
- Reset Learner Cache: Reset the grouping model's learner cache, for example after a service's data structure changes significantly.
Configure Grouping Fields
By default, Intelligent Alert Grouping uses alerts' Summary field to determine textual similarity. Depending on your data, however, you may wish to use alternate fields to determine similarity.
In the PagerDuty web app:
- While configuring Intelligent Alert Grouping, click the dropdown Advanced Options.
- In the section Select Fields, select a field from the dropdown you want the Intelligent Alert Grouping model to consider for textual similarity.
- If you select Custom Details, enter a Custom field name.
- Optionally click Add Field and select another field. Note: You can repeat this step as needed to select up to five fields.
- At the bottom of the page, click Save.
Considerations
Summary is the only field that PagerDuty requires. If you do not select Summary as one of the fields, all of the selected fields may be blank on a given alert, in which case no alert grouping will occur.
When a field selected in the Advanced Options configuration is blank, that field is not analyzed or considered for Intelligent Alert Grouping. The grouping model only considers the fields with available data.
The selected fields are analyzed for textual similarity, not for an exact match between fields. To specify exact matching criteria, please use the Intelligent + Alert Content option.
The maximum number of characters for all selected fields is 1,000.
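The considerations above can be sketched as a small helper. This is only an illustration and not PagerDuty's implementation: the function name `comparison_text`, the dictionary shape of an alert, and the choice to truncate at the character limit are all assumptions, since the article does not say how the limit is enforced.

```python
from typing import Optional

MAX_FIELDS = 5     # the article allows up to five grouping fields
MAX_CHARS = 1000   # the article's character limit across all selected fields


def comparison_text(alert: dict, selected_fields: list[str]) -> Optional[str]:
    """Return the text a similarity model might compare, or None if nothing is usable."""
    if len(selected_fields) > MAX_FIELDS:
        raise ValueError(f"at most {MAX_FIELDS} fields can be selected")
    # Blank or missing fields are skipped; only fields with data are considered.
    values = [
        str(alert[f]).strip()
        for f in selected_fields
        if str(alert.get(f) or "").strip()
    ]
    if not values:
        return None  # every selected field is empty, so no grouping can occur
    # Assumption: text beyond the limit is truncated.
    return " ".join(values)[:MAX_CHARS]
```

For example, an alert with an empty Summary and a populated custom field yields only the custom field's text, and an alert where every selected field is blank yields `None`.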
Reset Learner Cache
This option allows you to reset the grouping model's learner cache. This can be useful, for example, if a service's data structure changes significantly or after a period of testing.
In the PagerDuty web app:
- While configuring Intelligent Alert Grouping, click the dropdown Advanced Options.
- Activate the checkbox Reset learner cache.
- At the bottom of the page, click Save.
Algorithm Behavior
The Intelligent Alert Grouping algorithm is built to observe real-time alert data and incident history, and adapt as new alerts trigger on a service. After you have enabled Intelligent Alert Grouping on a service, no explicit configuration is required, though you may optionally configure the Flexible Time Window.
Intelligent Alert Grouping will group an alert into an existing incident when the model detects one of the following criteria. The reason that each alert was grouped is shown in the incident timeline.
- Textual similarity: The Intelligent Alert Grouping algorithm deems the alerts similar based on alert title similarity.
- Past co-occurrences: The Intelligent Alert Grouping model has detected a high rate of co-occurrence between alerts.
- Learning from prior incident merges: A user has merged similar alerts in the past, and the algorithm has learned from this behavior.
Alerts that do not meet these criteria will not be grouped, and will trigger a new incident. When a Global Alert Grouping setting is enabled, Intelligent Alert Grouping evaluates alerts from all services in the global setting, and groups alerts from one or more services into the same incident when the criteria above are met. New alerts are grouped under the first incident that is created, and notifications are sent to that incident's responders.
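As a rough illustration of the decision described above, the sketch below returns a grouping reason or `None` (meaning a new incident is triggered). It is not PagerDuty's model: Python's `difflib.SequenceMatcher` stands in for the textual similarity check, the 0.8 threshold is hypothetical, and the co-occurrence and merge-history signals are passed in as plain booleans.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical cutoff, chosen only for illustration


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]; a stand-in for the real model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def grouping_reason(new_title, incident_titles, co_occurs=False, merged_before=False):
    """Return the first matching reason for grouping, or None to trigger a new incident."""
    if any(title_similarity(new_title, t) >= SIMILARITY_THRESHOLD for t in incident_titles):
        return "textual similarity"
    if co_occurs:
        return "past co-occurrences"
    if merged_before:
        return "prior incident merges"
    return None
```

Returning the reason alongside the decision mirrors the article's note that the reason for each grouping appears in the incident timeline.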
The algorithm also reacts to feedback from you and your team. The best way for the algorithm to learn and adapt to new grouping behaviors is to manually merge incidents that are related, and to manually move alerts to a different incident when they are not related. For more information about moving alerts from one incident to another, please see Merge Incidents. Alert titles can also be updated automatically using Event Orchestration, which influences the algorithm.
Tip
Merging and unmerging alerts through the REST API will not affect the Intelligent Alert Grouping algorithm. Only manual merges and unmerges in the PagerDuty web app influence the algorithm.
Alert Content
Selecting Alert Content will enable Content-Based Alert Grouping on your service. Content-Based Alert Grouping enables customized alert grouping on services with predictable, homogeneous alert data, without the need to train an algorithm. With Content-Based Alert Grouping, alerts that share an exact match on a set of chosen fields will be grouped together into the most recent open incident. Grouped alerts mean fewer incidents and interruptions for responders, richer context on the incidents that do trigger, and lower resolution times.
To learn more about the Alert Content option, please read Content-Based Alert Grouping.
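The exact-match rule can be sketched as follows. The data shapes are assumptions made for illustration: each open incident is a dictionary holding its first alert under a hypothetical `first_alert` key and a hypothetical `created_at` value used to pick the most recent match.

```python
def find_matching_incident(alert: dict, open_incidents: list[dict], fields: list[str]):
    """Return the most recent open incident whose chosen fields exactly match the alert."""
    key = tuple(alert.get(f) for f in fields)
    matches = [
        incident
        for incident in open_incidents
        if tuple(incident["first_alert"].get(f) for f in fields) == key
    ]
    # "Most recent" is modeled here by the hypothetical created_at value.
    return max(matches, key=lambda incident: incident["created_at"], default=None)
```

A return value of `None` corresponds to the case where no open incident matches and the alert triggers a new incident.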
Intelligent + Alert Content
Selecting Intelligent + Alert Content will enable Unified Alert Grouping on your service. Unified Alert Grouping combines Content-Based Alert Grouping and Intelligent Alert Grouping with a flexible time window for increased precision and correlation control. Unified Alert Grouping will group alerts when alert content matches and Intelligent Alert Grouping determines alerts are similar. Alerts will group only when both conditions are satisfied.
To learn more about the Intelligent + Alert Content option, please read Unified Alert Grouping.
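The "both conditions" rule can be expressed as a simple conjunction. In this sketch the similarity check is supplied by the caller as a hypothetical `is_similar` callable, because the real model is not something this article specifies.

```python
from typing import Callable


def unified_should_group(
    alert: dict,
    incident_alert: dict,
    exact_fields: list[str],
    is_similar: Callable[[dict, dict], bool],
) -> bool:
    """Group only when the chosen fields match exactly AND the similarity check agrees."""
    content_match = all(alert.get(f) == incident_alert.get(f) for f in exact_fields)
    return content_match and is_similar(alert, incident_alert)
```

Because the two checks are combined with `and`, a similar-looking alert from a different source is not grouped when the source field is one of the exact-match fields.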
Flexible Time Window
You can configure the grouping time window as part of the Global Alert Grouping setting. The time window can be between five and 60 minutes. The time window is a rolling window and counted from the most recently grouped alert. The window extends each time an alert is grouped, up to 24 hours, or until the incident is resolved. If an alert comes in after 24 hours, it will trigger a new incident.
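A minimal sketch of the rolling window follows, assuming the 24-hour limit is measured from the start of the incident (the article does not state the reference point). The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_GROUPING_SPAN = timedelta(hours=24)


def can_group(
    alert_time: datetime,
    incident_start: datetime,
    last_grouped: datetime,
    window_minutes: int,
    resolved: bool = False,
) -> bool:
    """Decide whether a new alert still falls inside the rolling grouping window."""
    if not 5 <= window_minutes <= 60:
        raise ValueError("time window must be between 5 and 60 minutes")
    if resolved:
        return False  # grouping stops once the incident is resolved
    # Assumption: the 24-hour cap is measured from the start of the incident.
    if alert_time - incident_start > MAX_GROUPING_SPAN:
        return False
    # The window rolls forward from the most recently grouped alert.
    return alert_time - last_grouped <= timedelta(minutes=window_minutes)
```

With a 15-minute window, an alert arriving 20 minutes after the incident started is still grouped if another alert was grouped 10 minutes in, because the window is counted from that most recent alert.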
View Global Alert Grouping On an Incident
A primary indicator that an incident has alerts grouped from other services is the Multiservice Group pill, along with the description of how many alerts have been grouped, at the top of the incident's details page.
You can also view Global Alert Grouping on an incident's details page in the following places:
- The Impacted Services field will show the incident's source service, along with the name(s) of any other service(s) whose alerts have been grouped into the incident.
- With the Alerts tab selected, users can also see which service an alert originated on.
- An incident's Timeline will show an entry when Global Alert Grouping adds an alert from another service.
Best Practices
To get the most out of Global Alert Grouping, we recommend the following:
- Make sure that responders on all associated Teams and escalation policies know that their services are part of a Global Alert Grouping configuration.
- Ensure that on-call responders evaluate all alerts, including alerts that originated on other services.
- If an on-call responder is unsure whether an alert should belong to an incident on their assigned service, please advise the on-call responder to add a responder from the originating service to review the alert, or move the alert to create an incident on the originating service. By creating a new incident on the originating service, the appropriate on-call responder will be notified.
- We recommend that users have a base role of Responder or higher for services that are within a multiservice group. Additionally, refrain from using Advanced Permissions to restrict users' access to services that are part of a multiservice group (e.g., restricting users from viewing incidents on a specific service).
FAQ
What service does an incident get assigned to when grouping across services?
Global Alert Grouping will group subsequent alerts into the incident on the service that received the initial alert. Global Alert Grouping will continue grouping matched alerts based on the configured criteria and rolling time window, or until the incident is resolved.
Which escalation policy will be assigned to the incident?
PagerDuty will use the escalation policy from the service where the first alert triggers.
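The two answers above can be illustrated with a short sketch: the incident belongs to the service that received the earliest alert, and it uses that service's escalation policy. The alert dictionaries, the `received_at` field, and the policy lookup table are hypothetical shapes used only for this example.

```python
def incident_owner(alerts: list[dict], escalation_policies: dict[str, str]) -> dict:
    """Pick the owning service and escalation policy from the first alert received."""
    first = min(alerts, key=lambda a: a["received_at"])
    service = first["service"]
    return {
        "service": service,
        "escalation_policy": escalation_policies[service],
        # Every service that contributed an alert counts as impacted.
        "impacted_services": sorted({a["service"] for a in alerts}),
    }
```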
How do I bring in other responders?
You have multiple options:
- The on-call responder can re-assign an incident to other users or escalation policies.
- The on-call responder for that incident can add responders.
- The on-call responder can execute an Incident Workflow that adds responders.
Where can I check if an alert from my service has been grouped into an incident on a different service?
Grouped alerts maintain their originating service, so your team’s alerts will be visible in the Alerts Table when the My Teams filter is selected. They will not appear on the Incidents page when My Teams is selected.
In the Service Directory, the multiservice incident will only be visible with the source service (i.e., the service that received the first alert). This means that a single technical service will show as impacted on your status page. If you have Business Services configured, and there is a priority assigned to the incident, then the business service will also display as impacted.
We are currently evaluating an area on the service directory page to show multiservice incidents when a service has an alert assigned to a multiservice incident.
How do I move an alert that was grouped into an incident to another service?
Read Move Alerts to Another Incident for more information about merging alerts, moving alerts to a new service, and using a grouped alert to trigger a new incident.
What is the maximum number of services that can be in a multiservice group?
You can include up to 250 services in a multiservice group.
Can a service have more than one type of Alert Grouping enabled at a time?
We do not allow a service to have more than one type of Alert Grouping enabled at a time, or to be part of more than one multiservice group, due to the potential conflict between grouping types. If two alerts matched more than one grouping type, PagerDuty would not know which setting should take precedence.
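If you plan multiservice groups before configuring them, the two limits above (at most 250 services per group, and a service in at most one group) can be checked with a short script. This is a planning aid under assumed data shapes, not a PagerDuty API: `validate_groups` and the mapping of group names to service IDs are hypothetical.

```python
MAX_SERVICES_PER_GROUP = 250


def validate_groups(groups: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in a {group_name: [service_id, ...]} mapping."""
    problems = []
    seen: dict[str, str] = {}  # service_id -> first group it appeared in
    for name, services in groups.items():
        if len(services) > MAX_SERVICES_PER_GROUP:
            problems.append(f"{name}: {len(services)} services exceeds {MAX_SERVICES_PER_GROUP}")
        for service in services:
            if service in seen and seen[service] != name:
                problems.append(f"{service}: in both {seen[service]} and {name}")
            seen.setdefault(service, name)
    return problems
```

An empty list means the planned groups respect both limits.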
What is the difference between “grouping” and “merging” an alert?
Grouping is an automated process in which our alert grouping features combine alerts into a single incident based on criteria set by a user.
Merging is a manual process in which a user merges an incident or alert into a new or existing incident.
How does Global Alert Grouping interact with Event Orchestration?
Event Orchestration is a powerful tool that can manipulate an event's payload before routing it to a service. As their names imply, Event Orchestration acts at the event level (i.e., upstream from Global Alert Grouping), while Global Alert Grouping acts on alerts (i.e., downstream from Event Orchestration). This means that Global Alert Grouping will evaluate an alert as it's received from Event Orchestration, after any transformations in Global Orchestrations or Service Orchestrations have taken place.
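The ordering described above can be sketched as a two-stage pipeline: orchestration rules transform the event first, and grouping only ever sees the transformed result. The rule functions and the `group_alert` callback are hypothetical stand-ins, not PagerDuty APIs.

```python
def process_event(event: dict, orchestration_rules: list, group_alert):
    """Apply every orchestration rule in order, then hand the result to grouping."""
    for rule in orchestration_rules:
        event = rule(event)  # each rule returns a (possibly modified) event
    return group_alert(event)


def normalize_summary(event: dict) -> dict:
    """Example rule: strip a noisy host suffix from the summary."""
    return {**event, "summary": event["summary"].split(" on ")[0]}
```

Here a summary such as "Disk full on db-01" reaches the grouping step as "Disk full", so grouping decisions are based on the rewritten text.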
Can I use the REST API to configure Global Alert Grouping?
Yes. The endpoints for configuring Global Alert Grouping are documented in our Developer Docs.