Intelligent Alert Grouping
Automatically add incoming alerts to related incidents via the Intelligent Alert Grouping algorithm
Intelligent Alert Grouping uses a real-time, machine learning-based algorithm to group related alerts into a single, open incident. This is particularly helpful for incident responders, as it reduces the amount of noise they must contend with and allows them to focus on the task at hand. Over time, the grouping algorithm will adapt to understand new types of alerts and react to human behavior, thereby improving the accuracy of its grouping decisions and further reducing incident resolution times.
Availability
Intelligent Alert Grouping is available with our PagerDuty AIOps add-on, or with Legacy Event Intelligence. Please note that newer features (e.g., Advanced Options) are only available to PagerDuty AIOps customers, and are not available on Legacy Event Intelligence plans. Please contact our Sales Team to upgrade your account's pricing plan.
If you would like to sign up for a trial of PagerDuty AIOps features, please read PagerDuty AIOps Trials.
Required User Permission
Users with the following roles can edit a service’s Alert Grouping settings:
- Account Owner
- Admin and Global Admin
- User
- Manager base role and team roles
  - Manager team roles can only manage services associated with their team.
Enable Intelligent Alert Grouping
To enable Intelligent Alert Grouping:
- Navigate to Services > Service Directory and select the name of your desired service.
- Select the Settings tab and click New Grouping in the section Reduce Noise.
- Select Intelligent.
- Select the desired grouping time window for alerts on the service. The Recommended time window indicated in the dropdown uses historical service data to calculate the average time between alerts.
- Click Save Settings.
Scope
Intelligent Alert Grouping will only look at alerts on a single service, and will not group alerts from other services. It will, however, consider alerts sent to other integrations, if any exist, on the same service.
View Intelligent Alert Grouping on an Incident
When enabled, you can see Intelligent Alert Grouping actively grouping alerts on an incident’s detail page under the Alerts tab. The Grouping Now label indicates that an incident is using alert grouping. You can also see how many alerts are grouped into the incident, as well as their status. In the example below, two alerts have been grouped: one is triggered and the other is resolved.
Select Alert grouping details to see which grouping method is in effect (Intelligent, Content-Based or Time-Based Alert Grouping), when grouping started, and the conditions when grouping will stop.
Disable Intelligent Alert Grouping
To select a different grouping method, or to disable Alert Grouping altogether, in the web app:
- Navigate to Services > Service Directory and select the name of your desired service.
- Select the Settings tab and click Edit next to Reduce Noise.
- In the bottom-left, click Delete.
- In the confirmation modal, click Yes, turn off.
Advanced Options
Advanced Options allow users to customize which alert fields the grouping model analyzes when determining textual similarities, as well as reset the grouping model's learner cache.
- Configure Grouping Fields: Choose up to five fields, including standard Common Event Format (PD-CEF) fields and custom details, for a comprehensive alert grouping analysis.
- Reset Learner Cache: Reset the grouping model's learner cache, for example after a service's data structure changes significantly.
Configure Grouping Fields
By default, Intelligent Alert Grouping uses alerts' Summary field to determine textual similarity. Depending on your data, however, you may wish to use alternate fields to determine similarity.
In the PagerDuty web app:
- While configuring Intelligent Alert Grouping, click the dropdown Advanced Options.
- In the Select Fields section, use the dropdown to select a field that you want the Intelligent Alert Grouping model to consider for textual similarity.
- If you select Custom Details, enter a Custom field name.
- Optionally, click Add Field and select another field. Note: You can repeat this step as needed to select up to five fields.
- At the bottom of the page, click Save.
Considerations
- Summary is the only field that PagerDuty requires. If you do not select Summary as one of the fields, it is possible that all selected fields will be blank. In such cases, no alert grouping will occur because all the selected fields are empty.
- When any field selected in the Advanced Options configuration is blank, that field will not be analyzed or considered for Intelligent Alert Grouping. The grouping model only considers fields that contain data.
- The selected fields are compared for textual similarity, not for an exact match. To specify exact matching criteria, please use the Intelligent + Alert Content option.
- The maximum number of characters for all selected fields is 1,000.
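To make the considerations above concrete, here is a minimal sketch of how selected fields might be combined into the text that gets compared. It is hypothetical: PagerDuty does not document its internal preprocessing, and the dictionary keys (for example "summary" or "custom_details.region") are illustrative names only. The five-field and 1,000-character limits come from this article; simply truncating the combined text is an assumption about how the character limit is enforced.

```python
from typing import Mapping, Optional, Sequence

MAX_FIELDS = 5          # documented: up to five fields can be selected
MAX_TOTAL_CHARS = 1000  # documented: 1,000 characters for all selected fields


def build_similarity_text(
    alert: Mapping[str, Optional[str]], fields: Sequence[str]
) -> Optional[str]:
    """Hypothetical helper: join the non-blank selected fields of an alert.

    Returns None when every selected field is blank, mirroring the documented
    behavior that no grouping occurs in that case.
    """
    if len(fields) > MAX_FIELDS:
        raise ValueError(f"at most {MAX_FIELDS} fields can be selected")
    # Blank or missing fields are skipped; only fields with data are considered.
    values = [alert.get(name) or "" for name in fields]
    non_blank = [value.strip() for value in values if value.strip()]
    if not non_blank:
        return None
    # Assumption: the limit applies to the combined text and is enforced by truncation.
    return " ".join(non_blank)[:MAX_TOTAL_CHARS]
```

For example, an alert whose selected "source" field is empty contributes only its "summary" text, while an alert with every selected field empty yields None and would not be grouped.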
Reset Learner Cache
This option allows you to reset the grouping model's learner caches. This can be useful if a service's data structure changes significantly, or after a period of testing, for example.
In the PagerDuty web app:
- While configuring Intelligent Alert Grouping, click the dropdown Advanced Options.
- Select the Reset learner cache checkbox.
- At the bottom of the page, click Save.
Algorithm Behavior
The Intelligent Alert Grouping algorithm is built to observe real-time alert data and incident history, and adapt as new alerts trigger on a service. After you have enabled Intelligent Alert Grouping on a service, no explicit configuration is required, though you may optionally configure the Flexible Time Window.
Intelligent Alert Grouping will group an alert into an existing incident when the following criteria are met:
- The new alert was created within the specified grouping time window.
  - This works on a rolling basis, i.e., we compare the new alert's timestamp to that of the incident's most recently grouped alert.
- The incident is less than 24 hours old.
- The Intelligent Alert Grouping algorithm deems the alerts similar.
Alerts that do not meet these criteria will not be grouped, and will trigger a new incident.
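The grouping criteria above, together with the single-service scope, can be sketched as a simple eligibility check. This is an illustration under stated assumptions, not PagerDuty's implementation: the OpenIncident type and should_group function are hypothetical names, whether the window boundary is inclusive is assumed, and the machine learning similarity decision is not exposed, so it is represented by a caller-supplied is_similar callable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

MAX_INCIDENT_AGE = timedelta(hours=24)  # documented: incident must be less than 24 hours old


@dataclass
class OpenIncident:
    """Hypothetical view of an open incident on a service."""
    service_id: str
    created_at: datetime
    last_grouped_alert_at: datetime  # timestamp of the most recently grouped alert


def should_group(
    incident: OpenIncident,
    alert_service_id: str,
    alert_created_at: datetime,
    time_window: timedelta,
    is_similar: Callable[[], bool],
) -> bool:
    # Scope: only incidents on the same service are candidates.
    if incident.service_id != alert_service_id:
        return False
    # Rolling window: compare against the most recently grouped alert.
    if alert_created_at - incident.last_grouped_alert_at > time_window:
        return False
    # The incident must still be less than 24 hours old.
    if alert_created_at - incident.created_at >= MAX_INCIDENT_AGE:
        return False
    # Stand-in for the model's similarity decision, which PagerDuty does not expose.
    return is_similar()
```

Because the window is rolling, an incident that keeps receiving similar alerts every few minutes can stay open for grouping well beyond a single five-minute window, but never past the 24-hour age limit.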
The algorithm also reacts to feedback from you and your team. The best way for the algorithm to learn and adapt to new grouping behaviors is to manually merge incidents that are related, and to manually move alerts to a different incident when they are not related. For more information about moving alerts from one incident to another, please see Merge Incident. Alert titles can also be updated automatically using Event Orchestration, which influences the algorithm.
Tip
Merging/unmerging alerts through the API will not factor into the Intelligent Alert Grouping algorithm. Only manual merges and unmerges in the web app influence the algorithm.
The algorithm interprets and adjusts to new alert data or behavior on a service quickly. We strongly recommend against sending in test data to try to influence the algorithm, as this can result in unpredictable behavior.
Flexible Time Window
You can configure the grouping time window on each service. You will be shown a Recommended time window, which is calculated from the average time between alerts using historical service data. Please note, the larger the grouping time window, the higher the chance that you'll run into overgrouping, i.e., an unrelated alert is grouped into an incident. After increasing the time window, we recommend that you monitor alert grouping on the service for a time. You can adjust the time window as needed based on the alert grouping's accuracy.
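The Recommended time window is described as the average time between alerts in a service's history. The sketch below shows that idea as plain arithmetic over alert timestamps; PagerDuty's actual calculation (lookback period, outlier handling, minimum values) is not documented here, so estimate_time_window is a hypothetical helper. The cap of 3600 seconds reflects the API maximum mentioned in the FAQ below.

```python
from datetime import datetime
from statistics import mean
from typing import Sequence

MAX_WINDOW_SECONDS = 3600  # documented API maximum for the grouping time window


def estimate_time_window(alert_times: Sequence[datetime]) -> int:
    """Hypothetical estimate: mean gap between consecutive alerts, in seconds."""
    ordered = sorted(alert_times)
    if len(ordered) < 2:
        raise ValueError("need at least two alerts to compute a gap")
    # Gaps between each pair of consecutive alerts.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    return min(round(mean(gaps)), MAX_WINDOW_SECONDS)
```

For instance, alerts arriving at 0, 60, and 180 seconds have gaps of 60 and 120 seconds, giving an estimate of 90 seconds. Choosing a window much larger than this average is what raises the risk of overgrouping.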
Best Practice
For critical services, we recommend using the standard five-minute grouping window unless you are not seeing a satisfactory reduction in noise or the service owner deems a longer window appropriate. In that case, increase the grouping time window gradually and monitor grouping closely to make sure that only the desired alerts are grouped.
FAQ
Can you enable the grouping time window via API?
Yes. You can enable the time window using the Update a service API endpoint, specifying it in seconds up to 3600 seconds. To use the recommended time window, set the time window to zero, i.e., "time_window": 0.
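A minimal sketch of such a request using only the Python standard library follows. The time_window field and the Update a service endpoint come from the answer above; the surrounding request shape (PUT /services/{id}, the alert_grouping_parameters object with type "intelligent", and the Authorization/Accept headers) reflects our understanding of the PagerDuty REST API v2 and should be verified against the API reference before use. The function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"


def build_intelligent_grouping_payload(time_window: int) -> dict:
    """Request body for Update a service; a time_window of 0 selects the recommended window."""
    if not 0 <= time_window <= 3600:
        raise ValueError("time_window must be between 0 and 3600 seconds")
    return {
        "service": {
            "alert_grouping_parameters": {
                "type": "intelligent",  # assumed type value for Intelligent Alert Grouping
                "config": {"time_window": time_window},
            }
        }
    }


def update_service(service_id: str, api_token: str, time_window: int) -> dict:
    """Send the update; header values are assumptions to verify against the API docs."""
    request = urllib.request.Request(
        f"{API_BASE}/services/{service_id}",
        data=json.dumps(build_intelligent_grouping_payload(time_window)).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the payload builder separate from the network call makes it easy to inspect or test the body before sending anything to a live account.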
Can you retrieve the grouping time window via API?
Yes, you can retrieve the grouping time window using the Get a service API endpoint, i.e., alert_grouping_parameters.config.recommended_time_window.
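Reading that value from a Get a service response is a matter of walking the nested path named above. In this sketch, the top-level "service" wrapper reflects the usual PagerDuty REST API v2 response shape, and recommended_time_window is a hypothetical helper; it returns None when alert grouping is not configured or the field is absent.

```python
from typing import Any, Mapping, Optional


def recommended_time_window(service_response: Mapping[str, Any]) -> Optional[int]:
    """Return alert_grouping_parameters.config.recommended_time_window, if present."""
    service = service_response.get("service") or {}
    # alert_grouping_parameters may be missing or null when grouping is not configured.
    params = service.get("alert_grouping_parameters") or {}
    config = params.get("config") or {}
    return config.get("recommended_time_window")
```

Because PagerDuty recalculates the recommended value when it is requested, repeated calls may return different numbers over time, while the configured time window stays the same.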
Does the recommended time window update dynamically once enabled?
Whenever you view or edit a service's recommended time window in the web app or via the Get a service API endpoint (i.e., alert_grouping_parameters.config.recommended_time_window), PagerDuty will recalculate the recommended value in order to show you the latest value. However, the grouping time window will remain static unless changed manually by a user.
Can we expose the machine learning-based model via the API?
No, not at this time.
Can we plug our own machine learning code into PagerDuty?
No, not at this time.
Does this take into account some of the rules or correlations we have configured outside of PagerDuty?
No, this model is entirely based on actions taken within PagerDuty.
Does it affect the machine learning capabilities if I rename the service?
No, it does not.
Can Intelligent Alert Grouping group alerts together from multiple services?
Intelligent Alert Grouping only looks at alerts from a single service. If you want alerts from different services to be grouped, you may need to reconfigure your service so that all related alerts are sent to the same service. If responders would like more context on incidents happening across other services, please read our article on the Related Incidents feature.
Why didn’t my alerts get grouped together?
There are three main reasons the Intelligent Alert Grouping algorithm may not have grouped alerts on the same service:
- The alerts did not arrive within the time window specified for that service.
- The incoming alert was not similar enough to the alerts you expected it to group with, or it was more similar to the alerts it was grouped with instead.
- Human feedback, such as merging incidents or moving alerts out of incidents, overrode the desired behavior.
The Intelligent Alert Grouping algorithm takes several different factors into consideration, which can make it difficult to trace why alerts were or were not grouped. If you believe there has been enough history for an alert to be grouped but you still notice unexpected grouping behavior, please reach out to our Support team with links to the specific incidents or alert groupings and a summary of why the grouping behavior is unexpected.
Why don’t I see any alert grouping options?
It’s possible your current pricing plan does not support Alert Grouping. If you are interested in trying Alert Grouping, contact our Sales team to start a free PagerDuty AIOps trial.
Is there a limit to how many alerts can group into a single incident?
Yes, incidents are limited to 1,000 alerts each. After this limit is reached, a new incident is created and subsequent alerts are grouped into the new incident.
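The rollover behavior can be pictured with a small sketch. It assumes every incoming alert already qualifies for grouping, so it only models the 1,000-alert cap; the Incident type and add_grouped_alert function are hypothetical and do not correspond to any PagerDuty API.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ALERTS_PER_INCIDENT = 1000  # documented limit per incident


@dataclass
class Incident:
    """Hypothetical incident holding the IDs of its grouped alerts."""
    alert_ids: List[str] = field(default_factory=list)


def add_grouped_alert(incidents: List[Incident], alert_id: str) -> Incident:
    """Group into the latest incident, opening a new one once it holds 1,000 alerts."""
    if not incidents or len(incidents[-1].alert_ids) >= MAX_ALERTS_PER_INCIDENT:
        incidents.append(Incident())
    incidents[-1].alert_ids.append(alert_id)
    return incidents[-1]
```

With 1,001 groupable alerts, the first 1,000 land in one incident and the 1,001st opens a new incident that later alerts join.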
Are incidents resolved only when all alerts within that incident are resolved?
Yes, that is correct. An incident will resolve when all of its associated alerts are resolved. Similarly, if you resolve an incident, PagerDuty will automatically resolve any associated, triggered alerts. For more information, please see Resolve Alerts.
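The two resolution rules in this answer can be expressed as a short sketch. The Alert type and the helper names are hypothetical and model only the documented behavior: an incident counts as resolved when all of its alerts are resolved, and resolving the incident resolves any alerts that are still triggered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    """Hypothetical alert record; status is either "triggered" or "resolved"."""
    alert_id: str
    status: str


def incident_is_resolved(alerts: List[Alert]) -> bool:
    # An incident resolves only when every associated alert is resolved.
    return all(alert.status == "resolved" for alert in alerts)


def resolve_incident(alerts: List[Alert]) -> None:
    # Resolving the incident also resolves any associated alerts still triggered.
    for alert in alerts:
        if alert.status == "triggered":
            alert.status = "resolved"
```

In the earlier example of an incident with one triggered and one resolved alert, the incident stays open until the triggered alert resolves or a responder resolves the incident itself.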