Intelligent Alert Grouping

Automatically add incoming alerts to related incidents via the Intelligent Alert Grouping algorithm

Intelligent Alert Grouping uses a real-time, machine learning-based algorithm to group related alerts into a single, open incident. This is particularly helpful for incident responders, as it reduces the amount of noise they must contend with and allows them to focus on the task at hand. Over time, the grouping algorithm will adapt to understand new types of alerts and react to human behavior, thereby improving the accuracy of its grouping decisions and further reducing incident resolution times.

📘

Availability

This feature is available with our PagerDuty AIOps add-on, or with Legacy Event Intelligence. If you would like to sign up for a trial of PagerDuty AIOps features, please read PagerDuty AIOps Trials.

🚧

Required User Permission

Users with the following roles can edit a service’s Alert Grouping settings:

  • Account Owner
  • Admin and Global Admin
  • User
  • Manager base role and team roles
    • Manager team roles can only manage services associated with their team.

Enable Intelligent Alert Grouping

📘

Prerequisite

In order to be eligible for Intelligent Alert Grouping, a service must have alerts enabled. If the service is configured to only create incidents, the Intelligent Alert Grouping option will not be available. Read more about enabling alerts on a service in the Alerts article.

To enable Intelligent Alert Grouping:

  1. Navigate to Services > Service Directory and select the name of your desired service.
  2. Select the Settings tab and click New Grouping in the section Reduce Noise.
  3. Select Intelligent.
  4. Select the desired grouping time window for alerts on the service. The Recommended time window indicated in the dropdown uses historical service data to calculate the average time between alerts.
  5. Click Save Settings.
Intelligent Alert Grouping on a service

📘

Scope

Intelligent Alert Grouping will only look at alerts on a single service, and will not group alerts from other services. It will, however, consider alerts sent to other integrations, if any exist, on the same service.

View Intelligent Alert Grouping on an Incident

When Intelligent Alert Grouping is enabled, you can see it actively grouping alerts on an incident’s detail page under the Alerts tab. The Grouping Now label indicates that an incident is using alert grouping. You can also see how many alerts are grouped into the incident, as well as their status. In the example below, two alerts have been grouped: one is triggered and the other is resolved.

View Intelligent Alert Grouping

Select Alert grouping details to see which grouping method is in effect (Intelligent, Content-Based or Time-Based Alert Grouping), when grouping started, and the conditions under which grouping will stop.

Alert grouping details

Disable Intelligent Alert Grouping

To select a different grouping method, or to disable Alert Grouping altogether, in the web app:

  1. Navigate to Services > Service Directory and select the name of your desired service.
  2. Select the Settings tab and click Edit next to Reduce Noise.
  3. In the bottom-left, click Delete.
  4. In the confirmation modal, click Yes, turn off.

Algorithm Behavior

The Intelligent Alert Grouping algorithm is built to observe real-time alert data and incident history, and adapt as new alerts trigger on a service. After you have enabled Intelligent Alert Grouping on a service, no explicit configuration is required, though you may optionally configure the Flexible Time Window.

Intelligent Alert Grouping will group an alert into an existing incident when the following criteria are met:

  • The most recent alert was created within the specified grouping time window.
    • This works on a rolling basis, i.e., we compare the timestamp of the alert in question to the timestamp of the most recently grouped alert.
  • The incident is less than 24 hours old.
  • The Intelligent Alert Grouping algorithm deems the alerts similar.

Alerts that do not meet these criteria will not be grouped, and will trigger a new incident.
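
The three criteria above can be sketched as a single decision function. This is only an illustration of the documented rules, not PagerDuty's implementation: the Alert and Incident classes, the should_group name, and the is_similar callback are all hypothetical, and the real similarity model is not exposed. The sketch also assumes that incident age is measured at the moment the new alert arrives.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List

# Documented rule: the incident must be less than 24 hours old.
MAX_INCIDENT_AGE = timedelta(hours=24)


@dataclass
class Alert:
    title: str
    created_at: datetime


@dataclass
class Incident:
    created_at: datetime
    alerts: List[Alert] = field(default_factory=list)


def should_group(
    alert: Alert,
    incident: Incident,
    time_window: timedelta,
    is_similar: Callable[[Alert, Incident], bool],
) -> bool:
    """Return True only if all three documented criteria hold."""
    if not incident.alerts:
        return False
    # Rolling window: compare against the most recently grouped alert,
    # not against the first alert in the incident.
    last_grouped = max(a.created_at for a in incident.alerts)
    within_window = alert.created_at - last_grouped <= time_window
    # Assumption: incident age is evaluated when the new alert arrives.
    young_enough = alert.created_at - incident.created_at < MAX_INCIDENT_AGE
    # is_similar stands in for the machine learning model (hypothetical).
    return within_window and young_enough and is_similar(alert, incident)
```

If should_group returns False for every open incident on the service, the alert would trigger a new incident, as described above.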

The algorithm also reacts to feedback from you and your team. The best way for the algorithm to learn and adapt to new grouping behaviors is to manually merge incidents that are related, and to manually move alerts to a different incident when they are not related. For more information about moving alerts from one incident to another, please see Merge Incident. Alert titles can also be updated automatically using Event Orchestration, which influences the algorithm.

📘

Tip

Merging/unmerging alerts through the API will not factor into the Intelligent Alert Grouping algorithm. Only manual merges and unmerges in the web app influence the algorithm.

The algorithm interprets and adjusts to new alert data or behavior on a service quickly. We strongly recommend against sending in test data to try to influence the algorithm, as this can result in unpredictable behavior.

Flexible Time Window

You can configure the grouping time window on each service. You will be shown a Recommended time window, which is calculated from the average time between alerts using historical service data. Please note, the larger the grouping time window, the higher the chance that you'll run into overgrouping, i.e., an unrelated alert is grouped into an incident. After increasing the time window, we recommend that you monitor alert grouping on the service for a time. You can adjust the time window as needed based on the alert grouping's accuracy.
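
To get a feel for what "average time between alerts" means, the sketch below computes the mean gap between consecutive alert timestamps. It is an assumption-laden illustration: the article does not say how much history PagerDuty uses or whether the result is rounded or clamped, and the average_gap_seconds function is a hypothetical name.

```python
from datetime import datetime
from statistics import mean
from typing import Sequence


def average_gap_seconds(alert_times: Sequence[datetime]) -> float:
    """Mean time, in seconds, between consecutive alerts."""
    if len(alert_times) < 2:
        raise ValueError("need at least two alerts to compute a gap")
    ordered = sorted(alert_times)
    # Pair each alert with the one that follows it and measure the gap.
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return mean(gaps)
```

Comparing a figure like this with the Recommended value shown in the dropdown can help you judge whether a larger window risks overgrouping on a particular service.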

👍

Best Practice

For critical services, we recommend using the standard five-minute grouping window unless you are not seeing a satisfactory reduction in noise, or the service owner deems a different window appropriate. In those cases, increase the grouping time window gradually and monitor grouping closely to make sure that only the desired alerts are grouped.

FAQ

Can you enable the grouping time window via API?

Yes. You can set the time window using the Update a service API endpoint, specifying a value in seconds up to a maximum of 3600. To use the recommended time window, set the value to zero, i.e., "time_window": 0.
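
As a rough sketch, a request body for the Update a service endpoint could be built as shown below. The nesting under alert_grouping_parameters.config follows the field path used elsewhere in this article, but the outer "service" wrapper, the "type": "intelligent" value, and the build_grouping_update helper are assumptions; confirm the exact schema in the API reference before using it. The article states only the 3600-second maximum, so no other lower bound than zero is enforced here.

```python
import json
from typing import Any, Dict

# Documented maximum for the grouping time window, in seconds.
MAX_TIME_WINDOW_SECONDS = 3600


def build_grouping_update(time_window: int) -> Dict[str, Any]:
    """Build a request body that sets the grouping time window.

    A time_window of 0 requests the recommended time window.
    """
    if not 0 <= time_window <= MAX_TIME_WINDOW_SECONDS:
        raise ValueError("time_window must be between 0 and 3600 seconds")
    return {
        "service": {  # assumed wrapper key
            "alert_grouping_parameters": {
                "type": "intelligent",  # assumed type value
                "config": {"time_window": time_window},
            }
        }
    }


# Example: ask for the recommended time window.
print(json.dumps(build_grouping_update(0), indent=2))
```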

Can you retrieve the grouping time window via API?

Yes, you can retrieve the grouping time window using the Get a service API endpoint. The recommended value is returned in alert_grouping_parameters.config.recommended_time_window.
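
Once you have the JSON response from the Get a service endpoint, reading the field is a matter of walking the path named above. The helper below is a minimal sketch: recommended_time_window is a hypothetical function name, the "service" wrapper key is an assumption, and the sample response is made up for illustration.

```python
from typing import Any, Mapping, Optional


def recommended_time_window(service_response: Mapping[str, Any]) -> Optional[int]:
    """Return alert_grouping_parameters.config.recommended_time_window, if present."""
    # Assumption: the service object may be wrapped in a "service" key.
    service = service_response.get("service", service_response)
    params = service.get("alert_grouping_parameters") or {}
    config = params.get("config") or {}
    return config.get("recommended_time_window")


# Hypothetical response body, for illustration only.
sample = {
    "service": {
        "alert_grouping_parameters": {
            "config": {"recommended_time_window": 900},
        }
    }
}
print(recommended_time_window(sample))
```

The helper returns None rather than raising when the service has no alert grouping configured, which keeps calling code simple.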

Does the recommended time window update dynamically once enabled?

Whenever you view or edit a service's recommended time window in the web app or via the Get a service API endpoint (i.e., alert_grouping_parameters.config.recommended_time_window), PagerDuty will recalculate the recommended value in order to show you the latest value. However, the grouping time window will remain static unless changed manually by a user.

Can we expose the machine learning-based model via the API?

No, not at this time.

Can we plug our own machine learning code into PagerDuty?

No, not at this time.

Does this take into account some of the rules or correlations we have configured outside of PagerDuty?

No, this model is entirely based on actions taken within PagerDuty.

Does it affect the machine learning capabilities if I rename the service?

No, it does not.

Can Intelligent Alert Grouping group alerts together from multiple services?

Intelligent Alert Grouping only looks at alerts from a single service. If you want alerts from different services to be grouped, you may need to reconfigure your service so that all related alerts are sent to the same service. If responders would like more context on incidents happening across other services, please read our article on the Related Incidents feature.

Why didn’t my alerts get grouped together?

There are three main reasons the Intelligent Alert Grouping algorithm may not have grouped alerts on the same service:

  1. The alerts did not arrive within the time window specified for that service.
  2. The incoming alert data was not similar enough to the alerts you expected it to be grouped with, or was more similar to the alerts it was actually grouped with.
  3. Human response behavior, such as merging incidents or moving alerts out of incidents, overrode the desired behavior.

The Intelligent Alert Grouping algorithm takes several different factors into consideration, which can make it difficult to trace why alerts were or were not grouped. If you believe that there has been enough history for an alert to be grouped, but are still noticing some unexpected grouping behavior, please reach out to our Support team, send links to the specific incidents/alert groupings, and summarize why the grouping behavior is unexpected.

Why don’t I see any alert grouping options?

There could be a few reasons why you don’t see any options for Alert Grouping:

  • If you do not see an option for automatic grouping:
    It’s possible your current pricing plan does not support Alert Grouping. If you are interested in trying Alert Grouping, contact our Sales team to start a free PagerDuty AIOps trial.
  • If you see a message that the service is configured for incidents only:
    This means that your service is able to create alerts, but it is not configured to do so. Please see Alerts for more information about how to adjust this behavior on a service.
  • If you see a message that the service has integrations that do not support alerts:
    Some monitoring tools do not support creating alerts, and therefore any services with these tools integrated will not be able to take advantage of Alert Grouping. To enable alerts on a service, you need to remove the integration that does not support alerts. For a list of integrations that do not support alerts please see this article.

Is there a limit to how many alerts can group into a single incident?

Yes, incidents are limited to 1000 alerts each. After this limit is reached, a new incident will be created and subsequent alerts will be grouped into the new incident.
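
As a quick worked example of the limit, the sketch below estimates how many incidents a burst of alerts would produce if every alert otherwise met the grouping criteria. The incidents_needed function is a hypothetical helper; it ignores the time window, the 24-hour rule, and similarity.

```python
import math

# Documented limit on alerts per incident.
MAX_ALERTS_PER_INCIDENT = 1000


def incidents_needed(alert_count: int) -> int:
    """Minimum number of incidents needed to hold alert_count alerts."""
    if alert_count < 0:
        raise ValueError("alert_count must not be negative")
    # Each incident fills up to the limit before a new one is created.
    return math.ceil(alert_count / MAX_ALERTS_PER_INCIDENT)
```

For instance, 2,500 groupable alerts would be spread across three incidents: two full ones and a third holding the remaining 500.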

Are incidents resolved only when all alerts within that incident are resolved?

Yes, that is correct. An incident will resolve when all of its associated alerts are resolved. Similarly, if you resolve an incident, PagerDuty will automatically resolve any associated, triggered alerts. For more information, please see Resolve Alerts.
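
The two-way relationship described here can be modeled in a few lines. This is a toy model for illustration only; the Alert and Incident classes and their methods are hypothetical and do not correspond to PagerDuty API objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    resolved: bool = False


@dataclass
class Incident:
    alerts: List[Alert] = field(default_factory=list)
    resolved: bool = False

    def resolve_alert(self, alert: Alert) -> None:
        """Resolve one alert; the incident resolves once all alerts are resolved."""
        alert.resolved = True
        if all(a.resolved for a in self.alerts):
            self.resolved = True

    def resolve(self) -> None:
        """Resolve the incident and any alerts that are still triggered."""
        self.resolved = True
        for a in self.alerts:
            a.resolved = True
```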