Event Management

Infrastructure Health Application

The Infrastructure Health Application lets you see all of the events and alerts across your entire infrastructure in a single view. This holistic view helps you understand the complexities of your infrastructure and ultimately improve event management throughout your organization.

The application aggregates the linear data from the Alerts Tab and Incident Dashboard to offer a visual timeline of events and alerts across your PagerDuty services. Users can identify noisy, low-value alerts, uncover patterns in data, and juxtapose contextual events, like deploys or tweets, against alerts.

Note

The Infrastructure Health Application can be accessed via the Operations Command Console. If you'd like to enable this feature, please contact our Sales Team.

For the best experience with this application, we recommend expanding the Infrastructure Health Application to fill your screen.

Infrastructure Health Application Features

Click, Drag, Select and Zoom-in

Zoom in on sections of data for more granularity on areas of interest. The data range can also be changed by time duration: 1, 6, or 24 hours, or 7, 14, or 30 days. The time duration can be configured in the upper right-hand corner of the application when the application is fully expanded.

Data Cluster Sizes

The time period represented by each of the differently sized, colored data clusters depends on the time range selected in the application. For the 1–24 hour views, each cluster covers a 5-minute interval. For the 7–14 day views, data is clustered hourly. In the 30 day view, data is grouped into 6-hour clusters.
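
As a rough illustration of the bucketing described above, the sketch below maps a selected time range to a cluster width. The function names and the choice of hours as the input unit are assumptions made for this example; they are not part of the application.

```javascript
// Hypothetical helper: returns the cluster width in minutes for a selected
// time range, following the intervals listed above.
function clusterIntervalMinutes(rangeHours) {
  if (rangeHours <= 24) return 5;        // 1, 6 and 24 hour views
  if (rangeHours <= 14 * 24) return 60;  // 7 and 14 day views
  return 6 * 60;                         // 30 day view
}

// Upper bound on the number of clusters one swimlane can hold for a range.
function clustersPerSwimlane(rangeHours) {
  return (rangeHours * 60) / clusterIntervalMinutes(rangeHours);
}

console.log(clusterIntervalMinutes(6));     // 5
console.log(clustersPerSwimlane(30 * 24));  // 120
```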

Pivot Data On Source or Service

Get a completely new perspective on your data using the Source and Service Pivot. Please note that only integrations that emit our Common Event Format (PD-CEF) will display when pivoting on Source.

Using Filters

In the search bar below these pivot points, you can also filter which individual sources or services are represented, to fine-tune what data is on display. Filtering can be done by text or regex. Clicking on individual services or sources in the list below the search bar also filters the view.
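
The difference between text and regex filtering can be sketched as follows. The application's exact matching rules are not documented here, so the conventions below (a query wrapped in slashes is a regular expression, anything else is a case-insensitive substring) are assumptions made purely for illustration.

```javascript
// Hypothetical filter: a query wrapped in slashes is treated as a regular
// expression, anything else as a case-insensitive substring. Both conventions
// are assumptions made for this sketch.
function filterNames(names, query) {
  const regexForm = /^\/(.+)\/$/.exec(query);
  if (regexForm) {
    const pattern = new RegExp(regexForm[1]);
    return names.filter((name) => pattern.test(name));
  }
  const needle = query.toLowerCase();
  return names.filter((name) => name.toLowerCase().includes(needle));
}

const services = ["prod-web", "prod-db", "staging-web", "Twitter"];
console.log(filterNames(services, "web"));       // [ 'prod-web', 'staging-web' ]
console.log(filterNames(services, "/^prod-/"));  // [ 'prod-web', 'prod-db' ]
```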

Accessing Cluster Details

Hovering over a cluster displays a dialog box with details about the data in that cluster. Clicking on the cluster opens a scrollable dialog box with information about the data, including individual events' severity, priority, source, and integration.

Frequently Asked Questions

Do all alerts appear in the Infrastructure Health Application?

Any event that creates a log entry in PagerDuty will appear in the Infrastructure Health Application. However, the Source pivot will only display data that has been migrated to our PagerDuty Common Event Format.

How often does the Infrastructure Health Application update?

Data flows into the Application in real time. If an event is triggered, and the user immediately reloads the Application, the event will appear. The Application also auto refreshes every 1 minute, so the user is not required to reload the page.

How do I bring in contextual events, like deploys or tweets?

If you want to bring more event data into your Infrastructure Health Application, you have many options.

  • We have over 150 out of the box integrations. Take a look through these to find out which will work best for you.
  • You can use our Custom Event Transformer tool to integrate PagerDuty with any tool that can send an HTTP request. You can also use the Custom Event Transformer to edit the JSON payload PagerDuty receives. More info can be found in our CET Dev Docs and CET Integration Guide.
  • See the example use case for more info about setting up the Infrastructure Health Application.

What is the y-axis of the Infrastructure Health Application?

By default, the y-axis will display your PagerDuty Services. Each Service is represented by a swimlane in the y-axis. If you select the Source pivot, the y-axis will update to display all of the unique sources from which your data is coming.
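
One way to picture the two pivots is as a grouping step over the same list of events. The sketch below is only a mental model: the event shape and the function name are hypothetical, and events without a source are skipped to mirror the note that only PD-CEF data appears under the Source pivot.

```javascript
// Hypothetical event shape: { service, source, summary }. `source` is only
// present on events that carry PD-CEF fields.
function groupIntoSwimlanes(events, pivot) {
  const lanes = new Map();
  for (const event of events) {
    const key = event[pivot];
    if (key === undefined) continue; // e.g. no PD-CEF source to pivot on
    if (!lanes.has(key)) lanes.set(key, []);
    lanes.get(key).push(event);
  }
  return lanes;
}

const events = [
  { service: "Checkout", source: "prod05-nginx", summary: "High latency" },
  { service: "Checkout", source: "prod06-nginx", summary: "5xx spike" },
  { service: "Twitter", summary: "Mention of outage" },
];

console.log([...groupIntoSwimlanes(events, "service").keys()]); // [ 'Checkout', 'Twitter' ]
console.log([...groupIntoSwimlanes(events, "source").keys()]);  // [ 'prod05-nginx', 'prod06-nginx' ]
```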

How do I display multiple alerts and events in the same swimlane of the Application?

You can use our services group feature to roll multiple integrations into a single service. This allows you to visualize all events and alerts happening within the service in a single swimlane in the Infrastructure Health Application. More info: Services Group Blog Post

How many days of data will the Application display?

The Infrastructure Health Application displays up to 30 days' worth of data.

Example Use Case for the Infrastructure Health Application

Integrating inbound tweets into your PagerDuty Infrastructure Health Application is a great way to track user responses to deploys and outages. Twitter does not support webhooks, so an intermediary tool is necessary in order to send events from Twitter to PagerDuty. You can use a tool like Zapier or Microsoft Flow. The following guide provides an example of how to integrate Twitter with the Infrastructure Health Application using Zapier.

Integration Instructions

In PagerDuty

  1. Set up a new Twitter service in PagerDuty, or add a Twitter extension as a generic webhook to an existing service.
  2. For a new Twitter service, select Custom Event Transformer as the Integration Type.
  3. We suggest setting the notification urgency for the service as suppressed to prevent alerting for incoming tweets.
  4. On your Twitter service's individual page, select the Integrations tab and click on Custom Event Transformer. Then click Add Integration.
  5. Select Edit Integration.
  6. Delete the JavaScript that appears on the Edit Integration page.
  7. Here is an example of what can be pasted into the JavaScript field. You can map these fields to meet your specific needs. Ensure that event_class is set to “tweet” so that Twitter events appear as small blue “T's” in the Infrastructure Health Application interface, making them easy to differentiate from alerts.
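// Request body delivered to the integration URL (here, the tweet sent by Zapier).
var body = PD.inputRequest.body;

// Map the tweet onto PagerDuty Common Event Format (PD-CEF) fields.
var cef_event = {
  event_type: 'cef',
  creation_time: new Date(body.created_at),
  severity: 'Info',
  priority: body.retweet_count,
  client: 'Twitter',
  client_url: body.url,
  event_class: 'tweet', // displays the event as a small blue "T"
  message: body.text,
  details: body.user.description,
  source_component: 'Twitter',
  source_origin: 'Twitter',
  reporter_component: 'Twitter',
  local_instance_id: body.id,
  event_action: PD.Trigger
};

// Emit the event to PagerDuty.
PD.emitCEFEvents([cef_event]);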
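
To check a field mapping before pasting it into PagerDuty, a similar mapping can be exercised locally with a stand-in for the PD object. Everything in the sketch below is hypothetical: the mock only mirrors the three members the script above touches, the string used for Trigger is a placeholder, and the sample tweet values are invented. It is not PagerDuty's runtime.

```javascript
// Stand-in for the PD object available inside the Custom Event Transformer.
// Only the members used by the script above are mocked; values are made up.
function runTransformer(requestBody) {
  const emitted = [];
  const PD = {
    inputRequest: { body: requestBody },
    Trigger: "trigger", // placeholder for PD.Trigger
    emitCEFEvents: (events) => emitted.push(...events),
  };

  // A trimmed-down version of the mapping shown above.
  const body = PD.inputRequest.body;
  PD.emitCEFEvents([{
    event_type: "cef",
    creation_time: new Date(body.created_at),
    severity: "Info",
    event_class: "tweet",
    message: body.text,
    local_instance_id: body.id,
    event_action: PD.Trigger,
  }]);
  return emitted;
}

// Sample payload with invented values, shaped like the fields the script reads.
const sampleTweet = {
  id: "12345",
  created_at: "2018-01-01T12:00:00Z",
  text: "Is the site down for anyone else?",
};

const [cefEvent] = runTransformer(sampleTweet);
console.log(cefEvent.event_class, cefEvent.message);
```

Running the mapping this way makes it easier to spot fields that would arrive undefined because the incoming payload uses different property names.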

In Zapier

  1. Create a zap in Zapier to fire webhooks from Twitter to PagerDuty. Be sure to choose Twitter when prompted to select a trigger app.
  2. When asked to Select Twitter Trigger, opt for Search Mentions.
  3. On the next screen you will be prompted to enter the search term you are interested in tracking in Twitter.
  4. Next you will be asked to Choose an Action App. Select the Webhooks by Zapier option.
  5. You will be prompted to Choose Action. For this step, click the radio button to select POST: Fire off a single POST request as a form or JSON.

Return to PagerDuty

  1. Go back to your Twitter service's individual service page and click on the Integrations tab. Then click on the Custom Event Transformer.
  2. Copy the URL provided under Integration URL.

Return to Zapier

  1. Paste the PagerDuty URL into Zapier on the Set up Webhooks by Zapier POST page.
  2. Test, and select field.
  3. Click Finish.

Operations Command Console

To ensure the continued health of your infrastructure and overall business, you need information at the incident, service, responder, and infrastructure levels at all times. PagerDuty's Operations Command Console is a customizable framework that provides complete operational awareness of the critical elements of your IT infrastructure, giving you insights that accelerate incident response.

Accessing the Console

If the Operations Command Console is available to your account, all users will see and have access to the Command Console option in the main menu.

Please contact our sales team if you do not see Command Console in the main menu and would like to see a demo or start a trial of the Operations Command Console.

Applications

The following applications are available in the Operations Command Console:

Infrastructure Health

Infrastructure Health is a core component of the Operations Command Console. This application provides a comprehensive visualization of the clusters of alerts and events occurring across your IT infrastructure and delivers infrastructure-wide context that is vital for improving incident management within your organization. With a comprehensive timeline of events, IT teams can view their alert clusters across services or tech stacks in a graphical, easy-to-understand interface that is well suited to pattern matching. With all the data in one place, organizations can gain an understanding of their ever-expanding infrastructure and begin to measure performance across their critical apps and services. Learn more about the Infrastructure Health Application.

Service Health

Service Health displays the up-to-the-minute health status of your PagerDuty services. It also gives you the ability to Focus on any service that may be affecting multiple other services and causing Major Incidents to occur. This is a foundational application for the Console and one that anchors several other console configurations.

Major Incidents

Incidents designated as high urgency will appear in the Major Incidents application. You can Focus on specific incidents to see their relationship with your services, responders, and overall infrastructure.

Responders

Responders lets you quickly identify which team and which escalation policy is covering, online, and working to remediate open incidents.

Console Features

Focus

Focusing on any of the entries in a given application highlights related content in the other applications, providing a deep context around the item in focus. To focus on an incident, hover over the incident and click on the Focus option that appears for that incident.

Details View

Select the details feature for more information on specific incidents or services. To view details for an incident or service, hover over the item and click on the Details option that appears for that item.

Search

The Service Health and Major Incidents applications let you quickly filter out information of interest. Simply click the search icon (magnifying glass) to get started.

Customization

You can adjust the layout of the console, as well as the applications you want to appear. Application layout can be changed by clicking the Console Settings (gear) button in the upper right-hand corner of the page. A single application can also be expanded to fill the entire screen by clicking the maximize button in the upper right-hand corner of any application. Display customization is per-user, so go ahead and set things up how you like without fear that your changes will impact anybody else on your team.

Alerts Table

Alerts table accessibility:

The alerts table is accessible via the Alerts tab in the navigation bar. This tab is only visible on accounts with alerts and incidents enabled.

The alerts table, available in our web UI, allows you to view alerts according to your preferences, helping you locate what you're looking for quickly and efficiently.

Alerts Table Columns

Alerts are filtered via the columns that appear in the alerts table. The type of filter (radio button or search) varies based on column data type.

The columns Severity, Summary, Source, Class, Component, and Group all map to PD-CEF fields, which can be easily included in your events with our new Events API v2. PD-CEF is currently available for Splunk, AWS CloudWatch, DataDog, Nagios, Sensu and Zabbix integrations.

The Status column indicates whether your alerts are triggered and/or resolved, suppressed and/or actionable.

Related Incident points to the incident your alerts are grouped under, and Service and Integration point to the service and integration your alerts are associated with.

Show and Hide columns

Using the Customize Columns button, users may show or hide the Related Incident, Service, Integration, Source, Class, Component and Group columns. Status, Severity, Summary and Created columns are automatically enabled, and cannot be removed from your view, which is why they appear in gray under the Customize Columns button.

Related Incident, Service, Integration and Source are automatically enabled, but can be hidden.

Interacting with your columns

Search filtering

Search filtering is enabled on the Summary, Related Incident, Integration, Source, Class, Component, and Group columns. Partial matches are displayed; for example, searching for Prod will display prod04 and Prod03 in the results.

Sort filtering

A majority of the columns in the table have sort capability. The default sort setting is by Status (Triggered, Triggered Suppressed, Resolved, Resolved Suppressed). Sortable columns will have a pair of arrows to the right of the column name. Click on the arrows to sort your alerts based on this column. Active sort filters will appear as a darker grey than the columns surrounding them.
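
The default status ordering can be expressed as a simple comparator. This is a sketch only: the alert objects are invented, and the ordering array is taken from the list in the paragraph above.

```javascript
// Default ordering taken from the text above; the alert objects are invented.
const STATUS_ORDER = [
  "Triggered",
  "Triggered Suppressed",
  "Resolved",
  "Resolved Suppressed",
];

function sortByStatus(alerts) {
  // Copy first so the caller's array is left untouched.
  return [...alerts].sort(
    (a, b) => STATUS_ORDER.indexOf(a.status) - STATUS_ORDER.indexOf(b.status)
  );
}

const alerts = [
  { summary: "Disk full", status: "Resolved" },
  { summary: "CPU high", status: "Triggered Suppressed" },
  { summary: "API down", status: "Triggered" },
];
console.log(sortByStatus(alerts).map((a) => a.summary));
// [ 'API down', 'CPU high', 'Disk full' ]
```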

Active Filters Bar

With the active filters bar, you will never lose track of which filters are currently on. Column tags populate the bar, and filter icons are highlighted in blue when filters are active. This bar appears at the top of your table.

Clearing Table Filters

Table filters can be cleared from within the Active Filters Bar by clicking the x next to each individual filter, or by clicking Clear all table filters. In the event that no results are found when a search filter is input, this may be removed by clicking the search icon a second time and clicking Clear filter.

Reduce the Number of Notifications for the Same Incident/Event

Notification fatigue can be assuaged with the following actions:

Include an Incident Key in your API Call

If your service is set up as a Generic API service type and multiple incidents are triggering for the same issue, your team will be notified for each duplicate incident. To group these incidents, include incident_key in the parameters you send when triggering incidents.

PagerDuty de-duplicates incidents based on the incident_key parameter — this identifies the incident to which a trigger event should be applied. If there are no open (unresolved) incidents with this key, a new incident will be created.

If there is already an open incident with a matching key, this event will be appended to that incident's alert log as an additional Trigger log entry.

If the incident_key field isn't provided, PagerDuty will automatically open a new incident with a unique key.
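
The de-duplication rules above can be modeled with a small in-memory store. The class and method names here are hypothetical, and the code is a sketch of the described behavior rather than PagerDuty's implementation.

```javascript
// In-memory model of the de-duplication rules described above. The class and
// method names are hypothetical; this is not PagerDuty's implementation.
class IncidentStore {
  constructor() {
    this.openIncidents = new Map(); // incident_key -> incident
    this.generatedKeys = 0;
  }

  trigger(event) {
    // No key supplied: a unique key is generated, so a new incident opens.
    const key = event.incident_key ?? `auto-${++this.generatedKeys}`;
    const existing = this.openIncidents.get(key);
    if (existing) {
      existing.log.push(event.description); // appended as another Trigger entry
      return { created: false, incident: existing };
    }
    const incident = { key, log: [event.description] };
    this.openIncidents.set(key, incident);
    return { created: true, incident };
  }

  resolve(key) {
    this.openIncidents.delete(key);
  }
}

const store = new IncidentStore();
store.trigger({ incident_key: "db01/disk", description: "Disk 91% full" });
const second = store.trigger({ incident_key: "db01/disk", description: "Disk 93% full" });
console.log(second.created, second.incident.log.length); // false 2
```

Choosing a key that identifies the underlying problem (for example, host plus check name) rather than the individual occurrence is what lets repeated triggers collapse into one incident.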

Adjust your Email Management Settings

If you have an email integration you can change your incident creation settings so that incidents are only triggered under certain conditions.

There are four types of email management settings:

  • Open a new alert/incident for each trigger email: Each email sent to the service's email address will open a new incident.
  • Open a new alert/incident for each new trigger email subject: Incidents are de-duped based on the subject of the trigger emails. For example, if two emails with the same subject are sent to this service's email address, the first opens a new incident, and the second is appended to this incident.
  • Open a new alert/incident only if an open incident does not already exist: An email sent to the service's email address will only open a new incident if the service has no open incidents; otherwise, the email will be appended to the open incident.
  • Create and resolve alerts/incidents based on custom rules: Use regular expressions to parse incident triggers and resolves.

The last three incident creation settings are ones that allow alert/incident de-duplication and reduce the number of notifications being sent.
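
One way to compare the four settings is to ask which grouping key each would assign to an incoming trigger email: emails that share a key end up on the same open incident. The setting names, the function, and the simplified handling of custom rules (a single caller-supplied regular expression that extracts a key from the subject) are all assumptions made for this sketch.

```javascript
// Hypothetical setting names; each returns the key an incoming trigger email
// would be grouped under. Emails with equal keys land on the same open incident.
let emailCounter = 0;

function dedupKeyForEmail(setting, email, keyPattern) {
  switch (setting) {
    case "per_email":
      return `email-${++emailCounter}`; // always unique: one incident per email
    case "per_subject":
      return email.subject;
    case "single_open_incident":
      return "service-wide"; // every email shares one key
    case "custom_rules": {
      // keyPattern is a caller-supplied regular expression with one capture group.
      const match = keyPattern.exec(email.subject);
      return match ? match[1] : null; // null: no rule matched
    }
    default:
      throw new Error(`Unknown setting: ${setting}`);
  }
}

const emailA = { subject: "ALERT host=db01 disk full" };
const emailB = { subject: "ALERT host=db01 disk still full" };
console.log(dedupKeyForEmail("per_subject", emailA) === dedupKeyForEmail("per_subject", emailB)); // false
console.log(dedupKeyForEmail("custom_rules", emailA, /host=(\S+)/)); // db01
```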

To change the incident creation settings for your email integration service:

  1. Go to Configuration → Services
  2. Click on the name of the service that houses the integration, then select the Integrations tab. You can edit the integration by selecting Edit from the settings cog, or from the integration details page accessible by clicking the name of the integration.
  3. Select the appropriate email management setting from the four options provided.
  4. Click Save changes.

Suppression and Event Rules

The Event Rules feature is available on our current Standard and Enterprise plans. Please contact our Sales Team if you would like to upgrade to a plan featuring Event Rules.

Event Rules define automated actions to take on alerts created by services, based on conditions that apply to information in the inbound events' payloads. Event Rules can perform such actions as setting the severity of the resulting alert, or automatically suppressing the alert altogether.

Event Rules can be used on services that use vendor-specific and API-based integrations. For email integrations, we recommend utilizing email management rules.

Configure Event Rules For a Service

To use event rules on a service, that service must first be set to create Alerts and Incidents for inbound events. To do this, go to Configuration → Services, edit the service and locate the Incident Behavior section at the bottom of the Edit Service page.

Event Rules then can be added from the individual service page; the Event Rules tab is located on the far right.

Event Rule Behavior

Rules are evaluated in the order they appear in the Event Rules configuration tab, top to bottom. The first matching rule is applied, and then execution stops. If no rule matches, the event is processed ordinarily, as if there were no rules configured.
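
The first-match behavior described above can be sketched as a loop that returns as soon as one rule applies. The rule shape (a matches function plus an action object) and the field names are hypothetical and chosen only for illustration.

```javascript
// Hypothetical rule shape: { matches(event), action }. Rules are checked in
// list order and evaluation stops at the first match.
function applyEventRules(rules, event) {
  for (const rule of rules) {
    if (rule.matches(event)) {
      return { ...event, ...rule.action };
    }
  }
  return event; // no rule matched: processed as if no rules existed
}

const rules = [
  { matches: (e) => e.source === "prod05-nginx", action: { suppressed: true, severity: "error" } },
  { matches: (e) => e.source.startsWith("prod"), action: { severity: "critical" } },
];

console.log(applyEventRules(rules, { source: "prod05-nginx" }));
console.log(applyEventRules(rules, { source: "staging01" }));
```

Because evaluation stops at the first match, the more specific rule has to sit above the broader one; swapping the two rules in this example would mean the prod05-nginx rule never fires.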

The order of rules can be controlled by dragging and dropping the rules in the Event Rules page.

If an event rule is deleted while a suppressed alert is in a "triggered" state, further alerts with the same alert key will continue to deduplicate until the alert is resolved.

Setting Conditions for Event Rules

When creating a rule, first set the conditions under which the rule will apply. For example:

In the above example, the rule will apply to an inbound event if the value of its «Source» field is exactly prod05-nginx. This particular rule will cause the event to be suppressed, and the severity will be set to Error.

Event rule conditions can be configured for any of the following PagerDuty Common Event Format (PD-CEF) fields:

  • Class
  • Component
  • Group
  • Location
  • Severity
  • Source
  • Summary
  • Custom Details

Accessing Custom Details

In order to access nested components of the custom details property for a comparison in a condition, enter the full namespace path, in dot notation, from the root of the details property, into the Key name in details field. For example, if the details property is {"foo": {"bar1": {"baz": "thevaluehere"}, "bar2": 2}}, the value in the comparison for foo.bar1.baz would be thevaluehere.
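
The dot-notation lookup can be illustrated with a short path resolver, using the same details object as the example above. The function name is hypothetical, and the handling of missing segments (returning undefined) is an assumption of this sketch.

```javascript
// Resolves a dot-notation path against a nested object; returns undefined
// when any segment is missing. The function name is hypothetical.
function getByPath(details, path) {
  return path.split(".").reduce(
    (node, segment) =>
      node != null && typeof node === "object" ? node[segment] : undefined,
    details
  );
}

const details = { foo: { bar1: { baz: "thevaluehere" }, bar2: 2 } };
console.log(getByPath(details, "foo.bar1.baz"));    // thevaluehere
console.log(getByPath(details, "foo.bar2"));        // 2
console.log(getByPath(details, "foo.missing.baz")); // undefined
```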

Once you have selected a field to examine, select a condition, such as equals or contains, and (if applicable) enter a value to compare the field against.

In addition to basic comparisons, regular expressions can be used. Note that PagerDuty uses the RE2 flavor of regular expressions to evaluate them.

Suppression

Suppression, as opposed to setting the severity of alerts, allows you to send events to PagerDuty without triggering any notifications. Suppressed alerts are stored in PagerDuty and available for forensics, analysis, and context, but do not create incidents. Suppressed alerts can be viewed in the alerts list as well as the Infrastructure Health Application.

To suppress an alert if it matches a given set of conditions, select «Suppress» as the action to take when you are building your Event Rules.

Viewing Suppressed Alerts

Suppressed alerts are filtered out of the incidents dashboard by default, including the incidents page for the service on which the suppressed alerts were triggered. Moreover, because suppressed events do not trigger incidents, they will not be visible in the mobile app.

They can be viewed in the Alerts tab at the top of your screen, between Incidents and Configuration.

Here is an example of a suppressed event. It looks very similar to other alerts, but has "Suppressed" in the severity field and is not assigned/assignable.

The incident below was triggered on a service with alerts set up to trigger as suppressed by default.

Infrastructure Health Application

Suppressed alerts may also be viewed in the Infrastructure Health Application. Suppressed alerts will be shown in gray to differentiate them from the rest of your alerts and incidents.
