PagerDuty's SRE Agent transforms incident response in the Operations Console and Slack by automatically analyzing incidents, providing key context, and recommending remediation actions. It accelerates triage to reduce risk, cost, and cognitive load, and it continuously learns to prevent repeat issues.

📘
Availability
To access the SRE Agent within Operations Console, you must have AIOps and PagerDuty Advance. To access SRE Agent in Slack, you must have PagerDuty Advance.
PagerDuty Advance is available through one-time credits or as an add-on with the following pricing plans:

Enterprise

Business

Professional

Please contact our Sales Team to upgrade to a pricing plan with PagerDuty Advance or AIOps. If you do not currently have PagerDuty Advance or AIOps, we will begin a trial in order to give you access to the SRE Agent.

🚧
PagerDuty Advance Disclosure
Please read our PagerDuty Advance AI Disclosure for more information about how we designed, built and assessed PagerDuty Advance with mission critical work in mind.

Overview

The SRE Agent is designed to work side-by-side with responders, collecting information as an incident unfolds and learning how it was ultimately resolved. The SRE Agent always provides a summary and collects feedback from the user, so the system continuously improves. Once users triage an incident and provide information, that information is available for future incidents.

Key Features

Ingest and analyze user runbooks, SOPs, and diagnostics (e.g., error logs)
Generate and save playbooks for recurring issues
Prioritize actions by urgency and impact
Surface likely root causes from available data
Recommend diagnostic and remediation steps
Detect patterns; recall similar incidents and past resolutions
Provide structured troubleshooting for incidents, services, and infrastructure
Organize context into actionable views with interactive nudges/buttons
Summarize conversations and inputs, mark resolved, and save learnings

📘
PagerDuty Advance Credits
Accounts with PagerDuty Advance have an allotment of credits at their disposal. The SRE Agent uses four credits whenever you:

Submit a request to the SRE Agent via Slack or Operations Console (e.g., “what are my past incidents,” “what is the likely root cause”)

Click a nudge button (e.g., Update a Runbook, Analyze Past Incidents)
Please refer to How many credits does each action cost for more information.
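To budget credits, you can multiply the number of SRE Agent requests and nudge clicks by the per-action cost stated above. The sketch below does that arithmetic. The helper names and the "per incident" usage pattern are illustrative assumptions, and the estimate covers only SRE Agent actions, not other PagerDuty Advance features.

```python
# Minimal sketch: estimate PagerDuty Advance credits consumed by the SRE Agent.
# Assumption: each chat request and each nudge click costs 4 credits (per the note above);
# other PagerDuty Advance features are not counted. Function names are hypothetical.

CREDITS_PER_ACTION = 4  # cost of one SRE Agent request or one nudge click


def estimate_sre_agent_credits(requests: int, nudge_clicks: int) -> int:
    """Return credits used by a number of chat requests and nudge clicks."""
    if requests < 0 or nudge_clicks < 0:
        raise ValueError("counts must be non-negative")
    return CREDITS_PER_ACTION * (requests + nudge_clicks)


def estimate_monthly_credits(incidents: int, requests_per_incident: int, nudges_per_incident: int) -> int:
    """Project monthly usage from an assumed average interaction pattern per incident."""
    return incidents * estimate_sre_agent_credits(requests_per_incident, nudges_per_incident)


if __name__ == "__main__":
    # 6 questions and 3 nudge clicks on one incident: 4 * 9 = 36 credits.
    print(estimate_sre_agent_credits(6, 3))
    # 40 incidents per month at that rate: 40 * 36 = 1440 credits.
    print(estimate_monthly_credits(40, 6, 3))
```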

Use SRE Agent

After configuring PagerDuty Advance, you can access the SRE Agent through either:

In the Operations Console

1. In the PagerDuty web app, navigate to AIOps > Operations Console.
2. Optional: Add the SRE Agent column to the Operations Console as a default column for faster incident triage.
3. Select an incident by clicking its Title.
4. Select the SRE Agent tab and wait for the agent to load your incident summary and suggest next steps.
5. Begin troubleshooting the incident with the agent by asking questions and providing information.
6. Use the SRE Agent’s nudges (buttons) to take prescribed actions during the conversation, such as Upload Runbook to upload a service runbook.
7. Upload additional files by adding an attachment.

In Slack

📘
Before you begin
You will need to configure Slack with the SRE Agent. Please see our instructions to Configure the Slack Integration and Connect PagerDuty Advance to Slack if you have not already done so.

👍
Note
If you do not have PagerDuty AIOps, you can still access certain PagerDuty AIOps information, such as related incidents, past incidents, change events, and outlier incidents, but only within the Slack chat interface. This information is not available outside of the agent’s Slack chat interface (e.g., in the Incident Management web app).

🚧
Required Scopes
The SRE Agent requires additional scopes to work with Slack. A PagerDuty Admin may need to reauthorize the Slack integration in order to grant these scopes.

Access the SRE Agent in the following channel types:
1. Dedicated incident channels
2. Team or service based Slack channels
Start the SRE Agent by:
1. Selecting the SRE Agent Triage button
2. Asking questions in the chat. Use @pagerduty with a question related to the incident. See the list of example questions.
Upload a file to the SRE Agent in Slack:
1. Click the Upload Runbook or Update Runbook button.
2. Follow the prompts and select the runbook to upload.
3. Click Submit.

Integrations

SRE Agent can retrieve log data from observability platforms such as Grafana, Datadog, New Relic, and AWS CloudWatch, and runbooks from sources such as Confluence and GitHub. By analyzing these logs and runbooks, SRE Agent guides responders through investigation, triage, and resolution, ultimately reducing MTTR and escalations. When setting up one of these workflow integrations, select Allow SRE Agent access to use this connection.

📘
Setting up an Integration
For more on setting up SRE Agent integrations, please view the article on Agent Tooling Configuration.

Supported Actions

The SRE Agent uses nudges to recommend supported actions such as:

Upload Runbook: For first-time setup on a service
Update Runbook: When a runbook already exists
Analyze Past Incidents: Review history and patterns
Analyze Related Incidents: Identify correlations and impact
Generate a Playbook: Create repeatable response steps
Check Change Events: Verify recent changes for possible cause
Search Logs: Check logs based on tooling setup
Update Memory: After incident resolution, save new information to SRE Agent memory

Incident Notes

The SRE Agent analyzes new notes posted during active incidents. You can see its analysis in:

Slack: Posts analysis of new incident notes
Operations Console: Posts each new note in chat with interpretation

You can disable proactive note messages on the AI Settings page. Users can also ask questions in the chat about recent notes.

Example Questions

The questions below are representative of the types of questions the SRE Agent can answer. Because the SRE Agent uses large language models, you do not need to phrase questions exactly as shown.

Example Question to SRE Agent	Description
Can you analyze past incidents to see how this was resolved before?	Ability to see what similar incidents occurred previously for that service
Can you provide a list of related incidents?	Ability to see active incidents on other services that might be related to this one
How do I check [insert service, infrastructure information] for this specific error?	Ability to ask questions related to the type of incident you are troubleshooting
What information should I gather for this incident to help troubleshoot?	Ability to understand what type of information is needed to troubleshoot the incident
Should I do step X or step Y first to troubleshoot this incident?	From the list of suggested next steps, it recommends which step to take first, with additional context on its reasoning
How urgent is this incident based on the data?	Ability to understand incident urgency
What steps should I take to troubleshoot this issue?	Suggested remediation steps
Can you generate a playbook for resolving this error?	SRE Agent develops a playbook based on the agent’s understanding of the incident
How do I check the logs for this specific error?	Instructions for checking logs, which may include sample queries
How can I prevent this error from recurring?	Suggestions for preventing the incident in the future, such as service or infrastructure improvements
Is there a pattern to when these errors occur?	Analysis of incident patterns
What's the impact of this error on our systems?	Potential impact on other related services or infrastructure based on the incident context
What are some likely root causes for this incident?	Suggested root causes for the incident

SRE Agent Memory Definitions

The SRE Agent maintains several types of memory to provide increasingly relevant and personalized assistance over time. All SRE Agent memory artifacts are scoped to a given PagerDuty service. Understanding the memory types below helps you make full use of the agent's capabilities.

SRE Incident Playbook ("Scratchpad")

Continuously learns from your organization's historical incident data.
Automatically generates prioritized resolution steps tailored to your environment by analyzing patterns from past incidents.
Provides AI-powered baseline recommendations, even for novel incidents.
Delivers context-aware troubleshooting guidance that improves with each resolved incident.

Customer Service Runbook

Stores and references your runbooks, SOPs and documentation (e.g., Confluence or GitHub pages).
Remembers manually-provided documentation for future conversations.
Recommends steps that align with your organization's established procedures and standards.

Incident Summarization

Automatically creates comprehensive summaries when incidents are marked complete.
Captures key learnings, resolution details and troubleshooting paths without manual documentation effort.
Builds an ever-growing repository of institutional knowledge that benefits future incident response.

Service Profile

Metadata the SRE Agent has observed about a given service from customer runbooks, event payload data, and user interactions.
- Examples: cloud provider, region, service type, and relevant log search queries for the service.

Best Practices

Use these quick tips to get the most from the SRE Agent: share context fast, collaborate in triage, and improve outcomes over time.

Provide Relevant Documentation and Context

Upload any runbooks, SOPs, or knowledge base articles related to the affected service or architecture so everyone has the right context.

Resolve Incidents

The SRE Agent recalls key information observed during an incident, including event payload information, key user interactions, log search queries, and other data that is useful for similar incidents in the future. Incidents must be resolved before the SRE Agent saves information to its memory. If you interact with the SRE Agent after an incident is resolved, select the Update Memory button to prompt it to save the additional information for future incidents.

Interact with the Agent

Treat the SRE Agent like your triage buddy and ask questions whenever you get stuck. Share any critical findings or remediation steps you’ve taken during the incident so the agent stays informed and learns over time.

Provide Performance Feedback

Report whether each suggested troubleshooting step was a success or a failure. If a step fails, tell the SRE Agent so it can suggest alternative actions and keep the triage moving forward.

Generate Incident Playbooks

At the end of each incident, request a summary playbook. Review it for accuracy and completeness, then copy the approved version to your knowledge base for future use.

Runbooks

Structure runbooks as one per service, with SOPs organized by incident type or scenario. Optionally, include log queries to help the SRE Agent build better searches.
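One way to keep runbooks consistent across services is to generate a Markdown skeleton (a supported upload type) from a simple structure. The sketch below renders one runbook per service with a section per scenario and optional log queries. The section layout, the `Scenario` type, and the example service name and query syntax are hypothetical; the SRE Agent does not require this exact format.

```python
# Hypothetical helper: render a one-service runbook as Markdown, with one SOP section
# per incident scenario and optional log queries. The layout is an assumption,
# not a format required by the SRE Agent.
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str                       # incident type or scenario, e.g. "High latency"
    steps: list[str]                # ordered SOP steps for this scenario
    log_queries: list[str] = field(default_factory=list)  # optional queries the agent can reuse


def render_runbook(service: str, scenarios: list[Scenario]) -> str:
    """Return a Markdown runbook with one section per scenario."""
    lines = [f"# Runbook: {service}", ""]
    for scenario in scenarios:
        lines.append(f"## Scenario: {scenario.name}")
        lines.append("")
        # Numbered SOP steps, in the order responders should follow them.
        for i, step in enumerate(scenario.steps, start=1):
            lines.append(f"{i}. {step}")
        if scenario.log_queries:
            lines.append("")
            lines.append("### Log queries")
            lines.append("")
            for query in scenario.log_queries:
                lines.append(f"- `{query}`")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    runbook = render_runbook(
        "checkout-api",  # hypothetical service name
        [
            Scenario(
                "High latency",
                ["Check recent change events for deploys", "Scale out the worker pool"],
                ["service:checkout-api status:error"],  # hypothetical query syntax
            ),
        ],
    )
    print(runbook)
```

Save the output as a `.md` file and upload it with the Upload Runbook nudge.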

Product Limits

The SRE Agent analyzes custom_details and notes, but only the first 2,000 characters of each. Anything beyond that limit is not included in its analysis.
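Because only the leading 2,000 characters are analyzed, it can help to check alert payloads before sending them and to place the most important fields first. The sketch below measures how much of a note or of `custom_details` would fall past the limit. It assumes `custom_details` is measured as compact JSON text; how the agent actually serializes `custom_details` before truncating is not stated in this article, so treat the count as an approximation.

```python
# Minimal sketch: flag text that exceeds the SRE Agent's 2,000-character analysis limit.
# Assumption: custom_details is measured as compact JSON; the agent's real
# serialization is not documented here, so results are approximate.
import json

SRE_AGENT_CHAR_LIMIT = 2000


def visible_portion(text: str, limit: int = SRE_AGENT_CHAR_LIMIT) -> tuple[str, int]:
    """Return the part of `text` within the limit and how many characters are cut off."""
    return text[:limit], max(0, len(text) - limit)


def check_custom_details(custom_details: dict) -> int:
    """Return how many characters of serialized custom_details fall past the limit."""
    # json.dumps keeps dict insertion order, so fields added first are the ones kept.
    serialized = json.dumps(custom_details, separators=(",", ":"))
    _, dropped = visible_portion(serialized)
    return dropped


if __name__ == "__main__":
    details = {"error": "timeout", "stack": "x" * 3000}
    print(check_custom_details(details))  # characters the agent would not see
```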

Rate AI Responses

Use the “Rate AI Response” option next to each suggestion or recommendation in the conversation. Provide feedback to help improve the product and future recommendations. Your input helps the system learn and deliver better assistance over time.

FAQ

What incident data can the SRE Agent analyze?

The SRE Agent analyzes:

Event and alert payload information
Historical and related incidents
Change events
User-provided data (runbooks, logs, documentation)

Current limitations: Limited access to incident timeline details, incident workflows, and alert grouping data. These features will be added in future releases to enhance SRE Agent capabilities.

What happens if a recommendation or incident summary is not correct?

Tell the agent that the suggestion was not helpful, explain why, and ask for an alternative recommendation. You should also rate the response; ratings are analyzed to improve future recommendations.

What file types and limits exist for file upload?

We currently support .txt, .pdf, and .md files.
We currently support .jpg and .png for image analysis.
One conversation can have up to 25 files in total, with a maximum size of 100 KB per file.
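If you attach files often, a local pre-check can catch unsupported types or oversized files before you upload them. The sketch below checks a batch of paths against the limits listed above. It assumes 100 KB means 100 × 1024 bytes, treats only the listed extensions as accepted (for example, `.jpeg` is rejected), and runs entirely locally without calling any PagerDuty API. The function name is hypothetical.

```python
# Minimal pre-upload check against the listed limits: .txt/.pdf/.md documents,
# .jpg/.png images, at most 25 files per conversation, 100 KB per file.
# Assumptions: 100 KB = 100 * 1024 bytes; only the listed extensions are accepted;
# this is a local check and does not call any PagerDuty API.
from pathlib import Path

ALLOWED_SUFFIXES = {".txt", ".pdf", ".md", ".jpg", ".png"}
MAX_FILES_PER_CONVERSATION = 25
MAX_FILE_BYTES = 100 * 1024


def upload_problems(paths: list[Path], already_uploaded: int = 0) -> list[str]:
    """Return readable problems; an empty list means the batch looks acceptable."""
    problems = []
    total = already_uploaded + len(paths)
    if total > MAX_FILES_PER_CONVERSATION:
        problems.append(f"{total} files exceeds the limit of {MAX_FILES_PER_CONVERSATION} per conversation")
    for path in paths:
        suffix = path.suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{path.name}: unsupported file type {path.suffix or '(none)'}")
        elif path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: {path.stat().st_size} bytes exceeds {MAX_FILE_BYTES}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_uploads.py runbook.md errors.txt screenshot.png
    for problem in upload_problems([Path(p) for p in sys.argv[1:]]):
        print(problem)
```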

How can the SRE Agent fetch links?

The SRE Agent automatically fetches runbook URLs from the event payload. Users can also ask it to:

Fetch specific runbook links from Knowledge Base tools (including embedded links within pages)
Conduct log searches via nudge (button) or chat command
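Because the agent reads runbook URLs from the event payload, one option is to include the runbook link when your monitoring tool sends the event. The sketch below builds a PagerDuty Events API v2 trigger event with the link in the event's `links` array and also in `custom_details`. Which field the SRE Agent reads is not stated in this article, so placing the URL in both is an assumption, as is the `runbook` key name. The routing key, service, and URLs are placeholders.

```python
# Minimal sketch: build an Events API v2 trigger event that carries a runbook link,
# then (optionally) send it with the standard library.
# Assumptions: the SRE Agent can find the runbook URL in `links` or in a hypothetical
# `runbook` key in custom_details; routing key and URLs are placeholders.
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key: str, summary: str, source: str,
                        runbook_url: str, custom_details: dict) -> dict:
    """Return an Events API v2 trigger body with the runbook URL attached."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": "error",
            # Runbook key first, so it stays within the 2,000-character analysis limit.
            "custom_details": {"runbook": runbook_url, **custom_details},
        },
        "links": [{"href": runbook_url, "text": "Service runbook"}],
    }


def send_event(event: dict) -> int:
    """POST the event and return the HTTP status (the Events API returns 202 on success)."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    event = build_trigger_event(
        "YOUR_ROUTING_KEY",                      # placeholder integration key
        "Disk usage above 95% on db-1",
        "db-1",
        "https://example.com/runbooks/db",       # placeholder runbook URL
        {"mount": "/var/lib/postgresql"},
    )
    print(json.dumps(event, indent=2))
    # send_event(event)  # uncomment with a real routing key
```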
