A service may represent an application, component, or team you wish to open incidents against. A service contains integrations and determines the routing and incident settings for events triggered by any integration on that service.
We recommend keeping the following items in mind when setting up your services:
Reduce confusion with services and integrations by agreeing on naming conventions
This matters most when multiple teams use PagerDuty with the same integrations: if every team names its service "Datadog", nobody can tell the services apart.
Get together with your team and come up with naming conventions that each team can use to differentiate its services from another team's services.
Some things you might want to include in your service names:
- Team names
- Business Unit
- Production environments
- Priority level
- Integration/Monitoring tool name
- Customer name
Including additional names and keywords in your service names will also help narrow down search results when you search for a service by name.
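One way to keep a convention consistent is to generate names from the agreed components rather than typing them by hand. The following is a minimal sketch of that idea. The function `build_service_name`, its parameters, and the " - " separator are hypothetical choices made for illustration and are not part of PagerDuty.

```python
import re


def build_service_name(team, environment, tool, priority=None):
    """Join naming components into one service name.

    Hypothetical helper: the component order and the " - " separator
    are illustrative conventions, not anything PagerDuty requires.
    """
    parts = [team, environment, tool]
    if priority:
        parts.append(priority)
    # Collapse repeated whitespace and trim the ends so that names
    # entered by different teams come out in the same shape.
    cleaned = [re.sub(r"\s+", " ", part).strip() for part in parts]
    if any(not part for part in cleaned):
        raise ValueError("every naming component must be non-empty")
    return " - ".join(cleaned)


print(build_service_name("Payments", "Production", "Datadog"))
```

With a helper like this, two teams using the same monitoring tool end up with distinct, searchable names such as "Payments - Production - Datadog" and "Search - Production - Datadog".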
Add descriptions to your services
Make sure that everybody knows what kinds of incidents are supposed to trigger on a service by adding a description to your service.
To add a description, edit the service. Under the Service Name there is a description box where you can enter it.
Review your timeout settings
There are two timeout settings on a service:
- Incident ack timeout: determines when users should be re-notified if an incident has been acknowledged for too long. The default is 30 minutes.
- Auto-resolve timeout: determines when a PagerDuty incident should automatically resolve itself. The default is 4 hours.
You can change the threshold of these settings or disable them completely based on your use case.
For example, if incidents generally take longer than 4 hours to resolve, you may want to increase this threshold or disable it altogether.
If incidents take longer than 30 minutes to investigate and resolve, a 30-minute incident ack timeout may not help, as it would bother your on-call tech every 30 minutes after they ack an incident they are still investigating.
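If you manage service settings from a script, the two timeouts can be modelled as a small settings dictionary. This is a sketch only. The function `timeout_settings` is hypothetical, and the keys `acknowledgement_timeout` and `auto_resolve_timeout` (values in seconds, `None` to disable) are assumptions about how a service object might name these fields, so check the current API reference before relying on them.

```python
DEFAULT_ACK_TIMEOUT_MIN = 30           # default stated above: 30 minutes
DEFAULT_AUTO_RESOLVE_MIN = 4 * 60      # default stated above: 4 hours


def timeout_settings(ack_minutes=DEFAULT_ACK_TIMEOUT_MIN,
                     auto_resolve_minutes=DEFAULT_AUTO_RESOLVE_MIN):
    """Build a settings dict for the two service timeouts.

    Pass None for either argument to represent a disabled timeout.
    The key names and the use of seconds are assumptions.
    """
    def to_seconds(minutes):
        if minutes is None:
            return None  # timeout disabled
        if minutes <= 0:
            raise ValueError("timeout must be positive, or None to disable")
        return int(minutes * 60)

    return {
        "acknowledgement_timeout": to_seconds(ack_minutes),
        "auto_resolve_timeout": to_seconds(auto_resolve_minutes),
    }


# Incidents on this service usually take longer than 4 hours, so
# auto-resolve is raised to 8 hours and the ack timeout is disabled.
print(timeout_settings(ack_minutes=None, auto_resolve_minutes=8 * 60))
```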
Cut the incident noise with email filters and email management rules
Email filters determine what emails should trigger an incident.
Email management rules determine how emails should trigger and also resolve incidents.
Setting up these rules will allow you to control what incidents are triggered, making sure that only the important, actionable ones notify your on-call team, and will also allow you to more accurately report on incident resolution times.
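The difference between the two can be illustrated with a small sketch: a filter answers "should this email create an incident at all?", while a management rule answers "should this email trigger or resolve?". Everything here is hypothetical, including the `Email` class, the function names, the sender domain, and the subject patterns. It illustrates the logic only and does not show how PagerDuty evaluates rules internally.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str = ""


def passes_filter(email,
                  allowed_sender_domain="monitoring.example.com",
                  subject_pattern=r"\b(CRITICAL|DOWN|RECOVERY)\b"):
    """Filter sketch: accept only emails from the monitoring domain
    whose subject contains an alert keyword."""
    from_allowed = email.sender.lower().endswith("@" + allowed_sender_domain)
    subject_matches = re.search(subject_pattern, email.subject, re.IGNORECASE)
    return from_allowed and subject_matches is not None


def action_for(email):
    """Management-rule sketch: recovery emails resolve, others trigger."""
    if re.search(r"\bRECOVERY\b", email.subject, re.IGNORECASE):
        return "resolve"
    return "trigger"


alert = Email("alerts@monitoring.example.com", "CRITICAL: disk full on db-1")
digest = Email("news@example.com", "Weekly digest")
print(passes_filter(alert), passes_filter(digest))
```

Resolving incidents from recovery emails is what keeps reported resolution times close to the real ones, since the incident closes when the monitoring tool says the problem is gone.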
Connect your services to your chat and collaboration tools
Webhooks send out HTTP callbacks when interesting events happen to incidents within your PagerDuty services.
Integrating your services with your chat and collaboration tools via webhooks is a great way to create transparency around your incidents.
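As a rough illustration, a small receiver could turn each webhook callback into a one-line chat message. The sketch below covers only the formatting step. The function `format_chat_message` is hypothetical, and the payload shape (an `event` object with `event_type` and a `data` object holding `title`, `service.summary`, and `html_url`) is an assumption, so compare it against the webhook payload your account actually sends.

```python
def format_chat_message(payload):
    """Turn a webhook payload (already parsed from JSON) into one line
    of chat text. The field names read here are assumptions."""
    event = payload.get("event", {})
    data = event.get("data", {})

    event_type = event.get("event_type", "unknown event")
    title = data.get("title", "(no title)")
    service = data.get("service", {}).get("summary", "(unknown service)")
    url = data.get("html_url", "")

    message = f"[{event_type}] {title} on {service}"
    # Append the incident link only when one is present.
    return f"{message} {url}".strip()


sample = {
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "title": "Disk full",
            "service": {"summary": "Payments - Production - Datadog"},
            "html_url": "https://example.pagerduty.com/incidents/ABC123",
        },
    }
}
print(format_chat_message(sample))
```

Missing fields fall back to placeholders instead of raising, so an unexpected payload still produces a readable message in the channel.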
Add multiple integrations to a service
You can add more than one integration to a PagerDuty service, and may want to do so if several integrations are monitoring the same piece of infrastructure. Learn how to add multiple integrations to a service here.
For additional tips on how to use multiple integrations to best represent your internal systems, please check out our best practices article here.