Zabbix Troubleshooting Guide
Below are some common issues that you may run into when integrating Zabbix with PagerDuty, and steps for troubleshooting:
General Configuration Troubleshooting
Review Zabbix Logs
If you’re having an issue receiving Zabbix events in PagerDuty, check your Zabbix logs to see if the PagerDuty action was called, and if there were any associated errors.
For Zabbix 2.x: Navigate to Monitoring > Events, then click the event timestamp for the problem event. Check the Message actions section in the event details.
For Zabbix 3.x: Navigate to Reports > Action Log to view the status of the Zabbix event.
PagerDuty User/Action Configuration in Zabbix
If events are not being sent to the PagerDuty user/action in Zabbix, please check the following:
- Make sure the PagerDuty group in Zabbix has read permissions for the hosts and/or host groups in question. Also confirm that the PagerDuty user is in the PagerDuty group.
- Check that the PagerDuty media type is enabled under Configuration > Media types.
Zabbix Events not Received in PagerDuty
Sometimes events look like they’re being sent to the PagerDuty user/action, but aren't showing up in PagerDuty. If this happens, please check the following:
- Both the zabbix-agent and zabbix-server services must be running for Zabbix to send notifications to PagerDuty. To check that they are running, use the following commands:
$ service zabbix-agent status
$ service zabbix-server status
- Make sure your Zabbix server can make outbound HTTP/HTTPS connections to events.pagerduty.com on ports 80 and 443. If your environment requires a proxy for outbound HTTP/HTTPS connections, make sure you set the proxy in the agent configuration, or see the information below on using a proxy with the old Python integration.
- Check that the integration key is set correctly in your PagerDuty user's media settings.
- If you see an error such as "pagerduty_python: PagerDuty server REJECTED the event in file...Event object is invalid," check the trigger and recovery subjects and message formats in your PagerDuty action.
Under Configuration > Actions, select the PagerDuty notification action to view its details. Remove any extra characters, such as whitespace or newline characters, after the text so that the configuration exactly matches the following:
Default subject: trigger
Recovery subject: resolve
Default message and Recovery message:
name:{TRIGGER.NAME}
id:{TRIGGER.ID}
status:{TRIGGER.STATUS}
hostname:{HOSTNAME}
ip:{IPADDRESS}
value:{TRIGGER.VALUE}
event_id:{EVENT.ID}
severity:{TRIGGER.SEVERITY}
- If you're using Zabbix 3.x, make sure you have specified these script parameters for the PagerDuty media type under Administration > Media types:
{ALERT.SENDTO}
{ALERT.SUBJECT}
{ALERT.MESSAGE}
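
The message body check described earlier in this list can be sketched in Python. This is only an illustration: the function name find_template_problems is hypothetical, and it assumes you paste the message text from your PagerDuty action into a string. It checks for stray whitespace, empty lines, and missing lines; it does not talk to Zabbix or PagerDuty.

```python
# Expected message body, copied from the list in this guide.
EXPECTED_LINES = [
    "name:{TRIGGER.NAME}",
    "id:{TRIGGER.ID}",
    "status:{TRIGGER.STATUS}",
    "hostname:{HOSTNAME}",
    "ip:{IPADDRESS}",
    "value:{TRIGGER.VALUE}",
    "event_id:{EVENT.ID}",
    "severity:{TRIGGER.SEVERITY}",
]


def find_template_problems(message):
    """Return a list of differences between message and the expected body."""
    problems = []
    lines = message.split("\n")
    for number, line in enumerate(lines, start=1):
        if line == "":
            # A trailing newline shows up here as a final empty line.
            problems.append("line %d is empty (extra newline)" % number)
        elif line != line.strip():
            problems.append("line %d has extra whitespace" % number)
    cleaned = [line.strip() for line in lines if line.strip()]
    for expected in EXPECTED_LINES:
        if expected not in cleaned:
            problems.append("missing line: %s" % expected)
    return problems
```

An empty result means the text contains every expected line with no stray whitespace. The sketch does not check line order or flag additional lines.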
PagerDuty Incidents not Resolving after Recovery in Zabbix
Please check the following items to ensure that Zabbix can deliver recovery events to PagerDuty and resolve associated incidents:
- Ensure that your PagerDuty Notifications action has messaging operations (send to user/group) defined for its Recovery operations.
- Make sure that the PagerDuty Notifications action’s messaging operations use the same message template for both the main action and the recovery action.
- Make sure the message template is the one given in the integration guide for your Zabbix integration: Zabbix 4.x-6.x Integration Guide, Zabbix 3.x Integration Guide, Zabbix 1.x Integration Guide.
- Ensure that the recovery operation’s default message subject is resolve (case sensitive).
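
Because the subjects are matched exactly, a quick way to rule out case or whitespace mistakes is to compare the configured subject against the expected value. The sketch below is a hypothetical helper (check_subject is not part of any PagerDuty or Zabbix tool) and assumes you copy the subject text out of the action by hand.

```python
# Subjects this guide specifies for the PagerDuty action.
EXPECTED_SUBJECTS = {"default": "trigger", "recovery": "resolve"}


def check_subject(kind, subject):
    """Return None if the subject is acceptable, else a short explanation."""
    expected = EXPECTED_SUBJECTS[kind]
    if subject == expected:
        return None
    if subject.strip() == expected:
        return "extra whitespace around %r" % expected
    if subject.strip().lower() == expected:
        return "wrong case: expected %r" % expected
    return "expected %r, got %r" % (expected, subject)
```

For example, check_subject("recovery", "Resolve") reports a case problem, while check_subject("recovery", "resolve") returns None.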
Agent-Based Integration
Verify the Agent is Installed
When an issue with the agent-based installation arises, it is commonly related to the agent installation (e.g., trying to install the agent on an incompatible distribution, such as CentOS 5). The first step in troubleshooting agent-based integrations is to make sure that the PagerDuty Agent is both compatible with your distribution and successfully installed.
CentOS 5 users: Please use the Python-based integration, as the PagerDuty Agent requires a newer version of Python than the version available with CentOS 5.
Verify the Agent is Running
Once you've verified the agent is successfully installed, you'll want to make sure that it is running. You can check the status by running service pdagent status from the command line. If the agent isn't running, you can start it with the command service pdagent start.
Check the Agent's Logs for Errors
The agent logs activity and errors to /var/log/pdagent/pdagentd.log, which may contain helpful troubleshooting information.
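
To pull likely problem lines out of a long log, a small filter can help. The sketch below is an assumption-heavy illustration: this guide does not describe the log format, so the keywords ("ERROR", "CRITICAL", "Traceback") are guesses that you may need to adjust, and both function names are hypothetical.

```python
def find_problem_lines(lines, keywords=("ERROR", "CRITICAL", "Traceback")):
    """Return the lines that contain any of the keywords (assumed markers)."""
    return [line.rstrip("\n") for line in lines
            if any(keyword in line for keyword in keywords)]


def scan_agent_log(path="/var/log/pdagent/pdagentd.log"):
    """Read the agent log named in this guide and return suspicious lines."""
    with open(path) as handle:
        return find_problem_lines(handle)
```

Reading the log may require elevated privileges, depending on how the file's permissions are set on your system.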
Trigger a Test Incident with the Agent's CLI
Try manually triggering an incident using the pd-send command and check for errors (set PD_SERVICE_KEY to one of your own PagerDuty integration keys):
$ export PD_SERVICE_KEY=YOUR_INTEGRATION_KEY_HERE
$ pd-send -k $PD_SERVICE_KEY -t trigger -d "Server is on fire" -i server.fire
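
If you want to run the same test from a script and capture its output, the call can be wrapped as in the following sketch. It uses only the pd-send flags shown above (-k, -t, -d, -i); the function names are hypothetical, and any event type other than trigger is an assumption not covered by this guide.

```python
import subprocess


def build_pd_send_command(service_key, description, incident_key,
                          event_type="trigger"):
    """Build the argument list for the pd-send call shown in this guide."""
    return ["pd-send", "-k", service_key, "-t", event_type,
            "-d", description, "-i", incident_key]


def send_test_event(service_key):
    """Run pd-send and return (exit_code, combined output) for inspection."""
    command = build_pd_send_command(service_key, "Server is on fire",
                                    "server.fire")
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    output, _ = process.communicate()
    return process.returncode, output.decode("utf-8", "replace")
```

Passing the arguments as a list avoids shell quoting problems with the description text.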
If the pd-send command triggers an incident in PagerDuty, check the tips in the General Configuration Troubleshooting section. You may need to verify the trigger subject and message in your Zabbix configuration.
Trigger a Test Incident with pd-zabbix
Try manually triggering an incident using the pd-zabbix command and check for errors (replace PD_SERVICE_KEY with one of your own PagerDuty integration keys):
$ /usr/share/pdagent-integrations/bin/pd-zabbix PD_SERVICE_KEY trigger "name:Test
id:1
status:onfire
hostname:localhost
ip:127.0.0.1
value:5
event_id:2
severity:1"
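
The multi-line message argument is easy to mistype in a shell. As a rough sketch, the same test payload can be assembled in Python; the helper names are hypothetical, and the field values are the test values from the command above.

```python
# Test values taken from the pd-zabbix example in this guide.
TEST_FIELDS = [
    ("name", "Test"), ("id", 1), ("status", "onfire"),
    ("hostname", "localhost"), ("ip", "127.0.0.1"),
    ("value", 5), ("event_id", 2), ("severity", 1),
]


def build_test_message(fields):
    """Join (key, value) pairs into newline-separated 'key:value' lines."""
    return "\n".join("%s:%s" % (key, value) for key, value in fields)


def build_pd_zabbix_command(service_key, subject="trigger",
                            script="/usr/share/pdagent-integrations/bin/pd-zabbix"):
    """Return the argument list: script, key, subject, message body."""
    return [script, service_key, subject, build_test_message(TEST_FIELDS)]
```

The resulting list can be passed to subprocess.call. The message has no trailing newline, which matches the formatting advice in the General Configuration Troubleshooting section.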
Python-Based Integration
Python Version
The integration requires Python 2.7.9 or later to make a secure connection to PagerDuty. This is due to a security vulnerability in SSLv3 (POODLE), which older versions of Python use. Python 2.7.9 uses a backported version of Python 3's SSL library, so versions 2.7.9 and newer (up to 3.x) are able to make a secure connection to PagerDuty. The script does not work with Python 3.x due to other language changes in this version of Python.
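
The version rule above (2.7.9 or later, but not 3.x) can be expressed as a short check. This is a minimal sketch with a hypothetical function name; run it with the same interpreter that executes the integration script.

```python
import sys


def is_supported_python(version=None):
    """True for Python 2.7.9 or later within the 2.x series."""
    major, minor, micro = tuple(version or sys.version_info)[:3]
    return major == 2 and (minor, micro) >= (7, 9)


if __name__ == "__main__":
    print("supported" if is_supported_python() else "unsupported")
```

Called without arguments, the function inspects the running interpreter; passing a tuple such as (2, 7, 8) lets you test a specific version.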
Outbound HTTP/HTTPS Connections with a Proxy
If you need to set a proxy, use this modified version of the Python script for proxy support. Replace SOME_PROXY on line 68 with your proxy address (e.g., http://proxy.company.com:3128).
Trigger a Test Incident with the pagerduty.py Script
Try manually triggering an incident with pagerduty.py via the command line and check for errors (replace PD_SERVICE_KEY with your own PagerDuty integration key):
$ /etc/zabbix/alert.d/pagerduty.py PD_SERVICE_KEY trigger "name:Test
id:1
status:onfire
hostname:localhost
ip:127.0.0.1
value:5
event_id:2
severity:1"
Verify the Python Script is in the Correct Location
The script should be placed in your AlertScriptsPath. This is usually /usr/lib/zabbix/alertscripts or /etc/zabbix/alert.d, but could be different if you installed Zabbix from non-standard packages. You can find the correct path for your particular environment by checking zabbix_server.conf in your Zabbix server configuration directory.
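
Looking up the setting can be scripted. The sketch below assumes the usual key=value layout of zabbix_server.conf with # comments; the function name is hypothetical, and returning the last uncommented value when the key appears more than once is an assumption.

```python
def find_alert_scripts_path(config_text):
    """Return the last uncommented AlertScriptsPath value, or None."""
    value = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, rest = line.partition("=")
        if separator and key.strip() == "AlertScriptsPath":
            value = rest.strip()
    return value
```

A result of None means the parameter is not set explicitly in the file, in which case the server presumably falls back to its built-in default.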
Verify the Zabbix User has Write Permissions
The script queues events in /tmp/pagerduty. If the Zabbix user cannot write to this directory, it will not be able to send alerts to PagerDuty.
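
One way to test write access is to try creating a temporary file in the queue directory. The sketch below uses a hypothetical function name and only gives a meaningful answer when run as the same user that runs the Zabbix server.

```python
import tempfile


def can_write_queue_dir(path="/tmp/pagerduty"):
    """Try to create (and automatically remove) a temp file in path."""
    try:
        handle = tempfile.NamedTemporaryFile(dir=path)
    except (IOError, OSError):
        # Covers both a missing directory and insufficient permissions.
        return False
    handle.close()  # closing deletes the temporary file
    return True
```

A False result does not distinguish between a missing directory and a permissions problem, so check whether the directory exists before changing its ownership or mode.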