Zabbix Troubleshooting Guide


Below are some common issues that you may run into when trying to integrate Zabbix with PagerDuty and steps for troubleshooting those problems. 

Potential Issues within the Zabbix Configuration

Check whether Zabbix has attempted to send an event to PagerDuty by viewing your Zabbix logs. Confirm that the PagerDuty action was called, and check whether it produced any errors.

  • For Zabbix 2.x: navigate to Monitoring → Events, then click the event timestamp for the problem event. Check the Message actions section in the event details.
  • For Zabbix 3.x: navigate to Reports → Action Log to view the Status of the Zabbix event.

Events are not being sent to the PagerDuty user/action.

  • Make sure the PagerDuty group in Zabbix has read permissions for the hosts and/or host groups in question. Also confirm that the PagerDuty user is in the PagerDuty group.

  • Check that the PagerDuty media type is enabled in Administration → Media types.

Events appear to be sent to the PagerDuty user/action, but aren't showing up in PagerDuty.

  • Both the zabbix-agent and zabbix-server services must be running for Zabbix to send notifications to PagerDuty. To check that they are running, use the following commands:

    service zabbix-agent status
    service zabbix-server status
  • Make sure your Zabbix server can make outbound HTTP/HTTPS connections to events.pagerduty.com on ports 80 and 443. If your environment requires a proxy to be set for outbound HTTP/HTTPS connections, make sure you set the proxy in the agent configuration or see the information below for using a proxy with the old Python integration.

  • Check that the integration key is set correctly in your PagerDuty user's media settings.
  • If you see an error such as "pagerduty_python: PagerDuty server REJECTED the event in file...Event object is invalid," check the trigger and recovery subjects and message formats in your PagerDuty action.

    Under Configuration → Actions, select the PagerDuty notification action to view its details. Remove any extra characters after the text, such as whitespace or a newline character, so that the configuration matches this exactly:

    Default subject: trigger
    Recovery subject: resolve
    Default message and Recovery message:

    name:{TRIGGER.NAME}
    id:{TRIGGER.ID}
    status:{TRIGGER.STATUS}
    hostname:{HOSTNAME}
    ip:{IPADDRESS}
    value:{TRIGGER.VALUE}
    event_id:{EVENT.ID}
    severity:{TRIGGER.SEVERITY}
  • If you're using Zabbix 3.x, make sure you have specified these script parameters for the PagerDuty media type under Administration → Media types:

    • {ALERT.SENDTO}

    • {ALERT.SUBJECT}

    • {ALERT.MESSAGE}

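For the outbound connectivity check described above, a minimal Python 3 sketch like the following can be run on the Zabbix server. The function name can_connect is a hypothetical helper introduced here, not part of any PagerDuty tooling. It attempts a direct TCP connection only, so in an environment that requires a proxy a failure is expected and does not by itself indicate a problem.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a direct TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and completes the
        # TCP handshake, or raises if either step fails.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect("events.pagerduty.com", 443) and can_connect("events.pagerduty.com", 80) from the Zabbix server should both return True when direct outbound access is allowed. A True result only shows that the port is reachable; it does not validate the integration key or the event contents.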

Incidents are being triggered in PagerDuty, but they are not being resolved automatically after recovering in Zabbix.

Make sure the PagerDuty action in Zabbix has a condition of Trigger value = PROBLEM. The Trigger value must still be set to PROBLEM even if you have a custom Trigger severity you want to use.

Troubleshooting Issues with the Agent-based Integration

  • Verify the agent is installed. The most common problem we see when troubleshooting agent integrations is that the agent was not installed successfully (e.g., after an attempt to install it on an incompatible distribution such as CentOS 5). The first step is to make sure that the PagerDuty Agent is both compatible with your distribution and successfully installed by following the steps in our Agent Install Guide.

    CentOS 5 users: You will need to use the older Python-based integration, as the PagerDuty Agent requires a newer version of Python than the version available with CentOS 5.

  • Verify the agent is running. Once you've verified the agent has been successfully installed, you'll want to make sure that it is running. You can check the status by running service pdagent status. If the agent isn't running, you can start it with the command service pdagent start.

  • Check the agent's logs for errors. The agent logs activity and errors to /var/log/pdagent/pdagentd.log, which may contain information helpful in troubleshooting problems with the integration.

  • Trigger a test incident with the agent's CLI. Try manually triggering an incident using the pd-send command and check for errors (set PD_SERVICE_KEY to your own PagerDuty integration key):

    $ export PD_SERVICE_KEY=65d9cd0e14c04dae8ef86867277d138c
    $ pd-send -k $PD_SERVICE_KEY -t trigger -d "Server is on fire" -i server.fire

    If an incident is triggered in PagerDuty, check the tips in the Potential Issues within the Zabbix Configuration section. You may need to verify the trigger subject and message in your Zabbix configuration.

  • Trigger a test incident with pd-zabbix. Try manually triggering an incident using the pd-zabbix command and check for errors (replace PD_SERVICE_KEY with your own PagerDuty integration key):

    /usr/share/pdagent-integrations/bin/pd-zabbix PD_SERVICE_KEY trigger "name:Test
    id:1
    status:onfire
    hostname:localhost
    ip:127.0.0.1
    value:5
    event_id:2
    severity:1"
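The multi-line message used in the pd-zabbix test above has the same key:value layout as the Default message in the Zabbix action. As a rough aid for spotting stray whitespace, blank lines, or missing fields before pasting a message into Zabbix, a small Python 3 sketch follows. The names check_message and REQUIRED_KEYS are hypothetical and introduced only for illustration. The keys are taken from the message template in this guide. This is not how pd-zabbix itself parses the message.

```python
# Field names taken from the message template shown in this guide.
REQUIRED_KEYS = ("name", "id", "status", "hostname",
                 "ip", "value", "event_id", "severity")


def check_message(message):
    """Return a list of problems found in a key:value message body."""
    problems = []
    if message.endswith(("\n", "\r")):
        problems.append("message ends with a newline")
    fields = {}
    # splitlines() treats both \n and \r\n as line breaks.
    for number, line in enumerate(message.splitlines(), start=1):
        if not line.strip():
            problems.append("line %d is blank" % number)
            continue
        if line != line.strip():
            problems.append(
                "line %d has leading or trailing whitespace" % number)
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.strip().partition(":")
        if not sep:
            problems.append("line %d is not in key:value form" % number)
            continue
        fields[key] = value
    for key in REQUIRED_KEYS:
        if key not in fields:
            problems.append("missing field: %s" % key)
    return problems
```

An empty list means the message matched the expected layout. Any returned strings point at the line to fix in the action's message field.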

Troubleshooting Issues with the Python-based Integration

  • Make sure you have Python 2.7.9 or a newer 2.x version. Although the Python script was previously tested with Python 2.6, it now requires version 2.7.9 to make a secure connection to PagerDuty. This changed because SSLv3 is no longer enabled on our web servers due to the POODLE vulnerability, and older versions of Python try to use SSLv3. Python 2.7.9 uses a backported version of Python 3's SSL library, so versions 2.7.9 and newer (up to 3.x) are able to make a secure connection to PagerDuty. The script also does not work with Python 3.x due to other language changes in this version of Python.

  • Does your environment require a proxy to be set for outbound HTTP/HTTPS connections? If you need to set a proxy, use this modified version of the Python script for proxy support. Replace SOME_PROXY on line 68 with your proxy address (e.g., http://proxy.company.com:3128).

  • Trigger a test incident with the pagerduty.py script. Try manually triggering an incident with pagerduty.py via the command line and check for errors (replace PD_SERVICE_KEY with your own PagerDuty integration key):

    /etc/zabbix/alert.d/pagerduty.py PD_SERVICE_KEY trigger "name:Test
    id:1
    status:onfire
    hostname:localhost
    ip:127.0.0.1
    value:5
    event_id:2
    severity:1"
  • Verify the Python script is in the correct location. The script should be placed in your AlertScriptsPath. This is usually /usr/lib/zabbix/alertscripts or /etc/zabbix/alert.d, but could be different if you installed Zabbix from non-standard packages. You can find the correct path for your particular environment by checking the AlertScriptsPath setting in zabbix_server.conf in your Zabbix server configuration directory.

  • Verify the Zabbix user has write permissions to /tmp/pagerduty. The script queues events in /tmp/pagerduty, so if the Zabbix user can't write to this directory it won't be able to send alerts to PagerDuty.
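Several of the checks above (Python version, AlertScriptsPath, write access to /tmp/pagerduty) can be scripted. The following is a minimal sketch, written to run under both Python 2.7 and Python 3. The function names are hypothetical helpers introduced here and are not part of the pagerduty.py script. The config parsing is deliberately simplistic and does not claim to reproduce how Zabbix reads zabbix_server.conf.

```python
import os
import sys


def python_version_ok(version=None):
    """True for Python 2.7.9 or a newer 2.x release, as required above."""
    major, minor, micro = tuple(version or sys.version_info)[:3]
    return (2, 7, 9) <= (major, minor, micro) < (3, 0, 0)


def alert_scripts_path(conf_text):
    """Return the first uncommented AlertScriptsPath value, or None."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value line.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "AlertScriptsPath":
            return value.strip()
    return None


def can_write(directory):
    """True if the current user may create files in the directory."""
    return (os.path.isdir(directory)
            and os.access(directory, os.W_OK | os.X_OK))
```

For example, python_version_ok() checks the interpreter running the sketch, alert_scripts_path(open(path).read()) can be pointed at your zabbix_server.conf, and can_write("/tmp/pagerduty") checks the queue directory. Because can_write tests the current user's permissions, it is only meaningful when run as the account the Zabbix server runs under.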


Comments

  • Richard Geniesse

    Edit: Realized the python comment was for the pagerduty.py script.

    Hello,

    Thanks for this article. Lots of good stuff in here.

    I am running Zabbix 3.0.1 on CentOS 7. Python 2.7.5. Confirmed that Administrator > Media Type > PagerDuty parameters are set, all 3 of them. Ensured no white space/extra characters are in the action conditions. They are just as you have here. Made sure my symlink was set properly for Zabbix 3. Also able to use PagerDuty's CLI to send a page.

    I've been trying to track down why Zabbix won't page PagerDuty from event detection to my phone. /var/log/pdagent/pdagentd.log is free of anything useful, just all my successful command line tests. zabbix-agent and server are running. Zabbix shows the event and says it successfully sent them to PagerDuty.

    Where I am at:

    I can run your "Trigger a test incident with the pagerduty.py script" example and get a page to PagerDuty.

    I then added a section to the python script pd-zabbix to print out the arguments it gets, which are (sanitized version):

    ['pd-zabbix', 'service_key', 'trigger', 'name:Test\nid:1\nstatus:zabbixTest2\nhostname:localhost\nip:127.0.0.1\nvalue:5\nevent_id:2\nseverity:1']

    When I trigger an event in Zabbix, pd-zabbix gets run and the arguments turn to this (keeping the same data to simplify things):
    ['pd-zabbix', 'service_key', 'trigger', 'name:Test\r\nid:1\r\nstatus:zabbixTest2\r\nhostname:localhost\r\nip:127.0.0.1\nvalue:5\r\nevent_id:2\r\nseverity:1']

    So "\n" vs "\r\n". Could that be impacting PagerDuty's ability to parse the arguments correctly? Or are the "\r" mimicking the returns seen in the command line test of pd-zabbix?

    I've been slowly learning python, but it is basic at best. Any other points you could make would be helpful. Thanks!

    Richard.

  • Lisa Thompson

    Hi Richard!

    Since you're using CentOS, it is possible that you have SELinux enabled, which can prevent Zabbix from executing pd-zabbix if it thinks the behavior could be malicious or a security risk. Do you have SELinux enabled? If so, I would recommend disabling it. The configuration file is at `/etc/selinux/config` and you can disable it by changing `SELINUX=enforcing` to `SELINUX=disabled` and restarting the system for the change to take effect.

    I hope this helps! If you have any questions feel free to email us at support@pagerduty.com.

    Cheers,
    Lisa

  • Richard Geniesse

    Hey Lisa,

    That fixed it! Thank you so much! The server is still not in production, so I had thought I turned it off long ago and it never crossed my mind. Everything I'd expect to work is now functioning, including resolve notices getting back to PagerDuty from Zabbix.

    Thanks again, and thank you Jonathan for making this guide.

    Richard.