This guide addresses common issues related to the Nagios integration. Depending on your integration type, you may run into errors specific to your environment.

General Configuration

If Nagios notifications are not triggering PagerDuty incidents as expected, the following items apply to all integration types.

Your Nagios host or service may not be reaching a HARD down state

Events are only sent to PagerDuty when your service or host changes state to HARD. Typically, a host or service will first enter a SOFT state, and only transition to HARD after it reaches its max_check_attempts limit.

For more information, please see Nagios’ State Types documentation.

To verify whether this is happening:

Check your logs.
- Debian/Ubuntu: /var/log/syslog
- RHEL/CentOS: /var/log/messages
Run grep pagerduty <log path> to see notifications sent to PagerDuty.

This is an example of a SOFT down, which would not trigger an incident in PagerDuty:

Nov 13 22:34:30 ip-10-182-165-131 nagios3: SERVICE ALERT: localhost;Current Users;WARNING;SOFT;1;USERS WARNING - 2 users currently logged in

This is an example of a HARD down, which should trigger incidents in PagerDuty:

Nov 13 22:34:30 ip-10-182-165-131 nagios3: SERVICE NOTIFICATION: pagerduty;localhost;Current Users;WARNING;notify-service-by-pagerduty;USERS WARNING - 3 users currently logged in
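The log check above can be condensed into one summary. The following is a minimal sketch, assuming your log lines follow the same layout as the two examples shown; the function name summarize_states is hypothetical and not part of Nagios or PagerDuty.

```shell
# Summarize SOFT alerts, HARD alerts, and notifications sent to the
# pagerduty contact in a Nagios log (illustrative helper, not a Nagios tool).
# Usage: summarize_states /var/log/syslog
summarize_states() {
    log_file=$1
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero-match result from being treated as a failure.
    soft=$(grep -c 'ALERT: .*;SOFT;' "$log_file" || true)
    hard=$(grep -c 'ALERT: .*;HARD;' "$log_file" || true)
    sent=$(grep -c 'NOTIFICATION: pagerduty;' "$log_file" || true)
    printf 'soft=%s hard=%s pagerduty_notifications=%s\n' "$soft" "$hard" "$sent"
}
```

If the SOFT count keeps growing while the other two stay at zero, the host or service is probably recovering before it reaches max_check_attempts.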

Confirm that your PagerDuty contact is configured properly

The pagerduty contact might not have been configured to receive notifications properly.

To check this, run grep NOTIFICATION <log path>.

If "pagerduty" does not appear in the output, as in the example below, make sure that the pagerduty contact belongs to the contact group that your service or host template is configured to notify:

Nov 13 22:34:30 ip-10-182-165-131 nagios3: SERVICE NOTIFICATION: root;localhost;Current Users;CRITICAL;notify-service-by-email;USERS CRITICAL - 5 users currently logged in
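To see at a glance which contacts are actually being notified, you can count notifications per contact. This is a rough sketch that assumes the contact name is the first semicolon-separated field after "NOTIFICATION: ", as in the log lines in this guide; notified_contacts is a hypothetical helper name.

```shell
# Print "<contact> <count>" for every contact that received a notification.
# Usage: notified_contacts /var/log/syslog
notified_contacts() {
    # Keep only notification lines, reduce each to its contact name,
    # then count how often each name occurs.
    grep 'NOTIFICATION: ' "$1" \
        | sed 's/.*NOTIFICATION: \([^;]*\);.*/\1/' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

If "pagerduty" is missing from the result while other contacts such as "root" are listed, the contact group membership is the likely cause.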

📘
Nagios XI vs. Nagios Core file paths
If you use Nagios XI, paths will differ from Nagios Core. Additionally, configuration is managed primarily through the Nagios XI web interface, as opposed to Nagios Core’s configuration files. Please refer to the Nagios XI Integration Guide for further details.

If you use the default configuration, open the file that contains the pagerduty contact to confirm it is included in the correct contact group:

Debian, Ubuntu, and other Debian-derived systems: /etc/nagios3/conf.d/contacts_nagios2.cfg
RHEL, Fedora, CentOS, and other Red Hat-derived systems: /etc/nagios/objects/contacts.cfg
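The membership check can also be scripted. The sketch below assumes the standard Nagios object syntax (a "define contactgroup" block with contactgroup_name listed before members); contact_groups_for is a hypothetical helper name. It does not cover contacts that join a group through a contactgroups directive in their own definition.

```shell
# Print the name of every contact group whose "members" line lists the contact.
# Usage: contact_groups_for pagerduty /etc/nagios3/conf.d/contacts_nagios2.cfg
contact_groups_for() {
    contact=$1
    cfg_file=$2
    awk -v contact="$contact" '
        /^[ \t]*define[ \t]+contactgroup/ { in_group = 1; name = ""; next }
        in_group && $1 == "contactgroup_name" { name = $2 }
        in_group && $1 == "members" {
            # Join fields 2..NF so "root, pagerduty" and "root,pagerduty"
            # are handled the same way, then split on commas.
            list = ""
            for (i = 2; i <= NF; i++) list = list $i
            n = split(list, members, ",")
            for (i = 1; i <= n; i++) if (members[i] == contact) print name
        }
        /}/ { in_group = 0 }
    ' "$cfg_file"
}
```

An empty result means that no contact group in that file lists the contact as a member.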

If you use the default configuration, open the following file to make sure that the pagerduty contact is defined properly.

Debian, Ubuntu, and other Debian-derived systems: /etc/nagios3/conf.d/pagerduty_nagios.cfg
RHEL, Fedora, CentOS, and other Red Hat-derived systems: /etc/nagios/objects/pagerduty_nagios.cfg

If you use the default configuration, open the following file to confirm that the host or service template being used is contacting the correct group.

Debian, Ubuntu, and other Debian-derived systems: /etc/nagios3/conf.d/generic-service_nagios2.cfg, /etc/nagios3/conf.d/generic-host_nagios2.cfg
RHEL, Fedora, CentOS, and other Red Hat-derived systems: /etc/nagios/objects/generic-service_nagios2.cfg, /etc/nagios/objects/generic-host_nagios2.cfg

If you make any changes to the templates above, restart Nagios with either of the following commands:

/etc/init.d/nagios3 restart

service nagios3 restart

[ERROR] NOTIFICATIONTYPE field must be present

The PagerDuty integration accepts the following Nagios notifications:

PROBLEM
ACKNOWLEDGEMENT
RECOVERY

Other event types (e.g., FLAPPINGSTART and FLAPPINGSTOP) are not supported, and will result in a NOTIFICATIONTYPE error.

Please also note that sending a custom notification manually through the Nagios UI will not trigger an incident, as the integration does not support custom notifications.

If you are using the agentless integration and would like to receive FLAPPINGSTART and FLAPPINGSTOP events, you can update the enqueue_event subroutine in the pagerduty_nagios.pl script (below line 235):

if ($event{"NOTIFICATIONTYPE"} eq "FLAPPINGSTART") {
    $event{"NOTIFICATIONTYPE"} = "PROBLEM";
}
if ($event{"NOTIFICATIONTYPE"} eq "FLAPPINGSTOP") {
    $event{"NOTIFICATIONTYPE"} = "RECOVERY";
}

Make sure that you have enabled flapping notifications in your pagerduty_nagios.cfg file under the service_notification_options and/or host_notification_options fields.
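In Nagios contact definitions, flapping notifications are enabled with the f flag in the notification options. The following sketch reports whether that flag is present; check_flapping_options is a hypothetical helper name, and it assumes each option list sits on a single line.

```shell
# Report whether each notification-options line includes the "f" (flapping) flag.
# Usage: check_flapping_options /etc/nagios3/conf.d/pagerduty_nagios.cfg
check_flapping_options() {
    awk '
        $1 == "service_notification_options" || $1 == "host_notification_options" {
            # Join the option fields so "w, u, c" and "w,u,c" look the same.
            opts = ""
            for (i = 2; i <= NF; i++) opts = opts $i
            # Wrap in commas so "f" only matches as a whole option.
            status = (("," opts ",") ~ /,f,/) ? "flapping enabled" : "flapping NOT enabled"
            print $1 ": " opts " (" status ")"
        }
    ' "$1"
}
```

A line reported as "flapping NOT enabled" needs f appended to its option list, for example w,u,c,r,f.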

Perl-Based Integration

📘
Tip
Use the Perl integration if you use CentOS 5 or lower.

Trigger a test incident to make sure that the Perl script will run

Manually trigger a Nagios incident with the Perl script to make sure it runs.
Run the script as the user that runs Nagios (typically the "nagios" user): either log in as that user, or prefix your commands with sudo -u nagios.

[ERROR] Nagios event in file /tmp/pagerduty_nagios/pd_12334543223_1235.txt DEFERRED due to network/server problems.

If your server is behind a proxy, you will need to specify it when executing the Perl script. Add the following switch to the Nagios command that calls the script, as well as your cron job:

--proxy https://my.proxy.com:<port>

Also, verify that the Perl libraries for SSL are installed (typically step 1 of the integration guide).

For Debian-based systems (e.g., Ubuntu):

aptitude install libwww-perl libcrypt-ssleay-perl

For RHEL-based systems (e.g., CentOS, Fedora):

yum install perl-libwww-perl perl-Crypt-SSLeay

Then run the following:

sudo -u nagios <path to perl script> flush --verbose

If you get a 500 response with the message "Can't verify SSL peers without knowing which Certificate Authorities to trust", install the Mozilla::CA module by running the following command:

cpanm Mozilla::CA

[ERROR] May 16 07:12:46 sw-cloud pagerduty_nagios[32356]: open /tmp/pagerduty_nagios/pd_1337123566_32999.txt for write failed: Illegal seek

This error means that the user running Nagios does not have write permission to the /tmp/pagerduty_nagios/ directory. The simplest fix is to delete the directory. Note that this will remove any queued alerts:

rm -rf /tmp/pagerduty_nagios
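Before deleting the directory, you may want to confirm that permissions are really the problem. The sketch below reports the state of the queue directory for whichever user runs it; check_queue_dir is a hypothetical helper name. To test the Nagios user specifically, save it as a script and run that script with sudo -u nagios.

```shell
# Report whether the queue directory exists and is writable by the current user.
# Usage: check_queue_dir [directory]   (defaults to /tmp/pagerduty_nagios)
check_queue_dir() {
    dir=${1:-/tmp/pagerduty_nagios}
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}
```

A result of "not writable" when run as the Nagios user matches the error described above.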

[ERROR] File was rejected because could not find CONTACTPAGER

If you see this error, enable environment macros by setting enable_environment_macros=1 in your nagios.cfg file:

Debian, Ubuntu, and other Debian-derived systems: /etc/nagios3/nagios.cfg
RHEL, Fedora, CentOS, and other Red Hat-derived systems: /etc/nagios/nagios.cfg
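To check the current value without opening the file in an editor, something like the following may help. It is a sketch that assumes the directive is written in key=value form on its own line; env_macros_setting is a hypothetical helper name.

```shell
# Print the value of every uncommented enable_environment_macros line.
# Usage: env_macros_setting /etc/nagios3/nagios.cfg
env_macros_setting() {
    awk -F= '
        /^[ \t]*enable_environment_macros[ \t]*=/ {
            gsub(/[ \t]/, "", $2)   # strip stray whitespace around the value
            print $2
        }
    ' "$1"
}
```

No output means the directive is absent or commented out; a value of 0 means environment macros are disabled.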

Agent-Based Integration

Below are some issues that may arise with an agent-based integration while using the PagerDuty Agent.

Trigger a test incident to make sure that the agent works

Manually trigger a Nagios incident with the pd-nagios command to make sure the agent is working.

Replace YOUR-INTEGRATION-KEY-HERE with your actual integration key in the commands below:

sudo -u nagios /usr/share/pdagent-integrations/bin/pd-nagios -n service -k YOUR-INTEGRATION-KEY-HERE -t "PROBLEM" -f SERVICEDESC="test_description" -f SERVICESTATE="CRITICAL" -f HOSTNAME="test_host_name" -f SERVICEOUTPUT="test_service_output"

Alternatively, you can use the pd-send command to trigger an incident.

Here is an example event to trigger an incident using pd-send:

~$ export PD_INTEGRATION_KEY=YOUR-INTEGRATION-KEY-HERE
~$ pd-send -k "$PD_INTEGRATION_KEY" -t trigger -d "Server is on fire" -i server.fire
Event processed. Incident Key: server.fire

[ERROR] Error Performing CheckSum

This is an installation error on CentOS 5 and below. The agent supports CentOS 6 and higher. If you are running CentOS 5 or below, please use the Agentless Nagios Integration Guide.

Agent is not running

Check to make sure that the PD agent is running. To do this, run the following command:

service pdagent status

If the status is "not running", then start the PD agent with the following command:

service pdagent start

Outdated agent version

If you see something similar to the following in your logs, then you will need to update to the latest version of the agent:

[1417765072] wproc: stderr line 01: Traceback (most recent call last):
[1417765072] wproc: stderr line 02: File "/usr/share/pdagent-integrations/bin/pd-nagios", line 188, in <module>
[1417765072] wproc: stderr line 03: main()
[1417765072] wproc: stderr line 04: File "/usr/share/pdagent-integrations/bin/pd-nagios", line 117, in main
[1417765072] wproc: stderr line 05: details = parse_fields(args.fields)
[1417765072] wproc: stderr line 06: File "/usr/share/pdagent-integrations/bin/pd-nagios", line 177, in parse_fields
[1417765072] wproc: stderr line 07: return dict(f.split("=", 2) for f in fields)

Bi-Directional Integration

The bidirectional integration uses a CGI script to capture webhooks from PagerDuty and translate them into commands that Nagios runs, such as adding an acknowledgment note to an alert.

You may wish to capture an incident acknowledgment webhook for iterative testing and log-checking, for example, by sending it manually via curl or Postman. You can do this by creating a webhook on your PagerDuty service and pointing it to a temporary pipedream.com URL to capture the JSON body, and acknowledging an incident that was raised from Nagios.

Once you have the JSON content of the webhook, you will be able to send the same webhook after each change and troubleshooting step attempted, without having to repeat the full process of raising an alert in Nagios and acknowledging it in PagerDuty. This allows more rapid testing and diagnosis of the CGI script that processes webhooks from PagerDuty.
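Replaying a captured webhook could look like the sketch below. The function name replay_webhook, the file name, and the URL are all hypothetical, and sending the body as application/json is an assumption about what your copy of the CGI script accepts. The CURL variable is only there so that the command can be previewed without sending anything.

```shell
# POST a saved webhook body to the CGI script.
# Usage:   replay_webhook ack.json https://nagios.example.com/cgi-bin/pagerduty.cgi
# Preview: CURL=echo replay_webhook ack.json https://nagios.example.com/cgi-bin/pagerduty.cgi
replay_webhook() {
    body_file=$1
    cgi_url=$2
    # --data-binary @file sends the file contents unmodified as the request body.
    ${CURL:-curl} -sS -X POST \
        -H 'Content-Type: application/json' \
        --data-binary "@$body_file" \
        "$cgi_url"
}
```

Rerun the same command after each configuration change and compare the response and the web server logs.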

CGI script cannot execute

Once you have put the script in place, try opening it in an HTTP client (for example, curl or a web browser) with a GET request. You should receive a 400 error along with the message: 400 Requests must be POSTs

Response is 403 Forbidden

The pagerduty.cgi script must be readable and executable by the web server process. If the process cannot read and execute the script file, it will in most cases respond to the request with a 403 status.

Response is 401 Unauthorized

The script, or the directory it is in, may require authorization (e.g., HTTP Basic Auth). Check with your system administrator to see if this is the case. If HTTP Basic Auth is used, retry your GET request with username:password@ prepended to the host name in the URL (i.e., immediately following https://).

Response is 500 Internal Server Error

This indicates that the script itself is exiting prematurely with a non-success status due to an uncaught exception. The following dependencies (i.e., Perl modules) must be installed for the script to run properly:

JSON
LWP::UserAgent

The Nagios Integration Guide outlines how to install these modules using native package management in CentOS and Ubuntu.

If you have verified that dependencies are met and you still receive a 500 status response, try running the script from the command line to see what error results in the output. There may be an issue with the Perl installation on the local machine, or a syntax error in the script caused by an accidental modification that resulted in invalid Perl syntax (e.g., missing a semicolon at the end of a line).
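The status codes covered above can be mapped to their likely causes in a small helper. This is only a sketch that restates the explanations in this guide; explain_status is a hypothetical name, and the URL in the usage comment is a placeholder.

```shell
# Translate the HTTP status returned for a GET request to pagerduty.cgi
# into the likely cause described in this guide.
explain_status() {
    case "$1" in
        400) echo "script executes; it rejects GET requests and expects a POST" ;;
        401) echo "authorization required, e.g. HTTP Basic Auth" ;;
        403) echo "web server process cannot read or execute pagerduty.cgi" ;;
        500) echo "script exited early; check the JSON and LWP::UserAgent Perl modules" ;;
        *)   echo "status not covered in this guide" ;;
    esac
}

# Example (placeholder URL):
#   status=$(curl -s -o /dev/null -w '%{http_code}' "https://nagios.example.com/cgi-bin/pagerduty.cgi")
#   explain_status "$status"
```

A 400 response is the healthy result for this check, since it shows the script ran and rejected the GET request itself.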

CGI script executes, but no notes are added to the Nagios alert

The CGI script writes to an "external commands" file that Nagios reads. This is the step when the PagerDuty incident acknowledgment is translated from a webhook into an action that Nagios takes (i.e., adding a note to the alert that it has been acknowledged in PagerDuty).

There are a few issues that could prevent this process from happening properly:

Permissions on the command file/directory where it resides.
The command file's path is in a different location than what is configured in the default Nagios installation.
Nagios might not be configured to execute external commands.

In the Nagios configuration specification (per the documentation on configuring external commands), the two directives check_external_commands and command_file are particularly helpful when troubleshooting the items above. The latter determines the path where the command file resides.

If you can verify that external commands are enabled in Nagios, per the check_external_commands option, and can obtain the path from the command_file option, you can then check it against the path that is hard-coded in the CGI script, on line 14:

'command_file' => '/var/lib/nagios3/rw/nagios.cmd', # External commands file
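The comparison can be scripted roughly as follows. This sketch assumes nagios.cfg uses plain key=value lines and that the CGI script contains a line in the format shown above; the function names are hypothetical.

```shell
# Print the value of a key=value directive from nagios.cfg.
nagios_setting() {
    awk -F= -v key="$1" '$1 == key { print $2 }' "$2"
}

# Extract the path from a line like:  'command_file' => '/path/to/nagios.cmd',
cgi_command_file() {
    sed -n "s/.*'command_file'[ ]*=>[ ]*'\([^']*\)'.*/\1/p" "$1"
}

# Print a message for each problem found; print nothing when both checks pass.
# Usage: verify_command_file /etc/nagios3/nagios.cfg /path/to/pagerduty.cgi
verify_command_file() {
    enabled=$(nagios_setting check_external_commands "$1")
    cfg_path=$(nagios_setting command_file "$1")
    cgi_path=$(cgi_command_file "$2")
    [ "$enabled" = "1" ] || echo "external commands are disabled"
    [ "$cfg_path" = "$cgi_path" ] || echo "path mismatch: nagios.cfg=$cfg_path cgi=$cgi_path"
}
```

If a mismatch is reported, update the path in the CGI script to match the command_file value in nagios.cfg.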

File Permissions

There could also be an issue related to the command file's permissions, in which case you will need to check to see what user ID is running the script, and ensure it has write permission to the command file.
