I recently found Nagios Checker for Firefox. It is a really good tool for getting a snapshot of your Nagios monitoring: it displays four indicators in the toolbar (hosts down, unknown services, service warnings and critical services), so you can see current issues right in your browser. You can download it from https://addons.mozilla.org/en-US/firefox/addon/3607/
Archive for the ‘Nagios’ category
Nagios checker for Firefox
November 11th, 2010
check_mk with Nagios
October 21st, 2010
check_mk – a new general purpose Nagios-plugin for retrieving data.
Check_mk adopts a new approach for collecting data from operating systems and network components. It obsoletes NRPE, check_by_ssh, NSClient and check_snmp. It has many benefits, the most important of which are:
* Significant reduction of CPU usage on the Nagios host.
* Automatic inventory of items to be checked on hosts.
The larger your Nagios installation is, the more important these points become. In fact, check_mk enables you to implement a monitoring environment exceeding 20,000 checks per minute.
Availability and Support
Check_mk is free software. You can use, modify and redistribute it under the terms of the GNU GPL Version 2. Professional support for check_mk is available from us. Please contact us, if you need improvements or general help with setting up Nagios and check_mk.
Basic principle
The following figure illustrates how check_mk works. Data is retrieved in four steps:
1. For each host Nagios triggers one active check per check interval. This active check calls check_mk as plugin.
2. check_mk connects to the target host via TCP. On the host, the check_mk_agent retrieves all relevant data about that host at once and sends it back as ASCII text.
3. check_mk extracts performance data and directly inserts it into round robin databases.
4. check_mk extracts relevant data, compares it against warning/critical levels and submits all check results of this host via Nagios’ passive service checks.
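Step 2 can be illustrated with a minimal Python sketch. It is not check_mk's own code. It assumes, beyond what the text above states, that the agent listens on TCP port 6556 and that its ASCII reply is divided into sections introduced by header lines of the form `<<<name>>>`. The function names are hypothetical.

```python
import socket


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Read everything the agent sends until it closes the connection.

    Port 6556 is an assumption; adjust it to your agent setup.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(text):
    """Group the agent's ASCII output by its <<<section>>> headers."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            current = line[3:-3]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

Because one connection returns every section at once, a single active check per host is enough to feed many passive service results, which is where the CPU saving comes from.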
MySQL Replication Monitoring
July 8th, 2010
MySQL replication is widely used, and it needs strong monitoring for the following:
1) To monitor whether replication is working.
2) To check the latency between the master and slave.
3) To check the consistency between the master and slave, as sometimes, due to a manual error or a master server crash, the master and slave may go out of sync.
Let's see how we can set up monitoring for each of these scenarios.
MySQL replication status on the slave can be checked via
mysql> show slave status \G
The output looks like this:
Slave_IO_State: Waiting for master to send event
Master_Host: master
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000542
Read_Master_Log_Pos: 231260599
Relay_Log_File: relaylog.000496
Relay_Log_Pos: 231260744
Relay_Master_Log_File: mysql-bin.000542
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 231260599
Relay_Log_Space: 231260935
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2013
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
More details on this can be found at http://dev.mysql.com/doc/refman/5.1/en/show-slave-status.html
Case 1) MySQL Replication is working
MySQL replication uses two threads on the slave, Slave_IO and Slave_SQL. Slave_IO is responsible for reading the master's binary log and writing the events to the relay log on the slave. Slave_SQL is responsible for executing the events from the relay log on the slave. For replication to work, both threads must be running, so in the output of show slave status \G the values must be Slave_IO_Running: Yes and Slave_SQL_Running: Yes.
We can check these values from the shell, from a script, or from Nagios.
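As a rough illustration of such a script, the sketch below parses the text printed by show slave status \G and maps the two thread flags to the usual Nagios return codes. The function names are hypothetical, and how the text is obtained (for example by running the mysql client) is left out on purpose.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_slave_status(text):
    """Turn 'Key: value' lines from SHOW SLAVE STATUS\\G into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_threads(fields):
    """Return (nagios_code, message) based on the two replication threads."""
    io = fields.get("Slave_IO_Running")
    sql = fields.get("Slave_SQL_Running")
    if io is None or sql is None:
        return UNKNOWN, "UNKNOWN: no replication status found"
    if io == "Yes" and sql == "Yes":
        return OK, "OK: both replication threads are running"
    return CRITICAL, "CRITICAL: Slave_IO_Running=%s Slave_SQL_Running=%s" % (io, sql)
```

Returning UNKNOWN when the fields are missing keeps a host that is not configured as a slave from being reported as healthy.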
Case 2) To check the latency between the master and slave
MySQL replication is close to real time most of the time, but for various reasons (a heavily loaded slave, I/O issues on the slave) the slave may not be able to keep up with the master.
I have seen many people use the "Seconds_Behind_Master" value from the output of show slave status. It works most of the time, but not always. That value compares the slave's current time with the time on the master when it executed the SQL statement currently being replicated. If the slave is low-volume (or not replicating properly), you can end up with misleading information or even a false sense of security. Consider another case with more than one level of replication: A is the master, B is a slave of A, and C is a slave of B. If there is a replication problem between A and B, "Seconds_Behind_Master" on C will still show 0, although in reality replication is broken in the sense that C is not getting the latest data from A.
A more reliable way of monitoring replication lag is mk-heartbeat:
http://www.maatkit.org/doc/mk-heartbeat.html
How to use mk-heartbeat
On the master, download the script:
wget http://www.maatkit.org/get/mk-heartbeat
Make it executable.
On the master (192.168.2.80),
create the table heartbeat in the database heart:
CREATE TABLE heartbeat (
id int NOT NULL PRIMARY KEY,
ts datetime NOT NULL
);
The table needs to have at least one row:
INSERT INTO heartbeat (id) VALUES (1);
Now run the script so that it keeps updating the row:
./mk-heartbeat -D heart --table heartbeat -u heartbeat -p XXXXXXXXX --update -h 192.168.2.80
On the slave (192.168.2.82), download the script:
wget http://www.maatkit.org/get/mk-heartbeat
Make it executable and run it in monitor mode:
./mk-heartbeat -D heart --table heartbeat -u heartbeat_slave -p XXXXXXXXX --monitor -h 192.168.2.82
The output looks something like this:
1s [ 0.02s, 0.00s, 0.00s ]
1s [ 0.03s, 0.01s, 0.00s ]
1s [ 0.05s, 0.01s, 0.00s ]
0s [ 0.05s, 0.01s, 0.00s ]
0s [ 0.05s, 0.01s, 0.00s ]
0s [ 0.05s, 0.01s, 0.00s ]
0s [ 0.05s, 0.01s, 0.00s ]
1s [ 0.07s, 0.01s, 0.00s ]
The output shows whether MySQL replication is lagging.
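To feed this into Nagios, each monitor line can be parsed and compared with thresholds. The sketch below assumes the line format shown in the sample output: the current lag followed by three averaged values in brackets (moving averages over increasing time windows, as far as I know). The threshold defaults are arbitrary examples and the function names are hypothetical.

```python
import re

# Matches lines such as "1s [ 0.02s, 0.00s, 0.00s ]"
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)s\s*\[\s*([\d.]+)s,\s*([\d.]+)s,\s*([\d.]+)s\s*\]\s*$"
)


def parse_heartbeat_line(line):
    """Return (current_lag_seconds, (avg1, avg2, avg3)) for one monitor line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognised mk-heartbeat line: %r" % line)
    lag, avg1, avg2, avg3 = (float(group) for group in match.groups())
    return lag, (avg1, avg2, avg3)


def lag_state(lag, warn=30.0, crit=120.0):
    """Map a lag in seconds to a Nagios code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if lag >= crit:
        return 2
    if lag >= warn:
        return 1
    return 0
```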
Case 3) Inconsistency between master and slave
Sometimes, because of a manual error, a master crash or an unclean shutdown, the master and slave may go out of sync. It is very important to detect that and take corrective action.
If it is not detected in time, the slave might go completely out of sync and might need to be set up again from scratch.
How do we check that the master and slave are completely in sync?
We can use the great tool mk-table-checksum from Maatkit: http://www.maatkit.org/doc/mk-table-checksum.html
Let's see how we can use mk-table-checksum.
Download it:
wget http://www.maatkit.org/get/mk-table-checksum
Make it executable.
Create the table checksum on the master (192.168.2.10),
in the database test:
CREATE TABLE checksum (
db char(64) NOT NULL,
tbl char(64) NOT NULL,
chunk int NOT NULL,
boundaries char(100) NOT NULL,
this_crc char(40) NOT NULL,
this_cnt int NOT NULL,
master_crc char(40) NULL,
master_cnt int NULL,
ts timestamp NOT NULL,
PRIMARY KEY (db, tbl, chunk)
);
For checking, we will create a test table:
create table testreplication (i int, b varchar(100));
We will insert a row:
insert into testreplication
select 1, 'name';
--replicate=test.checksum writes the checksums computed on the master (192.168.2.10) into test.checksum, so the checksum run is replicated to the slave (192.168.2.12)
--tables test.testreplication checks only the table test.testreplication
./mk-table-checksum -u test -p XXXXXXX --replicate=test.checksum --tables test.testreplication 192.168.2.10
When we run it we get this output on the command line:
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
test testreplication 0 192.168.2.10 MyISAM 1 1e5504e6 0 NULL NULL NULL
We now log in to the slave (192.168.2.12) and run this query on the database test:
SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
AS crc_diff
FROM checksum
WHERE master_cnt <> this_cnt OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc);
The output will be empty, as replication is working properly:
db tbl chunk cnt_diff crc_diff
Now we remove the row directly on the slave (192.168.2.12):
delete from testreplication;
We run the checksum again:
./mk-table-checksum -u test -p XXXXXXX --replicate=test.checksum --tables test.testreplication 192.168.2.10
The output on the command line:
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
test testreplication 0 192.168.2.10 MyISAM 1 1e5504e6 0 NULL NULL NULL
When we run the query on the slave to check the result:
SELECT db, tbl, chunk, this_cnt-master_cnt AS cnt_diff,
this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
AS crc_diff
FROM checksum
WHERE master_cnt <> this_cnt OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc);
Now we get:
db tbl chunk cnt_diff crc_diff
test testreplication 0 -1 1
Because the master and slave data now differ, the output gives the table name and the crc_diff. One very important thing to understand is that this finds tables that are out of sync even while new data is being written to the master DB: the checksum statement run on the master is replicated to the slave and executed there at the same point in the statement stream, so the slave computes its checksum over the data as it was on the master at that moment. (MySQL replication itself is asynchronous; it is the ordering of replicated statements that makes the comparison valid.) Using this we can find all the tables that are out of sync between master and slave and take corrective action.
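The comparison done by the SQL query above can also be expressed in a few lines of Python, which is handy when the rows of the checksum table are fetched by a monitoring script. This is only a sketch: the rows are assumed to be dictionaries keyed by the column names of the checksum table, and the function name is hypothetical.

```python
def out_of_sync(rows):
    """Return (db, tbl, chunk, cnt_diff, crc_differs) for every differing chunk.

    Python's != treats None (SQL NULL) as a distinct value, which covers
    the ISNULL(...) <> ISNULL(...) part of the SQL query.
    """
    diffs = []
    for row in rows:
        crc_differs = row["this_crc"] != row["master_crc"]
        cnt_differs = row["this_cnt"] != row["master_cnt"]
        if crc_differs or cnt_differs:
            if row["this_cnt"] is None or row["master_cnt"] is None:
                cnt_diff = None
            else:
                cnt_diff = row["this_cnt"] - row["master_cnt"]
            diffs.append((row["db"], row["tbl"], row["chunk"], cnt_diff, crc_differs))
    return diffs
```

An empty result means every checked chunk matches, so a Nagios check built on it could return OK for an empty list and CRITICAL otherwise.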
In the next post I will describe in detail how to find the differences in individual tables between master and slave, how to fix them, and how to automate that process.
Thanks
Pankaj Joshi
Nagios installation using scripts
April 21st, 2010
Introduction:-
Create a file named nagios.sh, copy the following contents into it, and run it on the server. Nagios will then be installed.
$vim nagios.sh
#sh nagios.sh
——————————————————
#!/bin/sh
# Any Failing Command Will Cause The Script To Stop
set -e
# Treat Unset Variables As Errors
set -u
echo "***** Starting Nagios Quick-Install: " `date`
echo "***** Installing pre-requisites"
yum -y install httpd
yum -y install gcc
yum -y install glibc glibc-common
yum -y install gd gd-devel
echo "***** Setting up the environment"
useradd -m nagios
echo "INSERT_PASSWORD_HERE" | passwd --stdin nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
echo "***** Getting the Nagios Source and Plug-Ins"
cd /usr/local/src
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xzf nagios-3.2.0.tar.gz
tar xzf nagios-plugins-1.4.14.tar.gz
echo "***** Installing Nagios"
cd /usr/local/src/nagios-3.2.0
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf
echo "***** Setting up htpasswd auth"
htpasswd -nb nagiosadmin INSERT_PASSWORD_HERE > /usr/local/nagios/etc/htpasswd.users
service httpd restart
echo "***** Setting up Nagios Plug-Ins"
# The directory must match the plugin tarball unpacked above (1.4.14)
cd /usr/local/src/nagios-plugins-1.4.14
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
echo "***** Fixing SELinux"
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
echo "***** Starting Nagios"
chkconfig --add nagios
chkconfig nagios on
service nagios start
echo "***** Done: " `date`
Thanks
Pawan Kumar
How to install NRPE agent
April 11th, 2010
Introduction:-
Create a new nagios user account and give it a password.
/usr/sbin/useradd nagios
passwd nagios
cd ~/downloads
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nrpe-2.8.tar.gz
Extract the NRPE source code tarball.
tar xzf nrpe-2.8.tar.gz
cd nrpe-2.8
If openssl is not installed then install it by using this command
yum install openssl*
If xinetd is not installed please install it by using following command
yum install xinetd
Compile the NRPE addon.
./configure
make all
Install the NRPE plugin (for testing), daemon, and sample daemon config file.
make install-plugin
make install-daemon
make install-daemon-config
Install the NRPE daemon as a service under xinetd.
make install-xinetd
The permissions on the plugin directory and the plugins will need to be fixed at this point, so run the following commands.
chown nagios.nagios /usr/local/nagios
chown -R nagios.nagios /usr/local/nagios/libexec
Edit the /etc/xinetd.d/nrpe file and add the IP address of the monitoring server to the only_from directive.
only_from = 127.0.0.1 <nagios_ip_address>
Add the following entry for the NRPE daemon to the /etc/services file.
nrpe 5666/tcp # NRPE
Restart the xinetd service.
service xinetd restart
Install Nagios Plugin for 64bit
yum install perl
yum install perl-Net-SNMP
wget http://dag.wieers.com/rpm/packages/nagios-plugins/nagios-plugins-1.4.9-1.el5.rf.x86_64.rpm
rpm -ivh nagios-plugins-1.4.9-1.el5.rf.x86_64.rpm
It will install plugins under this path
/usr/lib64/nagios/plugins
Install NRPE (64 bit) from RPM
yum install xinetd
wget http://dag.wieers.com/rpm/packages/nagios-nrpe/nagios-nrpe-2.5.2-1.el5.rf.x86_64.rpm
rpm -ivh nagios-nrpe-2.5.2-1.el5.rf.x86_64.rpm
netstat -an | grep 5666
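The netstat command above confirms locally that something is listening on port 5666. The same can be verified from the Nagios server with a small TCP connect test. This is a sketch with a hypothetical function name. It only checks that the port accepts connections, not that NRPE answers correctly or that the only_from restriction lets the monitoring server in.

```python
import socket


def port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("192.168.2.82")` could be called from the monitoring server, with the address replaced by that of the host where NRPE was installed.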
Thanks
Manoj Chauhan
Nagios Error – key verification failed
April 11th, 2010
Delete the host entry from /var/log/nagios/.ssh/known_hosts on server1, then add the new host key, copying it from /root/.ssh/known_hosts on server1.
Thanks
Manoj
How to import historical Nagios log files into the MySQL database
March 5th, 2010
NDO Utilities:-
The NDO utilities add-on, written by Nagios developer Ethan Galstad, is designed to output events and data from Nagios to standard files or to a Unix socket. It also comes with a module called NDO2DB that allows Nagios data to be written to a MySQL or PostgreSQL database. The add-on is made up of the NDOMOD Event Broker module, which is loaded by Nagios at runtime. It dumps all events and data from Nagios to a regular file or a Unix domain socket. It also contains the ndo2db daemon, which reads data that has been sent by the NDOMOD module to a Unix domain socket and dumps it into a MySQL or PostgreSQL database. You can dump into multiple databases and have multiple instances of the NDOMOD module writing to the same domain socket. There is also a utility called FILE2SOCK, which reads data from a standard file and dumps it into a Unix domain socket. Suggested uses are to dump data from NDOMOD that has been stored in a standard file into a Unix domain socket. Or, if your Nagios server is remote from your database server, you can dump data into a standard file from NDOMOD, send the file via SSH or SFTP to the database server, and then dump the data into a Unix domain socket and from there into a database. Finally, there is the LOG2NDO utility, which imports historical Nagios log files into the ndo2db daemon by sending them to a Unix domain socket or to standard output.
To install the NDO add-on, you first need to download and unpack the module from the Nagios Sourceforge site at http://sourceforge.net/project/showfiles.php?group_id=26589&package_id=173832, as you can see here:
puppy# wget http://optusnet.dl.sourceforge.net/sourceforge/nagios/ndoutils-12272005.tar.gz
manoj# tar -zxf ndoutils-1.4b9.tar.gz
manoj# cd ndoutils-1.4b9
By default the NDO utilities run as the nagios user, but my requirements were different: I wanted to compile the NDO utilities for the manoj user. So I used the following options at compile time:
manoj# ./configure --with-ndo2db-user=manoj --with-ndo2db-group=manoj --enable-mysql --disable-pgsql --with-mysql-lib=/usr/lib/mysql
manoj# make
You must grant the user you create, in this case nagios, the SELECT, INSERT, UPDATE, and DELETE privileges on the nagios database. Replace 'password' with an appropriate password for the database.
mysql> create user 'nagios'@'%' identified by 'password';
OR
mysql> create user 'nagios'@'localhost' identified by 'password';
mysql> grant insert,delete,select,update on *.* to 'nagios'@'%';
OR
mysql> grant all on *.* to 'nagios'@'%';
mysql> grant all on *.* to 'nagios'@'localhost';
mysql> create database nagios_db;
Change MySQL root password
mysql> update mysql.user set password=PASSWORD('new password') where User='root';
mysql> flush privileges;
The NDO add-on contains a script to populate this newly created database with the required tables. For MySQL, it is called ndo-mysql.sql, and it is located in the db directory in the root of the package:
manoj# cd ndoutils-1.4b9
manoj(ndoutils-1.4b9/db)#./installdb -u user -p password -h hostname -d database
To install the NDO module itself, install the compiled ndomod.o module file located in the src directory. I recommend copying it into the Nagios bin directory, usually /usr/local/nagios/bin:
manoj# cp src/ndomod-3x.o /usr/local/nagios/bin/ndomod.o
You also need to copy the sample configuration file for the module, ndomod.cfg. It is located in the config directory in the NDO utilities package. I recommend installing it to the Nagios etc directory, usually /usr/local/nagios/etc:
manoj# cp config/ndomod.cfg-sample /usr/local/nagios/etc/ndomod.cfg
You also need to install the ndo2db daemon and its configuration file. They are also located in the src and config directories, respectively, and I suggest you copy them to the same locations in your Nagios installation:
manoj# cp src/ndo2db-3x /usr/local/nagios/bin/ndo2db
manoj# cp config/ndo2db.cfg-sample /usr/local/nagios/etc/ndo2db.cfg
Next, you need to modify your Nagios configuration file, nagios.cfg, to load the NDO module when Nagios starts. Add the following line to your nagios.cfg configuration file, usually located in /usr/local/nagios/etc:
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
(The whole directive must be entered on a single line.)
This configuration directive will load the ndomod.o NEB module when Nagios is started. You will need to restart Nagios to make the module active. The config_file part of the directive must be modified to specify the location of the module configuration file. You should ensure that the ownership and permissions of all these files is appropriate. They should generally all be owned by the user and group used by the Nagios server process and the configuration files only readable by that user:
manoj# chown prod:prod /usr/local/nagios/bin/ndo2db /usr/local/nagios/bin/ndomod.o \
/usr/local/nagios/etc/ndo2db.cfg /usr/local/nagios/etc/ndomod.cfg
manoj# chmod 0600 /usr/local/nagios/etc/ndo2db.cfg /usr/local/nagios/etc/ndomod.cfg
You may also want to modify the two configuration files, ndo2db.cfg and ndomod.cfg. By default, the ndomod.o NEB module outputs data to a Unix domain socket, /usr/local/nagios/var/ndo.sock, which is created by the ndo2db daemon when it is started. You will also need to modify the ndo2db.cfg configuration file to update it with the correct database name, username, and password to allow the ndo2db daemon to write to the database.
Now you can start ndo2db by using the following command
manoj# /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
The daemon is launched with one command-line option, the location of the ndo2db daemon’s configuration file ndo2db.cfg. The daemon will create the Unix domain socket, /usr/local/nagios/var/ndo.sock. As you can see, I used the su command to change to the user nagios before launching. You should run the ndo2db daemon as the nagios user to allow the Unix domain socket to be created with the correct ownership. This will allow the ndomod.o module, which is run with the ownership and permissions of the Nagios server process, to write to that domain socket.
The module logs events and errors in the default Nagios log file, usually /usr/local/nagios/var/nagios.log, or in /usr/local/nagios/var/ndo2db.debug. Check these files for errors and messages.
Thanks
Manoj Chauhan
Nagios: ndomod: Still unable to connect to data sink
March 4th, 2010
I have Nagios 3.x running perfectly, and I want to load all alert data into MySQL for future reference so that we can create reports from it.
I installed NDOUtils, but after installation I got the following error in /var/log/messages:
nagios: ndomod: Still unable to connect to data sink. 14613 items lost, 5000 queued items to flush.
I fixed the issue by changing socket_type=tcp to socket_type=unix in /usr/local/nagios/etc/ndo2db.cfg
Then restart the Nagios service and start ndo2db with the following command: /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
After this change, if you see the following lines in /var/log/messages, everything is all right:
Mar 4 04:45:44 server1 nagios: ndomod: Successfully connected to data sink. 18134 items lost, 5000 queued items to flush.
Mar 4 04:45:47 server1 nagios: ndomod: Successfully flushed 5000 queued items to data sink.
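If you want to watch for this condition automatically, the ndomod messages shown above are easy to recognise in the syslog. The sketch below only matches the two message forms quoted in this post. The function name is hypothetical, and other ndomod messages are ignored.

```python
import re

SINK_RE = re.compile(
    r"ndomod: (Successfully connected to|Still unable to connect to) data sink\."
    r"\s+(\d+) items lost, (\d+) queued items to flush"
)


def sink_status(line):
    """Return connection state and counters from an ndomod syslog line, or None."""
    match = SINK_RE.search(line)
    if match is None:
        return None
    return {
        "connected": match.group(1).startswith("Successfully"),
        "lost": int(match.group(2)),
        "queued": int(match.group(3)),
    }
```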
Thanks
Manoj Chauhan
Nagios Architecture
January 24th, 2010
Overview
Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external “plugins” which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.
Architecture
Nagios is built on a server/agents architecture. Usually, on a network, a Nagios server is running on a host, and plugins are running on all the remote hosts that need to be monitored. These plugins send information to the server, which displays them in a GUI.
Nagios is composed of three parts:
1) A scheduler: this is the server part of Nagios. At regular intervals, the scheduler runs the plugins and takes actions according to their results.
2) A GUI: the interface of Nagios (with the configuration, the alerts, …). It is displayed in web pages generated by CGI. It can show state buttons (green for OK, red for error), sounds, MRTG graphs, …
3) The plugins. They are configurable by the user. They check a service and return a result to the Nagios server.
A soft alert is raised when a plugin returns a warning or an error. Then on the GUI, a green button turns to red, and a sound is emitted. When this soft alert is raised many times (the number is configurable), a hard alert is raised, and the Nagios server sends notifications: email, SMS…
Nagios functionalities
Nagios® is an open source tool developed to monitor hosts and services and designed to inform you of network incidents before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. Initially developed for server and application monitoring, it is now widely used to monitor network availability, which is made possible by the development of specific plugins around the Nagios process. Nagios works with a set of "plugins" to provide local and remote service status. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When incidents are detected, the daemon sends notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a Web browser. Custom "plugins" are relatively easy to develop. Different methods are provided for remote resource discovery. Nagios is freely available from http://www.nagios.org
Requirements
Other things you will need to get Nagios working are:
1) Nagios Plugins (from Nagios download URL)
2) GD – Graphics Libraries
3) JPEG Lib Sources
4) PNG Lib Sources
5) FPing (Fast Ping), this is optional but useful.
6) For SNMP monitoring you will need net-snmp-tools and net-snmp-utils
7) MySQL database for storing element status logs
Plugins and Extensions
Developments on Nagios can be found at http://www.nagiosexchange.org/
Add-On projects are freely available. They cover subjects on:
1) Charts,
2) Communications,
3) Configuration,
4) Development,
5) Downtimes,
6) FrontEnds,
7) Notifications,
8) Misc
Plugins have been developed on:
1) Networking,
2) SNMP,
3) Hardware,
4) Linux,
5) Solaris,
6) Windows, …
1) A plugin is a small program (in Perl, C, Java, Python …) that checks a service (a daemon, some free space on a disk …). It must return a status value and a short line of text (Nagios will only grab the first line of text). Output should be in the format: METRIC STATUS: information text, optionally followed by performance data. The allowed return values are 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
2) The warning and critical thresholds are parameters, set by the user, passed as arguments to the plugin.
3) A plugin can also return performance data in the format: “label1=value1 label2=value2 …”
These data are stored by Nagios and may be later displayed with MRTG (http://people.ee.ethz.ch/~oetiker/webtools/mrtg/)
2) Remotely, through a remote Nagios server, with ssh, with snmp, with NRPE (Nagios Remote Plugin Executor), or with NSCA (Nagios Service Check Acceptor). It means that the plugin either waits for a verification request from the Nagios server before sending its result, or executes itself and sends the result to the Nagios server.
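The plugin contract described above (return value, one line of text, thresholds passed as arguments, optional performance data) can be sketched as follows. This is an illustrative example and not one of the official plugins. The check (free disk space), the function names and the label free_pct are my own choices, and the "|" that separates the text from the performance data follows the usual Nagios convention, which the description above does not spell out.

```python
import shutil
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def evaluate(free_pct, warn_pct, crit_pct, path):
    """Compare the free percentage with the thresholds and build the output line."""
    if free_pct < crit_pct:
        code = 2
    elif free_pct < warn_pct:
        code = 1
    else:
        code = 0
    line = "DISK %s: %.1f%% free on %s | free_pct=%.1f" % (
        STATES[code], free_pct, path, free_pct)
    return code, line


def main(argv):
    """Entry point: argv is [script, PATH, WARN_PCT, CRIT_PCT]."""
    if len(argv) != 4:
        print("DISK UNKNOWN: usage: check_free.py PATH WARN_PCT CRIT_PCT")
        return 3
    path, warn_pct, crit_pct = argv[1], float(argv[2]), float(argv[3])
    usage = shutil.disk_usage(path)
    code, line = evaluate(100.0 * usage.free / usage.total, warn_pct, crit_pct, path)
    print(line)
    return code
```

Saved as a script, it would end with `sys.exit(main(sys.argv))` so that the return value becomes the process exit code that Nagios reads.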
Other useful developments
Alarm resiliency
1) Nagios gives an immediate status of the monitored elements, it has no memory (except in log). It is useful to keep trace of an incident until it has been checked and acknowledged by an operator.
Network Interfaces discovery
1) Within big networks, it is useful to « compare » real configuration with database configuration. An external program can check every day (auto-discovery) the real network configuration versus Nagios database.
2) If differences appear, notify network administrator of the change.
3) A semi-automatic configuration tool can write Nagios configuration files based on higher-level network description files.
References
1) Nagios source program
http://www.nagios.org/download/
2) Nagios Extra developments
http://www.nagiosexchange.org/
3) Official plugins
http://nagiosplug.sourceforge.net/
4) Conferences
http://www.nagios.org/propaganda/conferences/
Check URL String in Nagios
October 31st, 2009
Graphing In Nagios
October 31st, 2009
Graphing Nagios services with pnp4nagios
Nagios is a popular open source computer system and network monitoring software application. It
watches hosts and services, alerting users when things go wrong and again when they get better. Some
of the major features of Nagios are:
• Over 50 bundled plugins for checking common protocols and services (HTTP, FTP, disk space,
S.M.A.R.T., lmsensors, etc). Hundreds of other monitoring plugins are available at
nagiosexchange.org
• Simple API allows plugin creation in any language supported by OS
• Supports many different reports such as availability, alert histograms and top alerts.
• Support for distributed monitoring and clustered configurations
pnp4nagios is a framework written in perl, PHP and C for automatically parsing performance data
collected by Nagios plugins. The data is collected into RRD databases for display in the Nagios web
interface. The graphs created by pnp4nagios are similar to other monitoring tools like Cacti:
pnp4nagios is designed to work well with the standard Nagios plugins and create useable graphs right
out of the box. The appearance of the graphs can be customized.
Pages of related graphs (for instance, CPU usage or TCP connections of each server in a web farm) can
be easily defined and displayed in HTML or exported to PDF.
Because Nagios does not perform checks while performance data is being processed, all processing can
be offloaded to the npcd daemon for large installations.
pnp4nagios requires perl, PHP (built with XML, zlib and GD support) and rrdtool. It can optionally use
the rrd perl modules for better performance. Pre-built Fedora packages are available and a Debian
package is planned. Our example will use Ubuntu Server 8.10 with the bundled perl, PHP and Apache
packages. Nagios has been installed from source in the default location of /usr/local/nagios.
Installing pnp4nagios
You use the typical configure/make/make install to install pnp4nagios:
jth@ubuntu:~/pnp-0.4.13$ ./configure
checking for a BSD-compatible install… /usr/bin/install -c
checking build system type… i686-pc-linux-gnu
checking host system type… i686-pc-linux-gnu
…
*** Configuration summary for pnp 0.4.13 02-19-2009 ***
General Options:
————————- ——————-
Nagios user/group: nagios nagios
Install directory: /usr/local/nagios
HTML Dir: /usr/local/nagios/share/pnp
Config Dir: /usr/local/nagios/etc/pnp
Path to rrdtool: /usr/bin/rrdtool (Version 1.2.27)
RRDs Perl Modules: FOUND (Version 1.2027)
RRD Files stored in: /usr/local/nagios/share/perfdata
process_perfdata.pl Logfile: /usr/local/nagios/var/perfdata.log
Perfdata files (NPCD) stored in: /usr/local/nagios/var/spool/perfdata/
jth@ubuntu:~/pnp-0.4.13$ make all
…
jth@ubuntu:~/pnp-0.4.13$ sudo make install
cd ./src && make install
make[1]: Entering directory `/home/jth/pnp-0.4.13/src’
…
*** Main program, Scripts and HTML files installed ***
Please run ‘make install-config’ to install sample
configuration files
jth@ubuntu:~/pnp-0.4.13$ sudo make install-config
cd ./sample-config && make install-config
make[1]: Entering directory `/home/jth/pnp-0.4.13/sample-config’
rm -f /usr/local/nagios/share/pnp/conf/config.php
/usr/bin/install -c -m 755 -o nagios -g nagios -d /usr/local/nagios/etc/pnp
…
Configuring pnp4nagios
The main configuration files for pnp4nagios are located in /usr/local/nagios/share/pnp (web frontend
and graph options) and /usr/local/nagios/etc/pnp (global options – tool paths, access control, etc)
Before configuring pnp, we need to decide how we want Nagios to process the performance data. This
largely depends on the number of monitored hosts and services of Nagios.
• Default mode, where process_perfdata.pl is executed after each host and service check, is
acceptable for small installations.
• Bulk mode, where performance information is appended to a temporary file and processed after
a short interval, is fine for medium-sized installations.
• Setups with hundreds of hosts and services should use bulk mode with npcd, where a separate
multi-threaded daemon handles the processing.
Our example will use Bulk mode, but it is possible to switch between modes as your Nagios setup
grows.
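In bulk mode Nagios appends one line per check to the performance data file, using the file templates from nagios.cfg: tab-separated fields of the form KEY::VALUE. The following sketch shows how such a line can be taken apart. It is not the parser pnp4nagios itself uses, the function names are hypothetical, and perfdata labels that contain spaces (quoted labels) are not handled.

```python
def parse_perfdata_line(line):
    """Split one bulk-mode line into a dict of KEY -> VALUE."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, sep, value = field.partition("::")
        if sep:
            record[key] = value
    return record


def split_perfdata(perfdata):
    """Turn 'label1=value1;warn;crit label2=value2' into {label: value}."""
    metrics = {}
    for item in perfdata.split():
        label, sep, rest = item.partition("=")
        if sep:
            # Keep only the value; thresholds after ';' are dropped.
            metrics[label] = rest.split(";")[0]
    return metrics
```

For a service line, `split_perfdata(record["SERVICEPERFDATA"])` gives the individual metrics that end up as data sources in the RRD files.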
Edit the performance data section of /usr/local/nagios/etc/nagios.cfg:
# PROCESS PERFORMANCE DATA OPTION
# This determines whether or not Nagios will process performance
# data returned from service and host checks. If this option is
# enabled, host performance data will be processed using the
# host_perfdata_command (defined below) and service performance
# data will be processed using the service_perfdata_command (also
# defined below). Read the HTML docs for more information on
# performance data.
# Values: 1 = process performance data, 0 = do not process performance data
process_performance_data=1
# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
# These commands are run after every host and service check is
# performed. These commands are executed only if the
# enable_performance_data option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on performance data.
#host_perfdata_command=process-host-perfdata
#service_perfdata_command=process-service-perfdata
# HOST AND SERVICE PERFORMANCE DATA FILES
# These files are used to store host and service performance data.
# Performance data is only written to these files if the
# enable_performance_data option (above) is set to 1.
host_perfdata_file=/usr/local/nagios/var/host-perfdata
service_perfdata_file=/usr/local/nagios/var/service-perfdata
# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
# These options determine what data is written (and how) to the
# performance data files. The templates may contain macros, special
# characters (\t for tab, \r for carriage return, \n for newline)
# and plain text. A newline is automatically added after each write
# to the performance data file. Some examples of what you can do are
# shown below.
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
# HOST AND SERVICE PERFORMANCE DATA FILE MODES
# This option determines whether or not the host and service
# performance data files are opened in write ("w") or append ("a")
# mode. If you want to use named pipes, you should use the special
# pipe ("p") mode which avoids blocking at startup, otherwise you will
# likely want the default append ("a") mode.
host_perfdata_file_mode=a
service_perfdata_file_mode=a
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
# These options determine how often (in seconds) the host and service
# performance data files are processed using the commands defined
# below. A value of 0 indicates the files should not be periodically
# processed.
host_perfdata_file_processing_interval=15
service_perfdata_file_processing_interval=15
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
# These commands are used to periodically process the host and
# service performance data files. The interval at which the
# processing occurs is determined by the options above.
host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file
At the end of /usr/local/nagios/etc/objects/commands.cfg, add the command definitions:
define command{
command_name process-service-perfdata-file
command_line $USER1$/process_perfdata.pl --bulk=/usr/local/nagios/var/service-perfdata
}
define command{
command_name process-host-perfdata-file
command_line $USER1$/process_perfdata.pl --bulk=/usr/local/nagios/var/host-perfdata
}
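To make the bulk-mode file format concrete, here is a minimal Python sketch that splits one line written with the file templates above into its KEY::VALUE fields. Python is used purely for illustration and is not part of pnp; the function name and the sample line are made up for this example.

```python
def parse_perfdata_line(line):
    """Split one bulk-mode perfdata line into a dict of KEY -> value.

    Fields are tab-separated and each field looks like KEY::value,
    as laid out by host_perfdata_file_template and
    service_perfdata_file_template.
    """
    fields = {}
    for item in line.rstrip("\n").split("\t"):
        key, sep, value = item.partition("::")
        if sep:  # ignore anything that is not KEY::value
            fields[key] = value
    return fields


# Invented sample line, for illustration only
sample = (
    "DATATYPE::SERVICEPERFDATA\tTIMET::1238223600\tHOSTNAME::ubuntu\t"
    "SERVICEDESC::PING\t"
    "SERVICEPERFDATA::rta=0.045ms;100.000;500.000;0; pl=0%;20;60;;\t"
    "SERVICECHECKCOMMAND::check_ping!100.0,20%!500.0,60%\n"
)

record = parse_perfdata_line(sample)
print(record["HOSTNAME"], record["SERVICEDESC"])
print(record["SERVICEPERFDATA"])
```

Splitting on the first "::" only means a value that itself contains "::" is kept whole.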
Restart Nagios. Now if you look in /usr/local/nagios/share/perfdata, you should start to see rrd files
created by pnp4nagios for all your monitored hosts and services. These rrd files are created with pnp’s
default time settings – 48 hours of 1 minute time step data, 4 years of 360 minute time step data. If you
want more or less, change rra.cfg before running pnp for the first time.
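As a rough check on those defaults, the number of rows an archive needs is the time span divided by the step. The Python sketch below is illustrative only: the helper name is made up, a year is taken as 365 days, and the real row counts in rra.cfg may be rounded differently.

```python
def rra_rows(span_minutes, step_minutes):
    """Number of rows a round-robin archive needs to cover a time span."""
    return span_minutes // step_minutes


MINUTES_PER_DAY = 24 * 60

short_term = rra_rows(2 * MINUTES_PER_DAY, 1)          # 48 hours at a 1 minute step
long_term = rra_rows(4 * 365 * MINUTES_PER_DAY, 360)   # 4 years at a 360 minute step

print(short_term, long_term)
```

Because an rrd file is allocated at its full size when it is created, these row counts are fixed from the first run, which is why rra.cfg has to be changed beforehand.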
If everything is set up correctly, http://your-nagios-host/nagios/pnp/index.php should show your first
pnp graphs. If not, a debug message should tell you which component was broken or missing.
There is one more step to complete the setup. We need to enable extended info in Nagios so that links
to the graphs are created for each applicable host and service.
Append two entries to /usr/local/nagios/etc/objects/templates.cfg:
define host {
name host-pnp
register 0
action_url /nagios/pnp/index.php?host=$HOSTNAME$
}
define service {
name srv-pnp
register 0
action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
}
These are templates that you add to each host and service definition with graphs:
define host {
use linux-server,host-pnp
host_name ubuntu
alias ubuntu
address 127.0.0.1
}
define service {
use local-service,srv-pnp
host_name ubuntu
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
After signalling Nagios to restart, the icons for the graphs should appear next to the hosts and services.
I usually replace the “bullet hole” image /usr/local/nagios/share/images/action.gif with the graph image
provided in /usr/local/nagios/share/pnp/images/action.gif
The number of graphs and time period displayed on the web interface is independent of what’s actually
stored in the rrd files. To change these, see the options in the pnp/config.php file.
Example: Setting Up An SNMP Service
check_snmp_int is a 3rd party check plugin specifically designed to check network interface counters
via SNMP. We’ll set it up to create a cacti-style traffic graph for our server. Our server already has
snmpd installed and the community string set to ‘public’.
1. Copy the check_snmp_int script into the plugin directory: /usr/local/nagios/libexec. Make sure it is
owned by the Nagios user and is executable.
2. Create the service definition in /usr/local/nagios/etc/objects/localhost.cfg:
define service {
use local-service,srv-pnp
host_name ubuntu
service_description NIC_ETH0
check_command check_snmp_int!eth0!-f
}
Service definitions should be made as flexible as possible. “!” separates the arguments that Nagios
passes to the underlying command. The first argument is the interface name and the second argument
contains any additional options for the check_snmp_int plugin.
3. Create the command definition in /usr/local/nagios/etc/objects/commands.cfg:
define command {
command_name check_snmp_int
command_line $USER1$/check_snmp_int -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ $ARG2$
}
$HOSTADDRESS$ is a built-in macro that Nagios sets to the address of the host being checked (the address
directive of its host definition). $USER3$ is the public community string that we’ll define in a minute.
$ARG1$ and $ARG2$ are set to ‘eth0’ and ‘-f’ when our check is run.
4. Define the $USER3$ macro in /usr/local/nagios/etc/resource.cfg:
# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/local/nagios/libexec
# Sets $USER2$ to be the path to event handlers
#$USER2$=/usr/local/nagios/libexec/eventhandlers
# SNMP public community string
$USER3$=public
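With all the macros defined, the expansion of the command line can be sketched in a few lines of Python. This is a simplified illustration, not how Nagios is implemented: the function name is made up, only the macros used in this example are handled, and escaped “!” characters are ignored.

```python
import re


def expand_command(command_line, check_command, macros):
    """Expand $MACRO$ references in a command_line.

    check_command has the form "name!arg1!arg2..."; the pieces after
    the name become $ARG1$, $ARG2$, and so on.
    """
    _name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    # Replace every $NAME$ we know about; leave unknown macros untouched
    return re.sub(
        r"\$(\w+)\$",
        lambda m: values.get(m.group(1), m.group(0)),
        command_line,
    )


command_line = "$USER1$/check_snmp_int -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ $ARG2$"
macros = {
    "USER1": "/usr/local/nagios/libexec",  # from resource.cfg
    "USER3": "public",                     # from resource.cfg
    "HOSTADDRESS": "127.0.0.1",            # address of the host "ubuntu"
}

expanded = expand_command(command_line, "check_snmp_int!eth0!-f", macros)
print(expanded)
```

Running the printed command by hand as the Nagios user is a quick way to test a new check before Nagios schedules it.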
5. To display the graph, pnp looks for a php template whose file name matches the command_name of the
command definition. A customized file in its templates directory takes precedence over the one shipped
in templates.dist; if neither exists, pnp falls back to the default template (default.php). The template
is a php script that builds the arguments to the rrdtool command. There are two issues with the supplied
check_snmp_int.php: we would like the graph to have the same format as cacti, and we’d also like it to
display the more standard bits per second, not bytes per second as collected.
There is already an included template installed with pnp that does the bytes to bits conversion:
check_snmp_int-bits.php. We’ll copy /usr/local/nagios/share/pnp/templates.dist/check_snmp_int-bits.php
to /usr/local/nagios/share/pnp/templates/check_snmp_int.php for our local edits. I’ve
highlighted what to change:
RRD CHEAT SHEET
rrd – file containing multiple sources of time-related data
step – time interval in rra
rra – round-robin archive. An rrd file will contain one of these for each step and timeperiod recorded
DEF – get data from an rrd file
CDEF – create new set of data from existing DEF (can be used for graphing like DEF)
AREA, LINE – draw graph using DEF or CDEF
GPRINT – print inside the graph
<?php
#
# Copyright (c) 2006-2008 Joerg Linge (http://www.pnp4nagios.org)
# Plugin: check_iftraffic.pl (COUNTER)
# Output based on Bits/s
#
# $Id: check_snmp_int-bits.php 523 2008-09-26 17:10:20Z pitchfork $
#
#
$opt[1] = " --vertical-label \"Traffic\" -b 1000 --title \"Interface Traffic for $hostname / $servicedesc\" ";
$def[1] = "DEF:var1=$rrdfile:$DS[1]:AVERAGE " ;
$def[1] .= "DEF:var2=$rrdfile:$DS[2]:AVERAGE " ;
$def[1] .= "CDEF:in_bits=var1,8,* ";
$def[1] .= "CDEF:out_bits=var2,8,* ";
$def[1] .= "AREA:in_bits#00cf00:\"in \" " ;
$def[1] .= "GPRINT:in_bits:LAST:\"%7.2lf %Sbit/s last\" " ;
$def[1] .= "GPRINT:in_bits:AVERAGE:\"%7.2lf %Sbit/s avg\" " ;
$def[1] .= "GPRINT:in_bits:MAX:\"%7.2lf %Sbit/s max\\n\" " ;
$def[1] .= "LINE1:out_bits#002a00:\"out \" " ;
$def[1] .= "GPRINT:out_bits:LAST:\"%7.2lf %Sbit/s last\" " ;
$def[1] .= "GPRINT:out_bits:AVERAGE:\"%7.2lf %Sbit/s avg\" " ;
$def[1] .= "GPRINT:out_bits:MAX:\"%7.2lf %Sbit/s max\\n\" ";
if($NAGIOS_TIMET != ""){
$def[1] .= "VRULE:".$NAGIOS_TIMET."#000000:\"Last Service Check \\n\" ";
}
if($NAGIOS_LASTHOSTDOWN != ""){
$def[1] .= "VRULE:".$NAGIOS_LASTHOSTDOWN."#FF0000:\"Last Host Down\\n\" ";
}
?>
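The two CDEF lines in the template use rrdtool's reverse Polish notation: "var1,8,*" pushes var1 and 8, then multiplies them, which turns bytes into bits. The Python sketch below evaluates such an expression for a single value. It is only an illustration: it handles the four basic arithmetic operators, whereas rrdtool supports many more and works on whole series, and the function name and sample value are made up.

```python
def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression such as "var1,8,*"."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split(","):
        if token in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[token](a, b))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    return stack.pop()


bytes_per_second = 125000.0  # made-up sample value
bits_per_second = eval_rpn("var1,8,*", {"var1": bytes_per_second})
print(bits_per_second)
```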
Now our graph will look like this (I cheated a bit because our new graph would not have collected this
much data yet, but this is what it would look like in a few hours):
Example: Migrating a legacy graphing tool (Orca) to pnp4nagios
Rationale: during an upgrade from Nagios 1 to Nagios 3, a decision was made to replace Orca (which
is not very actively maintained by the authors) running on Solaris 8 & 10 servers with pnp4nagios. The
Orca collector would remain on the system, generating the same statistics, but the statistics will be
collected and displayed by Nagios and pnp.
Orca works by collecting performance data regularly on each monitored system. Periodically the
collector logs are copied to a central server where they are processed into rrd files. A cgi script allows a
web browser to view the graphs. The basic steps to migrate this service are:
1. Configure Orca to write logs to new location and rotate daily
2. Write Nagios plugin to read orca log and return appropriate data
3. Write pnp4nagios template to display the data
4. Turn off old Orca server
5. Profit
The first line of an Orca log contains a space separated list of parameters monitored. The second and
subsequent lines contain timestamped performance data. This makes it easy to write a script to request
parameters and return values.
root@orcaclient [16:43:51 tmp] head -2 /tmp/orcallator-2009-03-28-000
timestamp locltime uptime state_D state_N state_n state_s state_r state_k state_c state_m state_d state_i state_t DNnsrkcmdit usr% sys% wio% idle% … (several dozen more parameters)
1238223600 00:00:00 2443807 8 2 0 0 0 0 0 8 2 0 2 RgwwwwwRgwg 1.2 22.6 0.0 76.2 … (several dozen more parameters)
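The lookup this layout allows can be sketched in Python: map each column name from the first line to its position, then pull the requested values out of the last line. This is an illustration of the idea only, not the plugin itself; the function name is made up and the two log lines are the truncated sample shown above.

```python
def read_orca_stats(lines, wanted):
    """Return {name: value} for the wanted columns, taken from the last line."""
    header = lines[0].split()
    last = lines[-1].split()
    column = {name: index for index, name in enumerate(header)}
    return {name: last[column[name]] for name in wanted}


# The sample log from above, without the elided columns
log_lines = [
    "timestamp locltime uptime state_D state_N state_n state_s state_r "
    "state_k state_c state_m state_d state_i state_t DNnsrkcmdit "
    "usr% sys% wio% idle%",
    "1238223600 00:00:00 2443807 8 2 0 0 0 0 0 8 2 0 2 RgwwwwwRgwg "
    "1.2 22.6 0.0 76.2",
]

stats = read_orca_stats(log_lines, ["usr%", "sys%", "wio%", "idle%"])
print(stats)
```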
The Nagios plugin is called with a list of all parameters requested:
check_solaris_orcallator -q "#proc/s,tcp_Icn/s,tcp_Ocn/s,dnlc_hit%,inod_hit%,tcp_Ret%,tcp_Dup%,tcp_estb,tcp_Rst/s,tcp_Atf/s,tcp_Ldrp/s,tcp_LdQ0/s,tcp_HOdp/s,smtx,smtx/cpu,disk_rd/s,disk_wr/s,disk_rK/s,disk_wK/s,dnlc_ref/s,inod_ref/s,scanrate,pageslock,usr%,sys%,wio%,idle%"
Here is the output in standard Nagios plugin performance format. (See the Nagios docs for more details
on this format)
ORCA OK - tcp_Atf/s 0.000 tcp_Ldrp/s 0.000 tcp_LdQ0/s 0.000 tcp_HOdp/s 0.000 smtx 972 smtx/cpu 60
disk_rd/s 14.5 disk_wr/s 222.3 disk_rK/s 463.5 disk_wK/s 11454.4 #proc/s 2.247 dnlc_ref/s 4441.080
inod_ref/s 0.740 scanrate 0.000 pageslock 13326231 usr% 0.2 sys% 2.2 wio% 0.0 idle% 97.5 tcp_Icn/s
0.397 tcp_Ocn/s 0.217 dnlc_hit% 99.992 inod_hit% 100.000 tcp_Ret% 0.000 tcp_Dup% 0.000 tcp_estb 337
tcp_Rst/s 0.403 |'tcp_Atf/s'=0.000 'tcp_Ldrp/s'=0.000 'tcp_LdQ0/s'=0.000 'tcp_HOdp/s'=0.000 'smtx'=972
'smtx/cpu'=60 'disk_rd/s'=14.5 'disk_wr/s'=222.3 'disk_rK/s'=463.5 'disk_wK/s'=11454.4 '#proc/s'=2.247
'dnlc_ref/s'=4441.080 'inod_ref/s'=0.740 'scanrate'=0.000 'pageslock'=13326231 'usr%'=0.2 'sys%'=2.2
'wio%'=0.0 'idle%'=97.5 'tcp_Icn/s'=0.397 'tcp_Ocn/s'=0.217 'dnlc_hit%'=99.992 'inod_hit%'=100.000
'tcp_Ret%'=0.000 'tcp_Dup%'=0.000 'tcp_estb'=337 'tcp_Rst/s'=0.403
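Everything before the “|” is the human-readable status text, and everything after it is the performance data as quoted 'label'=value pairs. A minimal Python sketch of splitting the two follows. It is illustrative only: the function name is made up, the sample is a shortened version of the output above, and it assumes bare numeric values as the Orca plugin emits them (no units, thresholds or min/max fields).

```python
import re


def parse_plugin_output(output):
    """Split plugin output into (status text, {label: value})."""
    text, _sep, perfdata = output.partition("|")
    pairs = re.findall(r"'([^']+)'=(\S+)", perfdata)
    return text.strip(), {label: float(value) for label, value in pairs}


# Shortened version of the plugin output shown above
sample_output = "ORCA OK - smtx 972 usr% 0.2 sys% 2.2 |'smtx'=972 'usr%'=0.2 'sys%'=2.2"

status, perf = parse_plugin_output(sample_output)
print(status)
print(perf)
```

Quoting the labels is what allows characters such as “#”, “/” and “%” to appear in them.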
The complete check_solaris_orcallator Perl script is included at the end of the presentation. This script
needs to be copied to each monitored host.
The Nagios service definition looks like this:
define service {
use service-unix-template,srv-pnp
hostgroup_name orcallator
service_description ORCA_ALL
check_command check_nrpe_solaris_orcallator_all
}
The command definition looks like:
define command {
command_name check_nrpe_solaris_orcallator_all
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_solaris_orcallator_all
}
check_nrpe is the standard Nagios plugin for running check programs on remote hosts through the NRPE agent installed there.
Each monitored host has this entry in its /usr/local/nagios/etc/nrpe.cfg:
command[check_solaris_orcallator_all]=/usr/local/nagios/libexec/check_solaris_orcallator -q "#proc/s,tcp_Icn/s,tcp_Ocn/s,dnlc_hit%,inod_hit%,tcp_Ret%,tcp_Dup%,tcp_estb,tcp_Rst/s,tcp_Atf/s,tcp_Ldrp/s,tcp_LdQ0/s,tcp_HOdp/s,smtx,smtx/cpu,disk_rd/s,disk_wr/s,disk_rK/s,disk_wK/s,dnlc_ref/s,inod_ref/s,scanrate,pageslock,usr%,sys%,wio%,idle%"
The last task is to create a pnp template for graph display. Each counter/value pair returned from the
orca plugin is placed into its own data source (DS) in the rrd file by pnp (in order). The appearance and
legend of the graphs follow Orca as closely as possible. Here is the code for displaying the first graph –
the full version for all 15 graphs is at the end of the presentation.
#
# display Orca counters. The file is in DS order, the actual graphs are displayed in
# $opt[]/$def[] order on the web pages
$opt[1] = "--lower-limit 0 --upper-limit 100 --rigid --vertical-label \"\" -b 1000 --title \"$hostname / CPU\"";
$ds_name[1] = "usr%,sys%,wio%,idle%";
$def[1] = "DEF:var1=$rrdfile:$DS[16]:AVERAGE " ;
$def[1] .= "DEF:var2=$rrdfile:$DS[17]:AVERAGE " ;
$def[1] .= "DEF:var3=$rrdfile:$DS[18]:AVERAGE " ;
$def[1] .= "DEF:var4=$rrdfile:$DS[19]:AVERAGE " ;
$def[1] .= "AREA:var1#0000ff:$NAME[16]:STACK ";
$def[1] .= "GPRINT:var1:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var1:MAX:\"%3.1lf max\" " ;
$def[1] .= "AREA:var2#ff0000:$NAME[17]:STACK ";
$def[1] .= "GPRINT:var2:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var2:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var2:MAX:\"%3.1lf max\\n\" ";
$def[1] .= "AREA:var3#8b00cc:$NAME[18]:STACK ";
$def[1] .= "GPRINT:var3:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var3:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var3:MAX:\"%3.1lf max\" ";
$def[1] .= "AREA:var4#00cf00:$NAME[19]:STACK ";
$def[1] .= "GPRINT:var4:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var4:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var4:MAX:\"%3.1lf max\\n\" ";
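Why the CPU counters sit in $DS[16] to $DS[19] even though they come last in the query: the plugin prefixes each requested counter with its position and then emits the counters in the order given by a plain string sort of those keys (the sort keys loop in the script), so 10_ to 19_ come before 1_, and 20_ to 27_ before 2_. The Python sketch below reproduces that ordering; it is an illustration with made-up variable names, not part of the plugin or of pnp.

```python
# The query string passed to check_solaris_orcallator -q
query = (
    "#proc/s,tcp_Icn/s,tcp_Ocn/s,dnlc_hit%,inod_hit%,tcp_Ret%,tcp_Dup%,"
    "tcp_estb,tcp_Rst/s,tcp_Atf/s,tcp_Ldrp/s,tcp_LdQ0/s,tcp_HOdp/s,smtx,"
    "smtx/cpu,disk_rd/s,disk_wr/s,disk_rK/s,disk_wK/s,dnlc_ref/s,"
    "inod_ref/s,scanrate,pageslock,usr%,sys%,wio%,idle%"
)
names = query.split(",")

# Same key shape as the plugin: "<position>_<name>", sorted as strings
keys = sorted(f"{position}_{name}" for position, name in enumerate(names, start=1))

# pnp numbers the data sources in the order the counters appear in the output
ds_index = {key.split("_", 1)[1]: ds for ds, key in enumerate(keys, start=1)}

for name in ("usr%", "sys%", "wio%", "idle%"):
    print(name, ds_index[name])
```

Changing the order or number of counters in the query therefore shifts the DS numbers, and the template has to be adjusted to match.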
When it is all put together, the collection of Orca graphs looks like this:
Custom scripts:
/usr/local/nagios/libexec/check_solaris_orcallator
#!/usr/bin/perl -wT
use strict;
use Getopt::Std;
# Taint mode (-T) needs an explicit, trusted PATH before running external commands
$ENV{'PATH'}='/usr/bin';
# Nagios plugin return values
my $RET_OK = 0;
my $RET_WARN = 1;
my $RET_CRIT = 2;
my $RET_UNK = 3;
my $result = 'UNKNOWN';
my ($key, $value);
my $sorted_key;
my %stat_map;
my $addr;
my %values;
my $ret;
my $stats;
my $count;
my %opts;
getopts ('q:', \%opts);
my $QUERY = $opts{q} || '';
unless ($QUERY) {
    usage ();
    exit $RET_WARN;
}
# Pick the most recently modified orcallator log
my $orca_logloc = '/tmp/orcallator*';
my $orca_log = `ls -t $orca_logloc 2>/dev/null | head -1`;
unless ($orca_log) {
    print "Can't find orcallator log file in $orca_logloc";
    exit $RET_UNK;
}
chomp ($orca_log);
$count = 0;
open (ORCA, '<', $orca_log) or die "Can't open $orca_log: $!";
# The first line names the columns; map each name to its column index
my $line = <ORCA>;
chomp $line;
foreach (split /\s+/, $line) {
    $stat_map{$_} = $count;
    ++$count;
}
my @queries = split /,/,$QUERY;
foreach (@queries) {
    unless (exists $stat_map{$_}) {
        print "Unknown orca stat $_\n";
        exit $RET_WARN;
    }
}
# read last line: remember where each line starts, then seek back to the final one
$addr = tell (ORCA);
while (<ORCA>) {
    $addr = tell (ORCA) unless eof (ORCA);
}
seek (ORCA, $addr, 0);
$line = <ORCA>;
chomp $line;
close ORCA or warn "close: $orca_log: $!";
my @stats = split /\s+/, $line;
$count = 1;
foreach (@queries) {
    $values{"${count}_$_"} = $stats[$stat_map{$_}];
    ++$count;
}
# Keys sort as strings, so 10_ to 19_ come before 1_. The DS numbers in the
# pnp template follow this output order.
foreach $sorted_key (sort keys %values) {
    $key = $sorted_key;
    $key =~ s/^\d+_//;
    $ret .= qq($key $values{$sorted_key} );
    $stats .= qq('$key'=$values{$sorted_key} );
}
print "ORCA OK - $ret|$stats\n";
exit $RET_OK;
sub usage () {
    print STDERR qq(usage: $0 -q "orcavalue,orcavalue,..."\n);
}
/usr/local/nagios/share/pnp/templates/check_nrpe_solaris_orcallator_all.php
<?php
#
# display Orca counters. The file is in DS order, the actual graphs are displayed in
# $opt[]/$def[] order on the web page
$opt[10] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Attempt Fail Rate Per Sec\"";
$ds_name[10] = "tcp_Atf_s";
$def[10] = "DEF:var1=$rrdfile:$DS[1]:AVERAGE " ;
$def[10] .= "AREA:var1#00cf00:\"$NAME[1] \" " ;
$def[10] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[10] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[10] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$opt[11] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Listen Drop Rate Per Sec\"";
$ds_name[11] = "tcp_Ldrp_s,tcp_LdQ0_s,tcp_HOdp_s";
$def[11] = "DEF:var1=$rrdfile:$DS[2]:AVERAGE " ;
$def[11] .= "DEF:var2=$rrdfile:$DS[3]:AVERAGE " ;
$def[11] .= "DEF:var3=$rrdfile:$DS[4]:AVERAGE " ;
$def[11] .= "LINE:var1#00ff00:\"$NAME[2] \" " ;
$def[11] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[11] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[11] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[11] .= "LINE1:var2#ff0000:\"$NAME[3] \" " ;
$def[11] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[11] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[11] .= "GPRINT:var2:MAX:\"%7.2lf %S max\\n\" ";
$def[11] .= "LINE2:var3#0000ff:\"$NAME[4] \" " ;
$def[11] .= "GPRINT:var3:LAST:\"%7.2lf %S last\" " ;
$def[11] .= "GPRINT:var3:AVERAGE:\"%7.2lf %S avg\" " ;
$def[11] .= "GPRINT:var3:MAX:\"%7.2lf %S max\\n\" ";
$opt[9] = "--vertical-label \"\" -b 1000 --title \"$hostname / Sleeps On Mutex Per Sec\"";
$ds_name[9] = "smtx,smtx_cpu";
$def[9] = "DEF:var1=$rrdfile:$DS[5]:AVERAGE " ;
$def[9] .= "DEF:var2=$rrdfile:$DS[6]:AVERAGE " ;
$def[9] .= "AREA:var1#00cf00:\"$NAME[5] \" " ;
$def[9] .= "GPRINT:var1:LAST:\"%8.2lf %S last\" " ;
$def[9] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[9] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[9] .= "LINE1:var2#0000ff:\"$NAME[6] \" " ;
$def[9] .= "GPRINT:var2:LAST:\"%3.2lf %S last\" " ;
$def[9] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[9] .= "GPRINT:var2:MAX:\"%7.2lf %S max\\n\" ";
$opt[4] = "--vertical-label \"\" -b 1000 --title \"$hostname / Disk System-wide R/W Ops Per Sec\"";
$ds_name[4] = "disk_rd_s,disk_wr_s";
$def[4] = "DEF:var1=$rrdfile:$DS[7]:AVERAGE " ;
$def[4] .= "DEF:var2=$rrdfile:$DS[8]:AVERAGE " ;
$def[4] .= "AREA:var1#00cf00:\"$NAME[7] \" " ;
$def[4] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[4] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[4] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[4] .= "LINE1:var2#0000ff:\"$NAME[8] \" " ;
$def[4] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[4] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[4] .= "GPRINT:var2:MAX:\"%7.2lf %S max\\n\" ";
$opt[5] = "--vertical-label \"\" -b 1000 --title \"$hostname / Disk System-wide R/W Transfer Rate KB Per Sec\"";
$ds_name[5] = "disk_rK_s,disk_wK_s";
$def[5] = "DEF:var1=$rrdfile:$DS[9]:AVERAGE " ;
$def[5] .= "DEF:var2=$rrdfile:$DS[10]:AVERAGE " ;
$def[5] .= "AREA:var1#00cf00:\"$NAME[9] \" " ;
$def[5] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[5] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[5] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[5] .= "LINE1:var2#0000ff:\"$NAME[10] \" " ;
$def[5] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[5] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[5] .= "GPRINT:var2:MAX:\"%7.2lf %S max\\n\" ";
$opt[8] = "--vertical-label \"\" -b 1000 --title \"$hostname / New Processes Per Sec\"";
$ds_name[8] = "#proc_s";
$def[8] = "DEF:var1=$rrdfile:$DS[11]:AVERAGE " ;
$def[8] .= "AREA:var1#00cf00:\"$NAME[11] \" " ;
$def[8] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[8] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[8] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$opt[3] = "--vertical-label \"\" -b 1000 --title \"$hostname / Disk Cache Refs Per Sec\"";
$ds_name[3] = "dnlc_ref_s,inod_ref_s";
$def[3] = "DEF:var1=$rrdfile:$DS[12]:AVERAGE " ;
$def[3] .= "DEF:var2=$rrdfile:$DS[13]:AVERAGE " ;
$def[3] .= "AREA:var1#00cf00:\"$NAME[12] \" " ;
$def[3] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[3] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[3] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[3] .= "LINE1:var2#0000ff:\"$NAME[13] \" " ;
$def[3] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[3] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[3] .= "GPRINT:var2:MAX:\"%7.2lf %S max\" " ;
$opt[7] = "--vertical-label \"\" -b 1000 --title \"$hostname / Memory Pages Scanned Per Sec\"";
$ds_name[7] = "scanrate";
$def[7] = "DEF:var1=$rrdfile:$DS[14]:AVERAGE " ;
$def[7] .= "AREA:var1#00cf00:\"$NAME[14] \" " ;
$def[7] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[7] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[7] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$opt[6] = "--vertical-label \"\" -b 1000 --title \"$hostname / Locked Memory Pages\"";
$ds_name[6] = "pageslock";
$def[6] = "DEF:var1=$rrdfile:$DS[15]:AVERAGE " ;
$def[6] .= "AREA:var1#00cf00:\"$NAME[15] \" " ;
$def[6] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[6] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[6] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$opt[1] = "--lower-limit 0 --upper-limit 100 --rigid --vertical-label \"\" -b 1000 --title \"$hostname / CPU\"";
$ds_name[1] = "usr%,sys%,wio%,idle%";
$def[1] = "DEF:var1=$rrdfile:$DS[16]:AVERAGE " ;
$def[1] .= "DEF:var2=$rrdfile:$DS[17]:AVERAGE " ;
$def[1] .= "DEF:var3=$rrdfile:$DS[18]:AVERAGE " ;
$def[1] .= "DEF:var4=$rrdfile:$DS[19]:AVERAGE " ;
$def[1] .= "AREA:var1#0000ff:$NAME[16]:STACK ";
$def[1] .= "GPRINT:var1:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var1:MAX:\"%3.1lf max\" " ;
$def[1] .= "AREA:var2#ff0000:$NAME[17]:STACK ";
$def[1] .= "GPRINT:var2:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var2:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var2:MAX:\"%3.1lf max\\n\" ";
$def[1] .= "AREA:var3#8b00cc:$NAME[18]:STACK ";
$def[1] .= "GPRINT:var3:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var3:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var3:MAX:\"%3.1lf max\" ";
$def[1] .= "AREA:var4#00cf00:$NAME[19]:STACK ";
$def[1] .= "GPRINT:var4:LAST:\"%3.1lf last\" " ;
$def[1] .= "GPRINT:var4:AVERAGE:\"%3.1lf avg\" " ;
$def[1] .= "GPRINT:var4:MAX:\"%3.1lf max\\n\" ";
$opt[12] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Connections Per Sec\"";
$ds_name[12] = "tcp_Icn_s,tcp_Ocn_s";
$def[12] = "DEF:var1=$rrdfile:$DS[20]:AVERAGE " ;
$def[12] .= "DEF:var2=$rrdfile:$DS[21]:AVERAGE " ;
$def[12] .= "AREA:var1#00cf00:\"$NAME[20] \" " ;
$def[12] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[12] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[12] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[12] .= "LINE1:var2#0000ff:\"$NAME[21] \" " ;
$def[12] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[12] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[12] .= "GPRINT:var2:MAX:\"%7.2lf %S max\" " ;
$opt[2] = "--vertical-label \"\" -b 1000 --title \"$hostname / Disk Cache Utilization\"";
$ds_name[2] = "dnlc_hit%,inod_hit%";
$def[2] = "DEF:var1=$rrdfile:$DS[22]:AVERAGE " ;
$def[2] .= "DEF:var2=$rrdfile:$DS[23]:AVERAGE " ;
$def[2] .= "AREA:var1#00cf00:\"$NAME[22] \" " ;
$def[2] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[2] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[2] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[2] .= "LINE1:var2#0000ff:\"$NAME[23] \" " ;
$def[2] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[2] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[2] .= "GPRINT:var2:MAX:\"%7.2lf %S max\" " ;
$opt[15] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Retrans & Dupes\"";
$ds_name[15] = "tcp_Ret%,tcp_Dup%";
$def[15] = "DEF:var1=$rrdfile:$DS[24]:AVERAGE " ;
$def[15] .= "DEF:var2=$rrdfile:$DS[25]:AVERAGE " ;
$def[15] .= "AREA:var1#00cf00:\"$NAME[24] \" " ;
$def[15] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[15] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[15] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$def[15] .= "LINE1:var2#0000ff:\"$NAME[25] \" " ;
$def[15] .= "GPRINT:var2:LAST:\"%7.2lf %S last\" " ;
$def[15] .= "GPRINT:var2:AVERAGE:\"%7.2lf %S avg\" " ;
$def[15] .= "GPRINT:var2:MAX:\"%7.2lf %S max\" " ;
$opt[13] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Established Connections\"";
$ds_name[13] = "tcp_estb";
$def[13] = "DEF:var1=$rrdfile:$DS[26]:AVERAGE " ;
$def[13] .= "AREA:var1#00cf00:\"$NAME[26] \" " ;
$def[13] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[13] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[13] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
$opt[14] = "--vertical-label \"\" -b 1000 --title \"$hostname / TCP Resets Per Sec\"";
$ds_name[14] = "tcp_Rst_s";
$def[14] = "DEF:var1=$rrdfile:$DS[27]:AVERAGE " ;
$def[14] .= "AREA:var1#00cf00:\"$NAME[27] \" " ;
$def[14] .= "GPRINT:var1:LAST:\"%7.2lf %S last\" " ;
$def[14] .= "GPRINT:var1:AVERAGE:\"%7.2lf %S avg\" " ;
$def[14] .= "GPRINT:var1:MAX:\"%7.2lf %S max\\n\" " ;
?>

