The following monitors are provided with the distribution, to get
you started. It's simple to add your own monitors. See the man page
for "mon" to learn how.
fping.monitor
-------------
This pings a list of hosts efficiently using the fping program,
from the Satan distribution. fping.monitor is just a simple
shell wrapper for fping, and is normally invoked with just the
list of hosts to ping. Here's a trick: say you don't want to
trigger an alert until a machine has unpingable for some number
of minutes. Give fping.monitor the arguments "-r 3 -t 240000".
arguments:
-a only report failure if all hosts fail
-r num retry "num" times for each host before
reporting failure
-t num set timeout between retries to "num"
milliseconds
-s num consider hosts whose response time exceed
"num" milliseconds as failures
-T for each failed host (no response only),
traceroute to the system and report the
output
ping.monitor
------------
Similar to fping.monitor, but uses the system's ping program.
This serializes the pings, which is normally bad to do. This
is simply an alternative to fping.monitor, if you can't get
fping to compile. I've only tested it with Linux and Solaris.
freespace.monitor
-----------------
This will monitor disk space usage of a particular NFS server.
Arguments are supplied as "path:kBfree [path:kBfree...]". If
free space dips below kBfree, then it returns a failure condition,
and the output is how much space is left on that server.
If you use this monitor, use separate mounts for the volumes that
you want to test, mounting them with the "-o ro,intr,soft" options,
so things don't hang too bad if the server is down.
You should use the ";;" directive to the monitor line, because
freespace.monitor doesn't take a list of hosts. Here's an example:
watch nfsservers
service fping
interval 5m
monitor fping.monitor
alert mail.alert [email protected]
alert netpage.alert mis-pager
alertevery 60m
service freespace
interval 10m
monitor freespace.monitor /server1:5000000 /server2:5000000 ;;
alert mail.alert [email protected]
alert netpage.alert mis-pager
alertevery 60m
tcp.monitor
-----------
Useful to see if it's possible to connect to a particular port
on a particular server. This is over-simplified, and does not yet
support parsing of the output from these services. Options are
"-p port" to tell which port to check, and a list of hosts.
http_t.monitor
--------------
This monitor, contributed by Jon Meek ([email protected]), will
use HTTP to connect to a server, get a page, and log the transfer
speed of the transaction. It uses the Time::HiRes Perl module,
available from CPAN. It can also register a failure if the transfer
doesn't complete within a certain number of seconds. See the
source code for an explanation of the arguments.
http_tp.monitor
---------------
Used to measure and log http file transfer speed and use a proxy.
See the comments in the source code for instructions.
Requires Time::HiRes, LWP::UserAgent, and HTTP::Request.
dns.monitor
-----------
dns.monitor will make several DNS queries to verify that a server is
providing the correct information for a zone. The zone argument is
the zone to check. There can be multiple zone arguments. The master
argument is the master server for the zone. It will be queried for
the base information. Then each server will be queried to verify
that it has the correct answers. It is assumed that each server is
supposed to be authoritative for the zone.
ftp.monitor
-----------
Connect to an ftp server, wait for an acceptible prompt, and log
out.
hpnp.monitor
------------
Uses SNMP to monitor HP JetDirect-equipped printers. Reports
failures as told by the various objects in HP's MIB, and returns
the message that is showing on the printer's LCD ("LOW TONER",
"LOAD LETTER", etc.).
http.monitor
------------
Connects to an http server, retrieves a URL, and returns true
if everything is OK.
imap.monitor
------------
Connects to an IMAP server, checks for a sane response, and
then logs out.
ldap.monitor
------------
This script will search an LDAP server for objects that match the
-filter option, starting at the DN given by the -basedn option. Each
DN found must contain the attribute given by the -attribute option
and the attribute's value must match the value given by the -value
option. Servers are given on the command line. At least one server
must be specified.
netappfree.monitor
------------------
Use SNMP to get free disk space from a Network Appliance
exits with value of 1 if free space on any host drops below
the supplied parameter, or exits with the value of 2 if
there is a "soft" error (SNMP library error, or could not get a
response from the server).
This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
module.
Supply a configuration file with "--config file" option (see
etc/netappfree.cf for an example), or "--list" for a listing
of filesystems which are on your filers. Use --list for help in
building a configuration file.
nntp.monitor
------------
Tries to connect to a nntp server, and wait for the right output.
ping.monitor
------------
Returns a list of hosts which not reachable via ICMP echo. Uses the
system's default ping, rather than fping.
pop3.monitor
------------
Connects to a POP3 server, waits for the OK prompt, then logs out.
process.monitor
---------------
Monitor snmp processes.
Arguments are: [-c community] host [host ...]
This script will exit with value 1 if host:community has
processErrorFlag set. The summary output line will be the host names
that failed and the name of the process. The detail lines are what
UCD snmp returns for an ErrorMsg. ('Too (many|few) (name) running (#
= x)'). If there is an SNMP error (either a problem with the SNMP
libraries, or a problem communicating via SNMP with the destination
host), this script will exit with a warning value of 2.
There probably should be a better way to specify a given process to
watch instead of everything-ucd-snmp-is-watching.
reboot.monitor
--------------
Polls the SNMP agent on hosts, and triggers a failure when a reboot
is detected.
smtp.monitor
------------
Connects to an SMTP server, waits for a prompt, and then logs out.
telnet.monitor
--------------
Use tcp_scan to try to connect to the telnet port on a bunch of hosts,
and look for a "login" prompt.
msql-mysql.monitor, rpc.monitor
-------------------------------
See the separate README for these monitors.
readdir.monitor
---------------
From: gilles LamiraL <[email protected]>
To: "[email protected]" <[email protected]>
Subject: readdir monitor
Hello,
I wrote a monitor that reads several directories and tells
if the number of files in each directory exceeds a given number.
Possible uses are testing /var/spool/mqueue or /var/spool/lp/
It is a local monitor. No SNMP here. I think it can be easyly
called from an SNMP agent.
1) The allowed number can be specified for each directory.
2) You can add a regex filter to match the file names.
Only one regex is allowed for all directories.
Tell me if you want one for directory.
3) The return status is interesting. It gives the exceeded values
in a log based 2 way.
For example:
You want to check if /var/spool/mqueue contains less than 100 messages
$ ls /var/spool/mqueue | wc -l
479
$ ./my-readdir.monitor /var/spool/mqueue:100
/var/spool/mqueue:479
$ echo $?
3
1 means more than 100 messages
2 means more than 200 messages
3 means more than 400 messages
4 means more than 800 messages
...
255 means more than 5.79 * 10^76 messages (579 + 74 zeros !)
Nice ?
See more example in the script itself.
up_rtt.monitor
--------------
mon monitor to check for circuit up and measure RTT.
Jon Meek - 09-May-1998. Requires Perl Modules "Time::HiRes" and
"Statistics::Descriptive".
dialin.monitor
--------------
Dials in to a modem and fails if a carrier and a prompt is not
detected. Useful for telling if your modem pool is down or if some
spaz modem has quit answering the phone.
dialin.monitor requires the Perl Expect module, available from
CPAN.
This program performs UUCP-style locking, and needs to run setgid
uucp to accomplish this. Provided is dialin.monitor.wrap.c, a simple
little C program which is installed as setgid uucp and directly
executes the actual dialin.monitor Perl script. This is required
because some systems (e.g. Linux) do not allow setuid/setgid scripts.
To build, edit the Makefile in mon.d, and adjust monpath to your
environment. The do:
make && make install
dialin.monitor accepts several arguments. The only required argument
is "-n", which specifies the phone number to dial.
-n number dial in to "number"
-t secs timeout to wait for "CONNECT" from modem (60)
-l lockdir directory to use for UUCP-style locking ("/var/lock")
-D device serial device to use ("/dev/modem")
foundry-chassis.monitor
-----------------------
Reports the power supply and fan status of Foundry chassis-based
switches, like the BigIron and the FastIron. This uses the
"FOUNDRY-SN-AGENT-MIB" and "FOUNDRY-SN-ROOT-MIB". Foundry annoyingly
ships their MIBs in one giant file. What I do is separate them into
distinct files so that the UCD tools don't need to parse the single
giant file.
I've tested this with staged failures of PSUs and it works fine.
It actually caught an actual non-staged failure once.
Arguments are:
-c community SNMP community to use
silkworm.monitor
---------------
Reports port, fan, power supply, and temperature failures in Brocade
SilkWorm FCAL switches. It requires Brocade's "SW-MIB" MIB.
Sensor failures are explicitly reported by the agent, read by this
monitor, and reported to mon. This monitor identifies port problems
by paying attention to only those ports whose administrative status is
"online", yet the actual operational status is not "online".
This monitor has not yet been tested in the case of an actual (or
staged) failure. That doesn't mean it doesn't work--it's just that
it hasn't been tested :)
Arguments are:
-c community SNMP community to use
cpqhealth.monitor
-----------------
Report fan, PSU, and temperature failures from systems running the
Compaq "Insight Manager". It requires the "CPQHLTH-MIB" MIB,
and the UCD SNMP libs w/the Perl module.
We've had this running for a little while now, and both tested it with
"staged" failures and actual failures, and it seems to work rather
well. The Insight agent is a bit quirky, though. I've seen where
it reports that both PSUs are installed, running without error,
yet it says it is not in a redundant configuration.
Arguments are:
-c community SNMP community to use
mon.monitor
-----------
Report the running status of a mon server.
Arguments are:
-p port port to use, defaults to 2583
-t timeout timeout in seconds, defaults to 30
-u username username (optional)
-p password password (optional)
traceroute.monitor
------------------
Monitor routes from monitor machine to a remote system
using traceroute. Alarm and log when changes are detected.
See embedded POD documentation for details.
smtp3.monitor
-------------
smtp monior which performs logging of connect times.
See embedded POD documentation for details.
http_tpp.monitor
---------------
Parallel query http server monitor for mon. Logs timing
and size results, can use a proxy server, and can
incorporate a "Smart Alarm" function via a user supplied
Perl module. See embedded POD documentation for details.
file_change.monitor
-------------------
file_change.monitor will watch specified files in a
directory and trigger an alert when any monitored file
changes, or is missing. File changes can optionally be
logged using RCS.
See embedded POD documentation for details.
na_quota.monitor
----------------
report quota limits on network appliance filers. see the comments
in the file for details.