Nagios Installation, Configuration and Usage Guide
1. Overview
Notes:
[1]: CPU, Mem, Disk, Network
[2]: keystone, glance-api, glance-registry, nova-api, nova-compute, nova-network, nova-scheduler, nova-volume, nova-objectstore, mysql, dnsmasq, rabbitmq, etc.
2. References
Nagios official docs: http://www.nagios.org/documentation
Reference manual: http://library.nagios.com/library/products/nagioscore/manuals
Plugin resources: http://exchange.nagios.org/
Source tarballs: http://sourceforge.net/projects/nagios/files/?source=navbar
3. Environment Preparation
Operating system: Ubuntu 12.04 LTS 64-bit server
Nagios Core version: nagios-3.4.4
NRPE version: nrpe-2.14
NDOUtils version: ndoutils-1.5.2
Dependencies:
- apache2
- libapache2-mod-php5
- build-essential
- libgd2-xpm-dev
- make
- gcc
- xinetd
4. Installation and Configuration
4.1 Environment Topology
+-------------------------+
|      Horizon node       |          +---------------+
|      Nagios Core        | =======> |     mysql     |
|      Nagios plugin      |          +---------------+
|      NDOUtils           |
+-------------------------+
        ||                ||
        \/                \/
+-----------------+ +-----------------+
| controller node | |  compute node   |
| Nagios NRPE     | | Nagios NRPE     |
| Nagios plugin   | | Nagios plugin   |
+-----------------+ +-----------------+
4.2 Horizon node
4.2.1 Nagios Core
Default installation directory: /usr/local/nagios
1. Install dependencies (as root)
$ apt-get install make gcc apache2 libapache2-mod-php5 build-essential libgd2-xpm-dev
2. Create the user and groups (as root)
$ /usr/sbin/useradd -m -s /bin/bash nagios
$ passwd nagios
$ /usr/sbin/groupadd nagcmd
$ /usr/sbin/usermod -a -G nagcmd nagios
$ /usr/sbin/usermod -a -G nagcmd www-data
On Ubuntu, useradd already creates a primary group named nagios for the new user, so no separate groupadd nagios is needed.
3. Download the source code:
$ su nagios
$ mkdir ~/download
$ cd ~/download
$ wget http://jaist.dl.sourceforge.net/project/nagios/nagios-3.x/nagios-3.4.4/nagios-3.4.4.tar.gz
Note: a copy of the tarball is also available at ./资料/src/nagios-3.4.4.tar.gz
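4. Build and install
$ tar -zxvf nagios-3.4.4.tar.gz
$ cd nagios-3.4.4
$ ./configure --with-command-group=nagcmd
$ make all
Run the install targets as root (they write to /usr/local/nagios and /etc/init.d):
$ make install && make install-init && make install-config && make install-commandmode
5. Configure Nagios Core
Configuration directory: /usr/local/nagios/etc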
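The group setup from step 2 can be checked before going further. This is a minimal POSIX sh sketch; the helper name user_in_group is hypothetical, and it only relies on id, tr and grep:

```shell
# Hypothetical helper: succeed if <user> belongs to <group> (primary or supplementary).
# Usage: user_in_group <user> <group>
user_in_group() {
    # id -nG prints all group names of the user, space-separated
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example: for u in nagios www-data; do user_in_group "$u" nagcmd || echo "$u is not in nagcmd"; done. After editing files under /usr/local/nagios/etc, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg verifies the configuration before Nagios is (re)started.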
4.2.2 Nagios Plugin
1. Download the source code
$ cd ~/download
$ wget http://jaist.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.16/nagios-plugins-1.4.16.tar.gz
Note: a copy of the tarball is also available at ./资料/src/nagios-plugins-1.4.16.tar.gz
2. Build and install
$ tar -xzvf nagios-plugins-1.4.16.tar.gz
$ cd nagios-plugins-1.4.16
$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
$ make && make install
3. Verify the installation
$ ls /usr/local/nagios/libexec/
The check_* files listed are the installed plugins.
4.2.3 NDOUtils
1. Download the source code
$ cd ~/download
$ wget http://jaist.dl.sourceforge.net/project/nagios/ndoutils-1.x/ndoutils-1.5.2/ndoutils-1.5.2.tar.gz
Note: a copy of the tarball is also available at ./资料/src/ndoutils-1.5.2.tar.gz
2. Build and install
$ tar -xzvf ndoutils-1.5.2.tar.gz
$ cd ndoutils-1.5.2
$ ./configure --prefix=/usr/local/nagios/ --enable-mysql
$ make
$ cp src/ndomod-3x.o src/ndo2db-3x src/log2ndo src/file2sock /usr/local/nagios/bin
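3. Initialize the database
$ mysql -uroot -p123qwe <<EOF
create database nagios;
use nagios;
grant all privileges on nagios.* to 'nagios'@'%' identified by 'nagios';
flush privileges;
source $HOME/download/ndoutils-1.5.2/db/mysql.sql;
quit
EOF
The schema path uses $HOME because ~ is not expanded inside a here-document.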
4. Configure
(1) $ cat ~/download/ndoutils-1.5.2/config/nagios.cfg >> /usr/local/nagios/etc/nagios.cfg
(2) In /usr/local/nagios/etc/nagios.cfg, edit the broker_module line that was just appended at the end of the file so that it loads ndomod-3x.o.
(3) $ cd ~/download/ndoutils-1.5.2/config/
(4) $ mv ndomod.cfg-sample ndomod.cfg && mv ndo2db.cfg-sample ndo2db.cfg
(5) $ cp ndomod.cfg ndo2db.cfg /usr/local/nagios/etc/
(6) Set the db_host, db_user and db_pass parameters in ndo2db.cfg to match the database created above.
(7) Start the NDO2DB daemon:
$ /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
(8) Restart the Nagios process:
$ sudo kill <nagios PID>
$ /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
(9) Verify the configuration:
a. /usr/local/nagios/var/nagios.log should contain lines like:
[1361448234] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1361448234] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1361448234] Event broker module '/usr/local/nagios/bin/ndomod-3x.o' initialized successfully.
b. $ ps -ef | grep nagios
------------------------------------------------
nagios 22862 1 0 11:53 ? 00:00:00 /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
nagios 23449 22862 0 12:03 ? 00:00:00 /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
nagios 23450 23449 0 12:03 ? 00:00:00 /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
nagios 23453 1 0 12:03 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
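Steps (6) and (9) can also be scripted. The sketch below assumes a POSIX sh with GNU sed (for sed -i). set_cfg_value, ndomod_loaded and pids_of are hypothetical helper names, and set_cfg_value assumes the value contains no '|', '&' or backslash, because those characters are special in the sed command it builds.

```shell
# Hypothetical helper: set key=value on uncommented lines of a cfg file, or append it.
# Usage: set_cfg_value <file> <key> <value>
set_cfg_value() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Succeed if the Nagios log shows that the NDOMOD broker module was loaded.
ndomod_loaded() {
    grep -q "Event broker module '.*ndomod-3x\.o' initialized successfully" \
        "${1:-/usr/local/nagios/var/nagios.log}"
}

# Read `ps -ef` output on stdin; print the PIDs whose command (field 8) is exactly $1.
pids_of() {
    awk -v cmd="$1" '$8 == cmd { print $2 }'
}
```

Possible usage on the Horizon node (<mysql ip> is a placeholder):
$ set_cfg_value /usr/local/nagios/etc/ndo2db.cfg db_host <mysql ip>
$ set_cfg_value /usr/local/nagios/etc/nagios.cfg broker_module "/usr/local/nagios/bin/ndomod-3x.o config_file=/usr/local/nagios/etc/ndomod.cfg"
$ ndomod_loaded && ps -ef | pids_of /usr/local/nagios/bin/nagios
Note that set_cfg_value rewrites every uncommented line with that key, so check nagios.cfg first if it already loads other broker modules.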
4.3 Controller node
4.3.1 Nagios Plugin
Same as 4.2.2.
4.3.2 NRPE
1. Install dependencies
$ apt-get install make gcc xinetd
2. Create the user and groups
Same as step 2 in 4.2.1.
3. Download the source code:
$ su nagios
$ mkdir ~/download
$ cd ~/download
$ wget http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
Note: a copy of the tarball is also available at ./资料/src/nrpe-2.14.tar.gz
4. Build and install
$ tar -zxvf nrpe-2.14.tar.gz
$ cd nrpe-2.14
$ ./configure
$ make all
Run the install targets as root:
$ make install-plugin && make install-daemon && make install-daemon-config && make install-xinetd
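5. Configure
(1) Add the IP address of the Horizon (Nagios Core) node to the only_from variable in /etc/xinetd.d/nrpe.
(2) Add the following line to /etc/services:
nrpe 5666/tcp # NRPE
(3) $ service xinetd restart
(4) Open the NRPE port in the firewall and make the rule persistent:
$ sudo iptables -A INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
$ sudo sh -c 'iptables-save > /etc/iptables.up.rules'
Then add the following line to the interface stanza in /etc/network/interfaces:
pre-up iptables-restore < /etc/iptables.up.rules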
(5) Verify locally:
$ netstat -at | grep nrpe
> tcp 0 0 *:nrpe *:* LISTEN
$ /usr/local/nagios/libexec/check_nrpe -H localhost
> NRPE v2.14
(6) Verify from the Horizon (Nagios Core) node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_users
> USERS OK - 1 users currently logged in |users=1;5;10;0
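Step (1) edits /etc/xinetd.d/nrpe by hand. The following is a sketch of doing it idempotently. It assumes the installed file has a single line of the form only_from = 127.0.0.1 and that GNU sed is available; add_only_from is a hypothetical name, and 10.0.1.10 in the test stands in for the Nagios Core address.

```shell
# Hypothetical helper: append <ip> to the only_from line of an xinetd service file,
# unless it is already listed there.
# Usage: add_only_from <xinetd file> <ip or CIDR>
add_only_from() {
    local file=$1 ip=$2
    # Look only at the only_from line; compare each listed address as a whole word.
    if awk -v ip="$ip" '
        $1 == "only_from" || $1 ~ /^only_from=/ {
            sub(/^[^=]*=/, "")
            for (i = 1; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit !found }' "$file"; then
        return 0
    fi
    # "|" as delimiter so CIDR values such as 10.0.1.0/24 also work.
    sed -i "/^[[:space:]]*only_from[[:space:]]*=/ s|\$| $ip|" "$file"
}
```

Run it as root, for example add_only_from /etc/xinetd.d/nrpe <Nagios Core IP>, then restart xinetd as in step (3).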
4.4 Compute node
Same as the controller node (4.3).
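NRPE only answers queries. Before Nagios Core on the Horizon node schedules anything, it still needs a command object for check_nrpe and one service per remote check. The sketch below prints such definitions. It assumes the linux-server and generic-service templates from the sample objects/templates.cfg installed by make install-config, that $USER1$ points at /usr/local/nagios/libexec (the default in resource.cfg), and that check_nrpe is also present on the Horizon node (NRPE's make install-plugin builds it). The helper names and the host name controller are hypothetical.

```shell
# Print the check_nrpe command object. The quoted 'EOF' keeps $USER1$ etc. literal.
nrpe_command_object() {
    cat <<'EOF'
define command {
    command_name  check_nrpe
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
EOF
}

# Usage: nrpe_host_objects <host_name> <address> <nrpe_command>...
# Prints one host object plus one service per command defined in that host's nrpe.cfg.
nrpe_host_objects() {
    local host=$1 addr=$2 cmd
    shift 2
    cat <<EOF
define host {
    use        linux-server
    host_name  $host
    address    $addr
}
EOF
    for cmd in "$@"; do
        cat <<EOF
define service {
    use                  generic-service
    host_name            $host
    service_description  $cmd
    check_command        check_nrpe!$cmd
}
EOF
    done
}
```

The output could be written to a file such as /usr/local/nagios/etc/objects/openstack.cfg (a hypothetical name) and referenced from nagios.cfg with a cfg_file= line.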
5. Monitoring OpenStack with Nagios
Starting the services:
NDOUtils: /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
Nagios Core: /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
NRPE: service xinetd start
The following resources are monitored:
1. Hardware resources of the controller and compute nodes:
CPU, Mem, Disk, Network
2. Controller and compute services:
keystone, glance-api, glance-registry, nova-api, nova-compute, nova-network, nova-scheduler, nova-volume, nova-objectstore, mysql, dnsmasq, rabbitmq, etc.
5.1 Hardware Resources of the Controller and Compute Nodes
5.1.1 CPU
Plugin name: check_cpu.sh
http://exchange.nagios.org/directory/Plugins/System-Metrics/CPU-Usage-and-Load/check_cpu-2Esh-%28matejunkie%29/details
Description: parses /proc/stat twice, a configurable interval apart, to measure CPU usage, and returns WARNING or CRITICAL when the thresholds are exceeded.
Parameters:
check_cpu.sh [-i/--interval] [-w/--warning] [-c/--critical]
Options:
--interval|-i)
Defines the pause between the two times /proc/stat is being
parsed. Higher values could lead to more accurate result.
Default is: 1 second
--warning|-w)
Sets a warning level for CPU user. Default is: off
--critical|-c)
Sets a critical level for CPU user. Default is: off
Example:
[Local environment]
$ /usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80
> OK - user: 0.83, nice: 0.50, sys: 0.83, iowait: 0.50, irq: 0.50, softirq: 0.50 idle: 99.83, cpu_usage=3 | 'user'=0.83 'nice'=0.50 'sys'=0.83 'softirq'=0.50 'iowait'=0.50 'irq'=0.50 'idle'=99.83
Here cpu_usage is the current CPU usage.
Note: this plugin is a shell script, so its logic can be customized.
[Remote environment]
- Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80
- Query it from the Nagios Core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_cpu
> Same output as above.
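Since check_cpu.sh is meant to be customized, here is a sketch of the underlying technique: take two samples of the aggregate cpu line of /proc/stat, and treat the share of non-idle jiffies between them as CPU usage. It then reports in the usual plugin form (one status line, performance data after "|", exit code 0/1/2). This is not the plugin's own code. The helper names are hypothetical, and comparing the thresholds against total busy percentage (instead of the "CPU user" value that check_cpu.sh uses) is an assumption made for illustration.

```shell
# Print "<total> <idle>" jiffies for one aggregate "cpu ..." line of /proc/stat.
# Sums user nice system idle iowait irq softirq steal; missing fields count as 0.
cpu_totals() {
    set -- $1
    if [ $# -gt 0 ]; then shift; fi   # drop the "cpu" label
    echo "$(( ${1:-0} + ${2:-0} + ${3:-0} + ${4:-0} + ${5:-0} + ${6:-0} + ${7:-0} + ${8:-0} )) $(( ${4:-0} + ${5:-0} ))"
}

# Integer busy percentage between two samples of the "cpu" line.
cpu_usage_pct() {
    set -- $(cpu_totals "$1") $(cpu_totals "$2")   # total1 idle1 total2 idle2
    local dt=$(( $3 - $1 )) di=$(( $4 - $2 ))
    if [ "$dt" -le 0 ]; then echo 0; return 0; fi
    echo $(( 100 * (dt - di) / dt ))
}

# Usage: check_cpu_sketch [interval_s] [warn_%] [crit_%]
# Prints one status line with perfdata; returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
check_cpu_sketch() {
    local interval=${1:-1} warn=${2:-60} crit=${3:-80} s1 s2 pct state rc
    s1=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    s2=$(grep '^cpu ' /proc/stat)
    pct=$(cpu_usage_pct "$s1" "$s2")
    if [ "$pct" -ge "$crit" ]; then state=CRITICAL rc=2
    elif [ "$pct" -ge "$warn" ]; then state=WARNING rc=1
    else state=OK rc=0
    fi
    echo "$state - cpu_usage=${pct}% | cpu_usage=${pct}%;${warn};${crit}"
    return "$rc"
}
```

A standalone plugin file would call check_cpu_sketch "$@" and then exit with its status, because NRPE reads the process exit code.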
5.1.2 Mem
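Plugin name: check_mem.sh
http://exchange.nagios.org/directory/Plugins/System-Metrics/Memory/check_mem-2Esh/details
Description: queries memory usage with free.
Parameters:
check_mem.sh -w <warnlevel> -c <critlevel>
where warnlevel and critlevel are percentages compared against (mem used / mem total).
Example:
[Local environment]
$ /usr/local/nagios/libexec/check_mem.sh -w 4 -c 10
> Memory: WARNING Total: 2003 MB - Used: 166 MB - 8% used!|TOTAL=2101026816;;;; USED=173740032;;;; CACHE=856936448;;;; BUFFER=58998784;;;;
With -w 4 -c 10, the 8% usage is above the warning level but below the critical level, hence WARNING.
Note: this plugin is a shell script, so its logic can be customized.
[Remote environment]
- Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 80 -c 90
- Query it from the Nagios Core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_mem
> Same output as above.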
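The output above lists used memory separately from CACHE and BUFFER. That suggests check_mem.sh leaves buffers and page cache out of "used". The sketch below follows that reading (an assumption, not the plugin's verified logic): it reads /proc/meminfo instead of free and compares the integer percentage with >=. The helper names are hypothetical.

```shell
# Read /proc/meminfo-style text on stdin and print "<used_MB> <total_MB>",
# counting buffers and page cache as free memory.
mem_used_total() {
    awk '/^MemTotal:/ { t = $2 } /^MemFree:/ { f = $2 }
         /^Buffers:/  { b = $2 } /^Cached:/  { c = $2 }
         END { printf "%d %d\n", (t - f - b - c) / 1024, t / 1024 }'
}

# Usage: mem_state <used_MB> <total_MB> <warn_%> <crit_%>
# Prints "<STATE> <pct>" and returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
mem_state() {
    local pct=$(( 100 * $1 / $2 ))
    if [ "$pct" -ge "$4" ]; then echo "CRITICAL $pct"; return 2; fi
    if [ "$pct" -ge "$3" ]; then echo "WARNING $pct"; return 1; fi
    echo "OK $pct"
    return 0
}
```

Example: set -- $(mem_used_total < /proc/meminfo); mem_state "$1" "$2" 80 90. With the figures from the example above (166 MB used of 2003 MB, -w 4 -c 10), mem_state reports WARNING 8.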
5.1.3 Network
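Plugin name: check_net.pl
http://exchange.nagios.org/directory/Plugins/System-Metrics/Networking/stat_net-2Epl/details
Description: not documented.
Parameters: not documented; the plugin has no -h/--help option.
Example:
[Local environment]
$ /usr/local/nagios/libexec/check_net.pl
> NETOK - (Rx/Tx) eth0=(65.1B/7.1B), lo=(5.6B/5.6B)|eth0_in=68215167c;eth0_out=7394459c; lo_in=5905765c; lo_out=5905765c;
The c suffix in the performance data marks the values as continuously increasing counters (bytes since boot), not rates.
[Remote environment]
- Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_net]=/usr/local/nagios/libexec/check_net.pl
- Query it from the Nagios Core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_net
> Same output as above.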
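Because check_net.pl reports raw byte counters, a rate has to come from two readings. The sketch below reads the RX/TX byte counters of one interface from /proc/net/dev and averages the difference over the sampling interval. This is an assumption about where such counters come from, since the plugin itself is undocumented. The helper names are hypothetical.

```shell
# Read /proc/net/dev-style text on stdin; print "<rx_bytes> <tx_bytes>" for interface $1.
# After the first ":" becomes a space, field 2 is RX bytes and field 10 is TX bytes.
iface_bytes() {
    sed 's/:/ /' | awk -v dev="$1" '$1 == dev { print $2, $10 }'
}

# Average rate in bytes/s: byte_rate <old_counter> <new_counter> <seconds>
# (assumes the counter did not wrap or reset between the two readings).
byte_rate() {
    echo $(( ($2 - $1) / $3 ))
}
```

For example: set -- $(iface_bytes eth0 < /proc/net/dev); sleep 5; set -- "$@" $(iface_bytes eth0 < /proc/net/dev); byte_rate "$1" "$3" 5 prints the average RX bytes/s over those 5 seconds.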
5.1.4 Disk & LVM
Plugin name: check_diskstat.sh
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/Check-IO-stats-of-one-or-all-disks/details
Description: not documented beyond its name (checks the I/O statistics of one or all disks).
Parameters:
Usage:
./check_diskstat.sh -d DEVICE -w tps,read,write -c tps,read,write | -h
-d DEVICE DEVICE must be without /dev (ex: -d sda)
-w/c TPS,READ,WRITE TPS means transfer per seconds (aka IO/s)
READ and WRITE are in sectors per seconds
Example:
[Local environment]
$ sudo /usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
> summary: 0 io/s, read 8 sectors (0kB/s), write 56 sectors (4kB/s) in 6 seconds | tps=0io/s;;; read=682b/s;;; write=4778b/s;;;
[Remote environment]
- Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_diskstat]=/usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
- Query it from the Nagios Core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_diskstat
> Same output as above.
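The figures check_diskstat.sh reports can be derived from two samples of /proc/diskstats. The kernel counts sectors there in 512-byte units, and that matches the output above: 8 sectors × 512 / 6 s ≈ 682 b/s and 56 × 512 / 6 ≈ 4778 b/s. The sketch below computes tps as completed reads plus writes per second, which is how iostat defines tps. Whether check_diskstat.sh does exactly the same is an assumption. The helper names are hypothetical.

```shell
# Usage: diskstat_rates "<line at t0>" "<line at t1>" <seconds>
# Lines come from /proc/diskstats for the same device. Fields used:
# 4 = reads completed, 6 = sectors read, 8 = writes completed, 10 = sectors written.
# Prints "<tps> <read_sectors_per_s> <write_sectors_per_s>" as integers.
diskstat_rates() {
    printf '%s\n%s\n' "$1" "$2" | awk -v secs="$3" '
        NR == 1 { r = $4; rs = $6; w = $8; ws = $10 }
        NR == 2 { printf "%d %d %d\n", ($4 - r + $8 - w) / secs, ($6 - rs) / secs, ($10 - ws) / secs }'
}

# Usage: sample_diskstat <device without /dev> [seconds]
sample_diskstat() {
    local dev=$1 secs=${2:-1} a b
    a=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    sleep "$secs"
    b=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    diskstat_rates "$a" "$b" "$secs"
}
```

For example, sample_diskstat vda 6 samples the same device and window as the local example above.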
---------------------------------------------------
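Plugin name: check_disk
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_disk--2D-%25-used-space/details
Description: a wrapper around df; -d takes a partition or mount point as shown in the df output (for example, a value from the "Mounted on" column). Note that nagios-plugins (4.2.2) already installs its own check_disk in /usr/local/nagios/libexec with different options, so copying this script there under the same name replaces the standard plugin.
Parameters:
This plugin shows the % of used space of a mounted partition, using the 'df' utility
./check_disk:
-c <integer> If the % of used space is above <integer>, returns CRITICAL state
-w <integer> If the % of used space is below CRITICAL and above <integer>, returns WARNING state
-d <device> The partition or mountpoint to be checked. eg. /dev/sda1, /home, /
Example:
[Local environment]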
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 9.9G 1.7G 7.8G 18% /
udev 998M 12K 998M 1% /dev
tmpfs 401M 224K 401M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 1002M 0 1002M 0% /run/shm
/dev/vdb 20G 173M 19G 1% /mnt
$ /usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10
> OK - /mnt space used=1% | '/mnt usage'=1%;10;80;
[Remote environment]
- Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_disk]=/usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10
- Query it from the Nagios Core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_disk
> Same output as above.
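The check boils down to reading the Use% column of df for one mount point and comparing it with the "above <integer>" thresholds from the help text. The sketch below parses df -P output, which keeps every filesystem on one line even when device names are long. It is an illustration, not the script's own code, and the helper names are hypothetical.

```shell
# Read `df -P` output on stdin; print the Use% of mount point $1 without the "%".
disk_used_pct() {
    awk -v m="$1" '$NF == m { sub(/%/, "", $5); print $5 }'
}

# Usage: disk_state <used_%> <warn_%> <crit_%>
# Mirrors the help text: above crit -> CRITICAL, above warn -> WARNING, else OK.
disk_state() {
    if [ "$1" -gt "$3" ]; then echo CRITICAL; return 2; fi
    if [ "$1" -gt "$2" ]; then echo WARNING; return 1; fi
    echo OK
    return 0
}
```

For example, disk_state "$(df -P | disk_used_pct /mnt)" 10 80 corresponds to the -d /mnt -w 10 -c 80 example above.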
---------------------------------------------------
Plugin name: check_lvm
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_lvm/details
Description: works only on hosts where an LVM volume group (VG) exists.
Parameters:
NOTE - This script only works on _mounted_ volumes!
Usage: ./check_lvm -w -c
Description:
This plugin finds all LVM logical volumes, checks their used space, and compares against the supplied thresholds.
Example:
5.2 Controller and Compute Services
Plugin name: check_procs
Description: based on ps; checks whether the processes of the relevant services are running.
Parameters:
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]
[-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
[-C command] [-t timeout] [-v]
Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
-w, --warning=RANGE
Generate warning state if metric is outside this range
-c, --critical=RANGE
Generate critical state if metric is outside this range
-m, --metric=TYPE
Check thresholds against metric. Valid types:
PROCS - number of processes (default)
VSZ - virtual memory size
RSS - resident set memory size
CPU - percentage CPU
ELAPSED - time elapsed in seconds
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
-v, --verbose
Extra information. Up to 3 verbosity levels
Filters:
-s, --state=STATUSFLAGS
Only scan for processes that have, in the output of `ps`, one or
more of the status flags you specify (for example R, Z, S, RS,
RSZDT, plus others based on the output of your 'ps' command).
-p, --ppid=PPID
Only scan for children of the parent process ID indicated.
-z, --vsz=VSZ
Only scan for processes with VSZ higher than indicated.
-r, --rss=RSS
Only scan for processes with RSS higher than indicated.
-P, --pcpu=PCPU
Only scan for processes with PCPU higher than indicated.
-u, --user=USER
Only scan for processes with user name or ID indicated.
-a, --argument-array=STRING
Only scan for processes with args that contain STRING.
--ereg-argument-array=STRING
Only scan for processes with args that contain the regex STRING.
-C, --command=COMMAND
Only scan for exact matches of COMMAND (without path).
Example:
$ /usr/local/nagios/libexec/check_procs -w 3 -c 5 -a nagios
> PROCS OK: 2 processes with args 'nagios'
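The same plugin can be wired into nrpe.cfg once for each OpenStack service listed in section 5. The sketch below assumes that those services typically run as Python processes whose argument list contains the service name. That is why it uses -a rather than -C: -C matches the command name, which would be python. It also uses the Nagios range "1:" with -c, which means CRITICAL when fewer than one matching process is found. Because -a matches substrings, nova-api would also match a process whose arguments contain, for example, nova-api-metadata. The helper name and the check_proc_* command names are hypothetical.

```shell
# Hypothetical helper: print one nrpe.cfg line per service name given.
# Usage: nrpe_proc_commands <service>...
nrpe_proc_commands() {
    local svc name
    for svc in "$@"; do
        name=$(printf '%s' "$svc" | sed 's/-/_/g')   # nova-api -> nova_api
        printf 'command[check_proc_%s]=/usr/local/nagios/libexec/check_procs -c 1: -a %s\n' "$name" "$svc"
    done
}
```

For example, nrpe_proc_commands keystone glance-api nova-api nova-scheduler >> /usr/local/nagios/etc/nrpe.cfg on the controller node. For non-Python daemons such as mysqld or dnsmasq, -C with the exact command name may be the more precise filter.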
5.3 Other Optional Monitoring Plugins
[LOG]
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_log-2Esh/details
http://exchange.nagios.org/directory/Plugins/Log-Files
[DNS]
http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_dig/details
[DHCP]
http://exchange.nagios.org/directory/Plugins/Network-Protocols/DHCP-and-BOOTP
[AMQP]
http://exchange.nagios.org/directory/Plugins/Software/check_rabbitmq/details
[MYSQL]
http://exchange.nagios.org/directory/Plugins/Databases/MySQL
[ROUTE]
http://exchange.nagios.org/directory/Plugins/Network-Protocols/%2A-Routing
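Notes
Nagios has its own web interface, which gets its information by interacting with the Nagios Core process. Nagios Core in turn collects data through plugins and (via NDOUtils) stores it in the MySQL database.
Because the current environment only needs Nagios plugins to collect monitoring data from the nodes, Nagios Core, NDOUtils and the Nagios web interface are not described in depth here. For details, see the References.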