Nagios Passive Mode

Wednesday, January 27, 2010

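Background: a machine on our internal network needs to be monitored by a Nagios server on the external network, which also sends the alert notifications.

Prerequisite: the Nagios host deployed on the internal network can reach the external network.

Approach: deploy a Nagios server on the internal network and have it send check results to the external Nagios host through the NSCA client (send_nsca).

Steps:
1. On the external Nagios server, install the NSCA daemon and start it; it listens on port 5667.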
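
One quick way to confirm step 1 is to look for a listening socket on port 5667. The helper below is only a sketch: the function name nsca_port_listening is made up for illustration, and the usage comment assumes the `ss` tool is installed on the external server (older systems may only have `netstat`). It reads a socket listing on standard input, so it can be tried without a running daemon.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the socket listing on stdin contains
# an address ending in the given port (default 5667, the NSCA port).
nsca_port_listening() {
    port="${1:-5667}"
    grep -Eq ":${port}([[:space:]]|\$)"
}

# Typical use on the external server (assumes `ss` is installed):
#   ss -ltn | nsca_port_listening 5667 && echo "NSCA port is open"
```
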
2. Edit nagios.cfg and set:
accept_passive_service_checks=1
accept_passive_host_checks=1
3. Add the following to commands.cfg:
# 'check_dummy' command definition
define command{
    command_name check_dummy
    command_line $USER1$/check_dummy $ARG1$
}
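
For readers unfamiliar with check_dummy: it performs no real check and simply exits with the state code it is given, which makes it a convenient placeholder check_command for passive objects. The function below is a rough shell imitation for illustration only, not the plugin's actual source; the name dummy_check and the exact output text are assumptions.

```shell
#!/bin/sh
# Illustrative stand-in for the check_dummy plugin (not its real source).
# $1 = state code (0-3), $2 = optional text to append to the output.
dummy_check() {
    case "$1" in
        0) label="OK" ;;
        1) label="WARNING" ;;
        2) label="CRITICAL" ;;
        3) label="UNKNOWN" ;;
        *) echo "UNKNOWN: invalid state '$1'"; return 3 ;;
    esac
    if [ -n "$2" ]; then
        echo "$label: $2"
    else
        echo "$label"
    fi
    return "$1"
}

# check_dummy!2 in the templates corresponds to a call such as:
#   dummy_check 2 "no passive result received"
```

With the argument 2, the placeholder reports CRITICAL whenever Nagios actually runs it, which only happens when a passive result has gone stale.
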
4. Define a host template for passive mode:
define host {
    name    passive_host
    check_period    24x7
    check_command   check_dummy!2
    contact_groups  nagiosadmin
    notification_period 24x7
    check_interval  5
    retry_interval  1
    max_check_attempts  10
    active_checks_enabled   0
    passive_checks_enabled  1
    obsess_over_host    1
    event_handler_enabled   1
    low_flap_threshold  0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled  1
    flap_detection_options  o,d,u
    notification_options    d,u,r
    notifications_enabled   1
    notification_interval   30
    stalking_options    n
    process_perf_data   1
    failure_prediction_enabled  1
    retain_status_information   1
    retain_nonstatus_information    1
    check_freshness 1
    freshness_threshold 600
    register    0
    }
5. Define a service template for passive mode:
define service {
    name   passive_service
    check_period    24x7
    check_command   check_dummy!2
    contact_groups  nagiosadmin
    notification_period 24x7
    check_interval  10
    retry_interval  2
    max_check_attempts  3
    parallelize_check   1
    active_checks_enabled   0
    passive_checks_enabled  1
    obsess_over_service 1
    event_handler_enabled   1
    low_flap_threshold  0.000000
    high_flap_threshold 0.000000
    flap_detection_enabled  1
    flap_detection_options  o,w,u,c
    notification_options    u,w,c,r
    notifications_enabled   1
    notification_interval   10
    stalking_options    n
    process_perf_data   1
    failure_prediction_enabled  1
    retain_status_information   1
    retain_nonstatus_information    1
    check_freshness 1
    freshness_threshold 600
    register    0
    }
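Notes:
# Setting check_freshness to 1 in the service definition enables freshness checking for that service. When no passive result arrives within the threshold, Nagios runs the check_command itself (here check_dummy!2, which reports CRITICAL).
# freshness_threshold in the service definition must be a number of seconds. It reflects how often the distributed server is expected to deliver check results, and is usually twice the distributed server's normal_check_interval.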
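
The rule of thumb in the note can be written as simple arithmetic. The sketch below assumes the default interval_length of 60 seconds (so a check_interval of 5 means 5 minutes); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: freshness_threshold (in seconds) as twice the
# sender's check interval.
# $1 = check_interval in time units
# $2 = interval_length in seconds (assumed to be the default of 60
#      when omitted)
freshness_threshold_seconds() {
    interval="$1"
    unit="${2:-60}"
    echo $(( 2 * interval * unit ))
}

freshness_threshold_seconds 5    # prints 600
```

A sender that checks every 5 minutes therefore gets the 600-second threshold used in the templates above.
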

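6. The internal Nagios server only sends messages to the external Nagios server, so it does not need Apache. For NSCA, it is enough to compile the package, copy send_nsca from the src directory to the Nagios plugins directory, and copy send_nsca.cfg from the sample-config directory to the Nagios etc directory.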
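
Step 6 boils down to a build followed by two copies. The function below sketches only the copy part; the function name and the /usr/local/nagios default prefix are assumptions (the prefix matches the paths used by the scripts later in this post), and the build commands in the comment are the usual ones for the NSCA source package.

```shell
#!/bin/sh
# Hypothetical helper for step 6: copy the NSCA client and its sample
# config out of an already-compiled NSCA source tree.
# $1 = NSCA source directory, $2 = Nagios prefix (assumed default below).
install_send_nsca() {
    src_dir="$1"
    prefix="${2:-/usr/local/nagios}"
    mkdir -p "$prefix/libexec" "$prefix/etc" || return 1
    cp "$src_dir/src/send_nsca" "$prefix/libexec/" || return 1
    cp "$src_dir/sample-config/send_nsca.cfg" "$prefix/etc/" || return 1
}

# Typical use, after building in the source tree:
#   ./configure && make all
#   install_send_nsca . /usr/local/nagios
```
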
7. On the internal Nagios server, edit nagios.cfg and set:
obsess_over_services=1
ocsp_command=submit_check_result
obsess_over_hosts=1
ochp_command=submit_host_alive
enable_notifications=0
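
To double-check that these directives ended up in the file as intended, a small lookup helper can be handy. This is a sketch only: the function name cfg_value is made up, and it handles plain key=value lines such as the ones above, nothing more elaborate.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last "key=value" line
# for a directive in a nagios.cfg-style file.
# $1 = config file, $2 = directive name
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Example (the path is an assumption based on the prefix used in this post):
#   cfg_value /usr/local/nagios/etc/nagios.cfg ocsp_command
```
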
8. Add the following to commands.cfg:
# 'submit_check_result' command definition
define command{
        command_name submit_check_result
        command_line $USER1$/submit_check_result.sh $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$' '$SERVICEPERFDATA$'
}
# 'submit_host_alive' command definition
define command{
        command_name submit_host_alive
        command_line $USER1$/submit_host_alive.sh $HOSTNAME$ $HOSTSTATE$ '$HOSTOUTPUT$' '$HOSTPERFDATA$'
}
9. Create $USER1$/submit_check_result.sh, set its permissions to 755, and set its owner and group to nagios:
#!/bin/sh

# Arguments:
# $1 = host_name (Short name of host that the service is
#      associated with)
# $2 = svc_description (Description of the service)
# $3 = state_string (A string representing the status of
#      the given service - "OK", "WARNING", "CRITICAL"
#      or "UNKNOWN")
# $4 = plugin_output (A text string that should be used
#      as the plugin output for the service checks)
# $5 = perfdata (Performance data for the service check)

# Convert the state string to the corresponding return code
return_code=-1

case "$3" in
    OK)
        return_code=0
        ;;
    WARNING)
        return_code=1
        ;;
    CRITICAL)
        return_code=2
        ;;
    UNKNOWN)
        return_code=3
        ;;
esac

# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server
# (replace EXTERNAL_NAGIOS_ADDRESS with the address of the external Nagios server)

/usr/bin/printf "%b" "$1\t$2\t$return_code\t$4|$5 \n" | /usr/local/nagios/libexec/send_nsca -H EXTERNAL_NAGIOS_ADDRESS -c /usr/local/nagios/etc/send_nsca.cfg
10. Create submit_host_alive.sh, set its permissions to 755, and set its owner and group to nagios:
#!/bin/sh

# Arguments:
# $1 = host_name (Short name of host)
# $2 = state_string (A string representing the status of
#      the given host - "UP", "DOWN", "UNREACHABLE")
# $3 = plugin_output (A text string that should be used
#      as the plugin output for the host check)
# $4 = perfdata (Performance data for the host check)

# Convert the state string to the corresponding return code
return_code=-1

case "$2" in
    UP)
        return_code=0
        ;;
    DOWN)
        return_code=1
        ;;
    UNREACHABLE)
        return_code=2
        ;;
esac

# pipe the host check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server
# (replace EXTERNAL_NAGIOS_ADDRESS with the address of the external Nagios server)

/usr/bin/printf "%b" "$1\t$return_code\t$3|$4 \n" | /usr/local/nagios/libexec/send_nsca -H EXTERNAL_NAGIOS_ADDRESS -c /usr/local/nagios/etc/send_nsca.cfg
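
Both scripts send one line per result, with fields separated by tabs (the default delimiter for send_nsca): three fields (host, return code, output) for a host check and four (host, service description, return code, output) for a service check. The helpers below are a sketch for inspecting those lines without sending anything; the function names and the sample values web01 and HTTP are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that build the lines piped to send_nsca.

# Service result: host, service description, return code, output|perfdata
format_service_result() {
    printf '%s\t%s\t%s\t%s|%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Host result: host, return code, output|perfdata
format_host_result() {
    printf '%s\t%s\t%s|%s\n' "$1" "$2" "$3" "$4"
}

# Show the tabs explicitly (od -c prints them as \t):
#   format_service_result web01 HTTP 0 "HTTP OK" "time=0.01s" | od -c
```
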


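Follow-up: define the same hosts and services on the external Nagios server as on the internal one (on the external side they use the passive templates from steps 4 and 5). The internal server checks hosts and services in active mode. Every service result is submitted to the external Nagios server through ocsp_command=submit_check_result, and every host result through ochp_command=submit_host_alive. On the external Nagios server, the submitted results can be seen in the system messages log, provided debug logging is enabled in nsca.cfg.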
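
Before relying on the obsess commands, the path can be tested by hand from the internal server. The sketch below prepares a send_nsca invocation for one service result and, by default, only prints it; the function name, the DRY_RUN variable, and the sample address, host, and service names are assumptions, while the paths are those used in the scripts above.

```shell
#!/bin/sh
# Hypothetical manual test for the NSCA path.
# $1 = external Nagios address, $2 = host name, $3 = service description.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
send_test_result() {
    target="$1"; host="$2"; svc="$3"
    send_nsca=/usr/local/nagios/libexec/send_nsca
    cfg=/usr/local/nagios/etc/send_nsca.cfg
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "would run: $send_nsca -H $target -c $cfg"
    else
        # Send a single OK (0) result for the given host and service.
        printf '%s\t%s\t0\tmanual test\n' "$host" "$svc" |
            "$send_nsca" -H "$target" -c "$cfg"
    fi
}

# Example (dry run): send_test_result 192.0.2.10 web01 HTTP
```

After a real run with DRY_RUN=0, the result should show up in the external server's log as described above, as long as the host and service are defined there.
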

With this deployment, check results from multiple Nagios hosts can be aggregated and displayed on a single server.

Posted by Michael.Ding at 4:04 PM
