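This morning, our Nagios monitoring platform suddenly raised alerts for a large number of service errors on FS (the file server). The Status Information for each was "CRITICAL - Socket timeout after 10 seconds".
Because this server is fairly critical, I immediately checked every service named in the alerts. All of them were running normally, and the business applications were unaffected. Yet the number of alerts kept growing, and the service names in them kept changing. A quick web search for the error message explained what was going on.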
From the official documentation:
The check_nrpe plugin returns “CHECK_NRPE: Socket timeout after 10 seconds”
The command that the NRPE daemon was asked to run took longer than 10 seconds to execute. This is the most likely cause if the error message was "CHECK_NRPE: Socket timeout after 10 seconds". Use the -t command line option to specify a longer timeout for the check_nrpe plugin. The following example will increase the timeout to 30 seconds:
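Roughly, this says:
Some of the scripts that the NRPE daemon runs may take longer than 10 seconds to finish, but check_nrpe waits only 10 seconds by default. When a script exceeds that limit, check_nrpe gives up and Nagios raises an alert, even though the service itself is fine. The fix is as follows: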
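As a rough illustration of the -t option described above, here is a small Python sketch that builds and runs a check_nrpe call with a 30-second timeout. The plugin path /usr/local/nagios/libexec/check_nrpe, the host name fs01 and the remote command check_disk are assumptions for illustration only. Substitute the values from your own Nagios setup.

```python
import shlex
import subprocess
from typing import List, Tuple

# Assumed install path; adjust to where check_nrpe lives on your Nagios server.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_check_nrpe_cmd(host: str, command: str, timeout: int = 30) -> List[str]:
    """Build a check_nrpe argument list with an explicit timeout (-t, in seconds)."""
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [CHECK_NRPE, "-H", host, "-c", command, "-t", str(timeout)]


def run_check(host: str, command: str, timeout: int = 30) -> Tuple[int, str]:
    """Run check_nrpe and return (Nagios exit code, trimmed plugin output)."""
    argv = build_check_nrpe_cmd(host, command, timeout)
    # Give the Python side a few seconds of slack beyond check_nrpe's own
    # timeout, so check_nrpe can report the timeout itself instead of being killed.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout + 5)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    # "fs01" and "check_disk" are hypothetical placeholders.
    print(shlex.join(build_check_nrpe_cmd("fs01", "check_disk", timeout=30)))
    # prints: /usr/local/nagios/libexec/check_nrpe -H fs01 -c check_disk -t 30
```

Raising -t only changes how long check_nrpe is willing to wait. The NRPE daemon on the monitored host also enforces its own limit (command_timeout in nrpe.cfg), so a very slow script may need that value checked as well.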