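Using Monit on Linux to Automatically Restart Failed Services

Background

Applications sometimes exit unexpectedly, whether because of stability problems in the application itself or because of resource limits on the server. When that happens, they need to be brought back up automatically.

In production, a process monitoring and restart mechanism is usually put in place so that an unexpected crash does not turn into a long service outage.

Introduction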

Monit – utility for monitoring services on a Unix system

Monit is a service monitoring tool for Unix systems. It can monitor and manage processes, programs, files, directories, devices, and more.

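Advantages

  • Simple to install and configure, and very lightweight
  • Can monitor both foreground and background processes (Supervisor cannot monitor processes that start in the background)
  • Besides processes, it can monitor files and system resource usage (CPU, memory, disk)
  • Supports dependencies between processes, which controls the start order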
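
As a rough illustration of the last two points, a control-file fragment might look like the following. The service names (`db`, `app`), pidfile paths, systemd unit names, log path, and thresholds are all hypothetical, and `alert` actions only send mail once a mail server and recipient are configured. Treat it as a sketch, not a tested configuration.

```
# Hypothetical example: names, paths and thresholds are assumptions.

# System-wide resource checks
check system $HOST
    if loadavg (5min) > 4 then alert
    if memory usage > 80% then alert
    if cpu usage > 90% for 5 cycles then alert

# Disk usage of the root filesystem
check filesystem rootfs with path /
    if space usage > 90% then alert

# A file check: react when a log grows too large
check file app_log with path /var/log/app/app.log
    if size > 500 MB then alert

# Process dependency: app is started after db, and stopped before it
check process db with pidfile /var/run/db.pid
    start program = "/usr/bin/systemctl start db"
    stop program  = "/usr/bin/systemctl stop db"

check process app with pidfile /var/run/app.pid
    start program = "/usr/bin/systemctl start app"
    stop program  = "/usr/bin/systemctl stop app"
    depends on db
```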
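Disadvantages

  • Monit detects failures by polling at a fixed interval, so a crash is noticed only at the next poll. It cannot react in real time the way Supervisor does.

Installation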

    # Install the EPEL repository
    $ yum -y install epel-release

    # Install monit
    $ yum -y install monit

    # Verify the installation
    $ monit -V
    This is Monit version 5.26.0
    Built with ssl, with ipv6, with compression, with pam and with large files
    Copyright (C) 2001-2019 Tildeslash Ltd. All Rights Reserved.

    # Start the service
    $ systemctl start monit

    # Start the monit daemon
    $ monit

Commands

Official manual:
https://mmonit.com/monit/documentation/monit.html

Command format: monit [options]+ [command]

    # Show help
    $ monit -h

Command options

Common commands
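
A short selection of frequently used options and commands from the Monit manual. `<name>` is a placeholder for a service name defined by a `check` statement, and the list is not exhaustive; `monit -h` prints the full set for the installed version.

```text
Options
  -c <file>                 use this control file instead of the default
  -d <n>                    run as a daemon, polling once every n seconds
  -t                        check the control file syntax and exit
  -v                        verbose mode
  -I                        do not run in the background
  -V                        print the version
  -h                        print help

Commands
  monit start <name>        start a service and enable monitoring (or: start all)
  monit stop <name>         stop a service and disable monitoring (or: stop all)
  monit restart <name>      restart a service (or: restart all)
  monit monitor <name>      enable monitoring of a service
  monit unmonitor <name>    disable monitoring of a service
  monit reload              reread the control file
  monit status [name]       print detailed status
  monit summary [name]      print a short status summary
  monit procmatch <regex>   list processes that a "matching" pattern would select
  monit validate            run all checks immediately
  monit quit                terminate the monit daemon
```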

Configuration

After installation with yum, the default files are:
Global configuration file: /etc/monitrc
Service monitoring configuration directory: /etc/monit.d
Log file: /var/log/monit.log

    # Show the configuration file without its comment lines
    $ grep -v "^#" /etc/monitrc
    # Check the status of monitored services every 5 seconds
    set daemon  5
    set log syslog
    # Enable the built-in web server
    set httpd port 2812 and
        # Address the web server listens on
        use address 10.0.0.2
        # Allow connections from localhost
        allow localhost
        # Fixes the local command error: Error receiving data -- Connection reset by peer
        allow 10.0.0.2
        # Allow access from an external IP
        allow x.x.x.x
        # User name and password for the web login
        allow admin:monit      # require user 'admin' with password 'monit'
        #with ssl {            # enable SSL/TLS and set path to server certificate
        #    pemfile: /etc/ssl/certs/monit.pem
        #}
    # Directory of service monitoring configuration files
    include /etc/monit.d/*
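
Because /etc/monitrc includes every file under /etc/monit.d, each service can live in a file of its own. When several services follow the same shape, a small shell helper can print such a file. The function name `monit_process_check` and its argument order are made up for this sketch and are not part of Monit; the output follows the shape of the nexus check used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of Monit): print a process check in Monit's
# control-file syntax.
# Arguments: service name, process match pattern, control script path, TCP port.
monit_process_check() {
    name=$1 pattern=$2 script=$3 port=$4
    cat <<EOF
check process $name
        matching "$pattern"
        start program = "$script start"
        stop program = "$script stop"
        if failed port $port then restart
EOF
}

# Example: print a check equivalent to the nexus one in this article.
monit_process_check nexus \
    "org.sonatype.nexus.karaf.NexusMain" \
    /root/nexus3/nexus-3.12.1-01/bin/nexus \
    18081
```

To use the result, redirect the output to a file under /etc/monit.d, run `monit -t` to check the syntax, and then run `monit reload`.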

Monitoring a service

    # View the nexus monitoring file
    $ cat /etc/monit.d/nexus
    check process nexus
            matching "org.sonatype.nexus.karaf.NexusMain"
            start program = "/root/nexus3/nexus-3.12.1-01/bin/nexus start"
            stop program = "/root/nexus3/nexus-3.12.1-01/bin/nexus stop"
            if failed port 18081 then restart

    # View the nexus monitoring status
    $ monit status nexus
    Monit 5.26.0 uptime: 3h 46m

    Process 'nexus'
      status                       OK
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      pid                          12439
      parent pid                   1
      uid                          0
      effective uid                0
      gid                          0
      uptime                       9m
      threads                      91
      children                     0
      cpu                          0.6%
      cpu total                    0.6%
      memory                       13.8% [1.1 GB]
      memory total                 13.8% [1.1 GB]
      security attribute           -
      disk read                    0 B/s [253.5 MB total]
      disk write                   0 B/s [95.0 MB total]
      port response time           0.982 ms to localhost:18081 type TCP/IP protocol DEFAULT
      data collected               Wed, 13 May 2020 14:34:39

    # Verify that nexus is restarted automatically after being killed
    $ kill -9 8761

    # View the monitoring status of the restarted nexus
    $ monit status nexus
    Monit 5.26.0 uptime: 11m

    Process 'nexus'
      status                       OK
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      pid                          29188
      parent pid                   1
      uid                          0
      effective uid                0
      gid                          0
      uptime                       0m
      threads                      33
      children                     0
      cpu                          32.0%
      cpu total                    32.0%
      memory                       2.7% [213.4 MB]
      memory total                 2.7% [213.4 MB]
      security attribute           -
      disk read                    0 B/s [76.6 MB total]
      disk write                   0 B/s [1.2 MB total]
      port response time           -
      data collected               Wed, 13 May 2020 10:59:56

    # Follow the log to see what happened
    $ tail -n 20 -f /var/log/monit.log
    ......
    [CST May 13 10:52:58] info     : 'VM_0_2_centos' Monit reloaded
    [CST May 13 10:52:59] info     : 'VM_0_2_centos' monitor on user request
    [CST May 13 10:52:59] info     : 'nexus' monitor on user request
    [CST May 13 10:52:59] info     : Monit daemon with PID 26616 awakened
    [CST May 13 10:52:59] info     : Awakened by User defined signal 1
    [CST May 13 10:52:59] info     : 'nexus' monitor action done
    [CST May 13 10:52:59] info     : 'VM_0_2_centos' monitor action done
    [CST May 13 10:59:39] error    : 'nexus' process is not running
    [CST May 13 10:59:39] info     : 'nexus' trying to restart
    [CST May 13 10:59:39] info     : 'nexus' start: '/root/nexus3/nexus-3.12.1-01/bin/nexus start'
    [CST May 13 10:59:43] info     : 'nexus' process is running with pid 29188
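
With a 5-second polling interval, a service that is slow to open its port may be polled again while it is still starting. One possible variation on the check above adds a start timeout, requires several consecutive failures before restarting, and gives up after repeated restarts. The timeout and cycle counts here are assumptions, not values from the original setup.

```
# Variation on the nexus check; the timeout and cycle counts are assumptions.
check process nexus
        matching "org.sonatype.nexus.karaf.NexusMain"
        start program = "/root/nexus3/nexus-3.12.1-01/bin/nexus start" with timeout 120 seconds
        stop program = "/root/nexus3/nexus-3.12.1-01/bin/nexus stop"
        # Restart only after 3 consecutive failed polls of the port
        if failed port 18081 for 3 cycles then restart
        # Stop monitoring if the service keeps needing restarts
        if 5 restarts within 10 cycles then unmonitor
```

After `unmonitor` is triggered, monitoring stays off until it is re-enabled with `monit monitor nexus`.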

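Web console

Web console address: http://10.0.0.2:2812/

The console has a main page, plus pages showing Monit's own runtime information, system monitoring information, and process monitoring information.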
