Linux Watchdog

driver, linux, a64, debian 2019-09-17



Linux devices running Debian 8 occasionally, with low probability, hang completely. This post works out a complete watchdog strategy for such a system.

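Debian 8 already uses systemd to initialize the system and to supervise and manage system processes. The system ships systemd unit files for watchdog and wd_keepalive alongside SysV init scripts for watchdog and wd_keepalive, and systemd itself can also drive the watchdog directly. So which one should be used?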

  • systemd can drive the watchdog directly:

    pi@350-tf-s2 ~$ cat /etc/systemd/system.conf 
    #  This file is part of systemd.
    #
    #  systemd is free software; you can redistribute it and/or modify it
    #  under the terms of the GNU Lesser General Public License as published by
    #  the Free Software Foundation; either version 2.1 of the License, or
    #  (at your option) any later version.
    #
    # See systemd-system.conf(5) for details
    #RuntimeWatchdogSec=0
    #ShutdownWatchdogSec=10min
  • Debian 8 supports the watchdog and keepalive mechanisms through both systemd and SysV init:

    /lib/systemd/system/watchdog.service
    /lib/systemd/system/wd_keepalive.service
    /etc/init.d/watchdog
    /etc/init.d/wd_keepalive
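
If systemd's built-in support is the option chosen, enabling it comes down to setting RuntimeWatchdogSec in /etc/systemd/system.conf. The sketch below is one possible way to script that change. The helper name set_runtime_watchdog and the 10s value are assumptions for illustration only; the value has to stay below the maximum timeout of the hardware watchdog.

```shell
#!/bin/bash
# Hypothetical helper: set RuntimeWatchdogSec in a systemd system.conf-style file.
# $1 = path to the config file, $2 = timeout value such as 10s
set_runtime_watchdog() {
    local conf="$1" secs="$2"
    if grep -Eq '^#?RuntimeWatchdogSec=' "$conf"; then
        # Replace the existing (possibly commented-out) line in place.
        sed -i -E "s/^#?RuntimeWatchdogSec=.*/RuntimeWatchdogSec=${secs}/" "$conf"
    else
        # No such line yet: append it. Assumes [Manager] is the last section.
        printf 'RuntimeWatchdogSec=%s\n' "$secs" >> "$conf"
    fi
}

# Example (as root; afterwards re-execute systemd or reboot so the file is reread):
#   set_runtime_watchdog /etc/systemd/system.conf 10s
#   systemctl daemon-reexec
```

With this route systemd itself holds /dev/watchdog open, so the watchdog daemon from the unit files and init scripts listed above should not be started at the same time.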

I. kernel driver

  • Device Drivers -> Watchdog Timer Support

kernel menuconfig settings

  • ./drivers/watchdog/Kconfig

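    menuconfig WATCHDOG
            bool "Watchdog Timer Support"
            ---help---
            Master switch that enables the set of watchdog configuration options.

    config WATCHDOG_CORE
            bool "WatchDog Timer Driver Core"
            ---help---
            Enables the watchdog core. With Y, /dev/watchdog is created once a watchdog driver registers with the core.

    config WATCHDOG_NOWAYOUT
            bool "Disable watchdog shutdown on close"
            ---help---
            By default, closing the /dev/watchdog file handle stops the watchdog timer, so no reset is triggered afterwards. With this option enabled, closing the device no longer stops the timer, which means that once the watchdog has been started it cannot be stopped again.

    config SOFT_WATCHDOG
            Software watchdog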

    So only CONFIG_WATCHDOG=y and CONFIG_WATCHDOG_CORE=y need to be enabled here.
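
Whether a given kernel build actually has these two options set can be checked against its config file. The following is a minimal sketch; the function name missing_watchdog_options is hypothetical, and the config path is whatever the build uses (for example .config in the kernel tree).

```shell
#!/bin/bash
# Hypothetical helper: print every required watchdog option that is not "=y"
# in the kernel config file given as $1.
missing_watchdog_options() {
    local config="$1" opt
    for opt in CONFIG_WATCHDOG CONFIG_WATCHDOG_CORE; do
        # -x requires the whole line to match, so commented-out lines such as
        # "# CONFIG_WATCHDOG_CORE is not set" do not count as enabled.
        grep -qx "${opt}=y" "$config" || echo "$opt"
    done
}

# Example:
#   missing_watchdog_options .config
```

An empty output means both options are built in; any printed name still has to be enabled in menuconfig.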

II. sunxi watchdog

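Even with the options above enabled, no watchdog device appeared. According to the sunxi mainline kernel release notes, watchdog support for the A64 was only added in 4.17 and later, so the sunxi_wdt driver and the DTS have to be ported here.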

  • build errors

    drivers/watchdog/sunxi_wdt.c:206:2: error: unknown field 'restart' specified in initializer
      .restart = sunxi_wdt_restart,
      ^
    drivers/watchdog/sunxi_wdt.c:206:13: warning: initialization from incompatible pointer type
      .restart = sunxi_wdt_restart,
                 ^
    drivers/watchdog/sunxi_wdt.c:206:13: warning: (near initialization for 'sunxi_wdt_ops.get_timeleft')
    drivers/watchdog/sunxi_wdt.c: In function 'sunxi_wdt_probe':
    drivers/watchdog/sunxi_wdt.c:244:2: error: implicit declaration of function 'of_device_get_match_data' [-Werror=implicit-function-declaration]
      sunxi_wdt->wdt_regs = of_device_get_match_data(&pdev->dev);
      ^
    drivers/watchdog/sunxi_wdt.c:244:22: warning: assignment makes pointer from integer without a cast
      sunxi_wdt->wdt_regs = of_device_get_match_data(&pdev->dev);
                          ^
    drivers/watchdog/sunxi_wdt.c:262:2: error: implicit declaration of function 'watchdog_set_restart_priority' [-Werror=implicit-function-declaration]
      watchdog_set_restart_priority(&sunxi_wdt->wdt_dev, 128);
      ^
    drivers/watchdog/sunxi_wdt.c:268:2: error: implicit declaration of function 'watchdog_stop_on_reboot' [-Werror=implicit-function-declaration]
      watchdog_stop_on_reboot(&sunxi_wdt->wdt_dev);
      ^
    drivers/watchdog/sunxi_wdt.c:269:2: error: implicit declaration of function 'devm_watchdog_register_device' [-Werror=implicit-function-declaration]
      err = devm_watchdog_register_device(&pdev->dev, &sunxi_wdt->wdt_dev);
  • boot log

    sunxi-wdt 1c20ca0.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0)
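
The build errors above all point at watchdog core helpers that the older tree does not have yet. Before porting, it can help to list which of them are missing. The sketch below greps the tree's headers for the names taken from those errors; the function name missing_wdt_helpers is hypothetical, and the header paths are assumed to be where mainline declares these symbols.

```shell
#!/bin/bash
# Hypothetical helper: list helpers used by the newer sunxi_wdt.c that the
# kernel source tree in $1 does not declare.
missing_wdt_helpers() {
    local tree="$1" sym
    for sym in watchdog_set_restart_priority watchdog_stop_on_reboot \
               devm_watchdog_register_device; do
        # -w matches the symbol as a whole word only.
        grep -qw "$sym" "$tree/include/linux/watchdog.h" 2>/dev/null || echo "$sym"
    done
    grep -qw of_device_get_match_data "$tree/include/linux/of_device.h" 2>/dev/null \
        || echo of_device_get_match_data
    # The .restart callback is a member of struct watchdog_ops.
    grep -qF '(*restart)' "$tree/include/linux/watchdog.h" 2>/dev/null \
        || echo watchdog_ops.restart
}

# Example:
#   missing_wdt_helpers /path/to/linux
```

Each printed name has to be either backported into the watchdog core or replaced in the driver by the older equivalent.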

III. force fsck

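#!/bin/bash
# Force a file system check with the ramdisk init: create (or, with
# "remove", delete) the force_fsck flag file on /dev/mmcblk0p2.
[ -d /temp ] || sudo mkdir /temp

if [ "$1" = "remove" ]; then
    # Stop if the mount fails, so the flag is never touched on the wrong file system.
    sudo mount /dev/mmcblk0p2 /temp || exit 1
    sudo rm -f /temp/force_fsck
    sync
    sudo umount /temp
    echo "remove succeeded"
else
    sudo mount /dev/mmcblk0p2 /temp || exit 1
    sudo touch /temp/force_fsck
    sync
    sudo umount /temp
    echo "Please reboot."
fi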

IV. troubleshooting

/bin/systemd-tty-ask-password-agent --watch

Relevant lines from watchdog.service:

    ExecStartPre=/bin/sh -c '[ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module'
    ExecStart=/bin/sh -c '[ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options'
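
Both unit lines are guarded by shell variables, so a service that "starts" without launching the daemon is often just a matter of those values. The sketch below evaluates the same guards against a defaults file without starting anything. The function name explain_watchdog_unit is hypothetical, and it is an assumption that the variables come from /etc/default/watchdog as in the Debian package.

```shell
#!/bin/bash
# Hypothetical helper: report what the two guarded unit lines would do for
# the variables defined in the defaults file given as $1.
explain_watchdog_unit() {
    (
        # Subshell so the sourced variables do not leak into the caller.
        run_watchdog="" watchdog_module="" watchdog_options=""
        . "$1" || exit 1
        if [ -z "$watchdog_module" ] || [ "$watchdog_module" = "none" ]; then
            echo "modprobe: skipped"
        else
            echo "modprobe: would load $watchdog_module"
        fi
        case "$run_watchdog" in
            1)  echo "daemon: would exec /usr/sbin/watchdog $watchdog_options" ;;
            "") echo "daemon: run_watchdog is unset, guard result is undefined" ;;
            *)  echo "daemon: not started (run_watchdog=$run_watchdog)" ;;
        esac
    )
}

# Example:
#   explain_watchdog_unit /etc/default/watchdog
```

If the output says the daemon is not started, setting run_watchdog=1 in the defaults file is the first thing to try.
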

Raise the systemd log level, start the service, and dump the journal of the current boot:

    sudo systemd-analyze set-log-level debug
    sudo systemctl start watchdog.service
    sudo journalctl -b > /tmp/journal.txt
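
To separate driver problems from daemon problems, the device can also be exercised by hand, with no daemon involved. The following is a minimal sketch under a few assumptions: the function name pet_watchdog is hypothetical, the driver is assumed to honour the magic close character 'V', and nowayout is assumed to be 0 as in the boot log. The open fails with "Device or resource busy" if the watchdog daemon or systemd already holds the device.

```shell
#!/bin/bash
# Hypothetical helper: keep a watchdog device fed for a short while, then
# disarm it. $1 = device path, $2 = seconds between pings, $3 = number of pings.
pet_watchdog() {
    local dev="$1" interval="$2" count="$3" i=0
    {
        # The redirection on the closing brace opens the device once for the
        # whole group; opening it is what arms the timer.
        while [ "$i" -lt "$count" ]; do
            printf '.'           # any write counts as a keepalive ping
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V'               # magic close: ask the driver to stop on close
    } > "$dev"
}

# Example (as root; the interval must stay well below the 16 s timeout):
#   pet_watchdog /dev/watchdog 5 6
```

If the board survives the run and keeps running afterwards, the driver side works. Omitting the final 'V', or killing the script in the middle, should lead to a reset once the timeout expires.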


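Written by lijie and licensed under Creative Commons Attribution 3.0. It may be freely reposted and quoted, provided the author is credited and the source of the article is cited.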
