linux watchdog
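Linux devices based on Debian 8 occasionally hang completely, so this note works out a complete watchdog strategy for them.
Debian 8 already uses systemd to initialize the system and to supervise and manage system processes. The system ships systemd unit files for watchdog and wd_keepalive alongside SysV init scripts for the same two services, and systemd itself can also drive the hardware watchdog directly. Which one should be used?
systemd supports the watchdog directly; the relevant settings are in /etc/systemd/system.conf: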
pi@350-tf-s2 ~$ cat /etc/systemd/system.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# See systemd-system.conf(5) for details
#RuntimeWatchdogSec=0
#ShutdownWatchdogSec=10min
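Both watchdog settings are commented out by default, so systemd does not touch the hardware watchdog until RuntimeWatchdogSec is set to a non-zero value. The following is a minimal sketch of that change, not a tested procedure for this board. The helper name set_runtime_watchdog is hypothetical, and the value passed in is assumed to be a plain time span such as 15s that stays below the hardware maximum.

```shell
#!/bin/sh
# set_runtime_watchdog is a hypothetical helper, not part of systemd.
# Usage: set_runtime_watchdog <conf-file> <timeout>
set_runtime_watchdog() {
    conf="$1"
    timeout="$2"
    if grep -q '^#\{0,1\}RuntimeWatchdogSec=' "$conf"; then
        # Rewrite the existing (possibly commented-out) line.
        # Writing back with cat keeps the owner and mode of the original file.
        sed "s/^#\{0,1\}RuntimeWatchdogSec=.*/RuntimeWatchdogSec=${timeout}/" \
            "$conf" > "$conf.new" &&
            cat "$conf.new" > "$conf" &&
            rm -f "$conf.new"
    else
        # No such line yet: append one (assumes the file only has a [Manager] section).
        printf 'RuntimeWatchdogSec=%s\n' "$timeout" >> "$conf"
    fi
}
```

Run as root, set_runtime_watchdog /etc/systemd/system.conf 15s would enable the runtime watchdog from the next boot. With this setting systemd opens /dev/watchdog itself, so the separate watchdog daemon should not be started at the same time.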
Debian 8 ships both systemd and SysV init versions of the watchdog and keepalive services:
/lib/systemd/system/watchdog.service
/lib/systemd/system/wd_keepalive.service
/etc/init.d/watchdog
/etc/init.d/wd_keepalive
kernel driver
Device Drivers
->Watchdog Timer Support
./drivers/watchdog/Kconfig
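menuconfig WATCHDOG
	bool "Watchdog Timer Support"
	---help---
	  Master switch that enables the whole set of watchdog options.

config WATCHDOG_CORE
	bool "WatchDog Timer Driver Core"
	---help---
	  Enables the watchdog core. With Y, the /dev/watchdog device is
	  created once a watchdog driver registers with the core.

config WATCHDOG_NOWAYOUT
	bool "Disable watchdog shutdown on close"
	---help---
	  By default, closing the /dev/watchdog file handle stops the watchdog
	  timer, so no reset is triggered any more. With this option enabled,
	  closing the device does not stop the timer, which means that once
	  the watchdog has been started it can never be stopped.

config SOFT_WATCHDOG
	  Software watchdog.

So only the following options need to be enabled: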
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
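Before rebuilding, it can help to confirm that both options really ended up in the kernel configuration. A small sketch of such a check follows; check_kconfig is a hypothetical helper, and the path of the config file (the .config in the build tree, or a /boot/config-* file if the image installs one) is an assumption about the local setup.

```shell
#!/bin/sh
# check_kconfig is a hypothetical helper.
# Usage: check_kconfig <config-file> <OPTION>...   (option names without the CONFIG_ prefix)
# Prints one line per option and returns non-zero if any option is not set to y.
check_kconfig() {
    config="$1"
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$config"; then
            echo "CONFIG_${opt}=y"
        else
            echo "CONFIG_${opt} is not built in"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_kconfig .config WATCHDOG WATCHDOG_CORE run in the kernel build directory reports both options and fails if either one is missing.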
sunxi watchdog
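Even with the options above enabled, no watchdog device appeared. According to the sunxi mainline kernel release notes, watchdog support for the A64 was only added in 4.17, so the sunxi_wdt driver and the matching dts changes have to be ported.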
error
drivers/watchdog/sunxi_wdt.c:206:2: error: unknown field 'restart' specified in initializer
  .restart = sunxi_wdt_restart,
  ^
drivers/watchdog/sunxi_wdt.c:206:13: warning: initialization from incompatible pointer type
  .restart = sunxi_wdt_restart,
             ^
drivers/watchdog/sunxi_wdt.c:206:13: warning: (near initialization for 'sunxi_wdt_ops.get_timeleft')
drivers/watchdog/sunxi_wdt.c: In function 'sunxi_wdt_probe':
drivers/watchdog/sunxi_wdt.c:244:2: error: implicit declaration of function 'of_device_get_match_data' [-Werror=implicit-function-declaration]
  sunxi_wdt->wdt_regs = of_device_get_match_data(&pdev->dev);
  ^
drivers/watchdog/sunxi_wdt.c:244:22: warning: assignment makes pointer from integer without a cast
  sunxi_wdt->wdt_regs = of_device_get_match_data(&pdev->dev);
                      ^
drivers/watchdog/sunxi_wdt.c:262:2: error: implicit declaration of function 'watchdog_set_restart_priority' [-Werror=implicit-function-declaration]
  watchdog_set_restart_priority(&sunxi_wdt->wdt_dev, 128);
  ^
drivers/watchdog/sunxi_wdt.c:268:2: error: implicit declaration of function 'watchdog_stop_on_reboot' [-Werror=implicit-function-declaration]
  watchdog_stop_on_reboot(&sunxi_wdt->wdt_dev);
  ^
drivers/watchdog/sunxi_wdt.c:269:2: error: implicit declaration of function 'devm_watchdog_register_device' [-Werror=implicit-function-declaration]
  err = devm_watchdog_register_device(&pdev->dev, &sunxi_wdt->wdt_dev);
boot log
sunxi-wdt 1c20ca0.watchdog: Watchdog enabled (timeout=16 sec, nowayout=0)
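The log line reports a 16 second timeout and nowayout=0. A process that opens the device therefore has to write to it more often than every 16 seconds, and it can still stop the timer before closing. The sketch below illustrates this using the kernel's generic /dev/watchdog interface. feed_watchdog is a hypothetical helper, and it assumes the driver honours the "magic close" convention, where a 'V' written as the last byte disarms the watchdog on close.

```shell
#!/bin/sh
# feed_watchdog is a hypothetical helper.
# Usage: feed_watchdog <device> <interval-seconds> <count>
# Opening the device starts the watchdog. If this function is killed inside
# the loop, the device is closed without the magic character and the board
# is expected to reset once the timeout expires.
feed_watchdog() {
    dev="$1"
    interval="$2"
    count="$3"
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            # Any write to the device counts as a keepalive ping.
            printf '.' >&3
            sleep "$interval"
            i=$((i + 1))
        done
        # Magic close: 'V' as the last write asks the driver to stop
        # the timer when the file descriptor is closed.
        printf 'V' >&3
    } 3>"$dev"
}
```

Run as root, feed_watchdog /dev/watchdog 5 12 would keep the board alive for about a minute and then release the watchdog. Interrupting it halfway is a simple way to verify that the reset really fires.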
force fsck
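#!/bin/bash
# Force a file system check with the ramdisk init: create (or, with
# "remove", delete) the force_fsck flag file on the root partition.
PART=/dev/mmcblk0p2
MNT=/temp

# Create the mount point if it does not exist yet.
[ -d "$MNT" ] || sudo mkdir "$MNT"
# Stop here if the partition cannot be mounted.
sudo mount "$PART" "$MNT" || exit 1

if [ "$1" == "remove" ]; then
    sudo rm -f "$MNT/force_fsck"
    msg="remove succeeded"
else
    sudo touch "$MNT/force_fsck"
    msg="reboot, please."
fi

# Flush the change to disk before unmounting.
sync
sudo umount "$MNT"
echo "$msg"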
troubleshooting
/bin/systemd-tty-ask-password-agent --watch
ExecStartPre=/bin/sh -c '[ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module'
ExecStart=/bin/sh -c '[ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options'
sudo systemd-analyze set-log-level debug
sudo systemctl start watchdog.service
sudo systemd-analyze set-log-level debug
sudo journalctl -b > /tmp/journal.txt
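The ExecStartPre and ExecStart lines above only load a module and start the daemon when the variables run_watchdog and watchdog_module have the right values, so a service that "starts" but does nothing is often just a configuration issue. The following sketch reproduces those two conditions outside systemd. watchdog_would_start is a hypothetical helper, and it assumes the variables come from a defaults file with plain KEY=value lines (on Debian this is presumably /etc/default/watchdog).

```shell
#!/bin/sh
# watchdog_would_start is a hypothetical helper.
# Usage: watchdog_would_start <defaults-file>
# Prints the module that ExecStartPre would modprobe (if any) and returns 0
# only if ExecStart would actually exec /usr/sbin/watchdog.
watchdog_would_start() {
    defaults="$1"
    (
        # Run in a subshell so the sourced variables do not leak out.
        run_watchdog=""
        watchdog_module=""
        . "$defaults" || exit 2
        # Same test as ExecStartPre: skip modprobe for an empty name or "none".
        if [ -n "$watchdog_module" ] && [ "$watchdog_module" != "none" ]; then
            echo "module: $watchdog_module"
        fi
        # Same test as ExecStart: the daemon only runs when run_watchdog is 1.
        [ "$run_watchdog" = 1 ]
    )
}
```

If watchdog_would_start /etc/default/watchdog fails, the unit reports success without ever launching the daemon, which matches a journal that shows no watchdog output at debug level.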