Troubleshooting High CPU Usage

Most Likely Cause

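In a multithreaded program, poor scheduling, poor resource management, or mishandled synchronous and asynchronous work can leave a thread "running away", so the process consumes far too much CPU. The usual culprit is a loop like the one below that fails to exit when it should.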

Example:

while (true) {
    // do work
    //......
}

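If the "do work" body finishes almost instantly, or a condition inside it triggers a continue that jumps straight to the next iteration, the loop spins without ever blocking and can easily drive CPU usage very high.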

Fixes:

Break out of the infinite loop as soon as the exit condition is met
Avoid hitting continue on every iteration
Add a reasonable sleep
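
A minimal sketch of the three fixes combined, written in POSIX shell to match the pstack.sh script used in this post; the same pattern carries over to whatever language the program is written in. The names process_one_item, should_stop, run_worker and IDLE_SLEEP are hypothetical placeholders, not part of any real program.

```shell
#!/bin/sh
# Hypothetical worker loop illustrating break + sleep instead of spinning.

# Assumed helper: returns 0 if it handled an item, 1 if there was nothing to do.
process_one_item() {
    [ "$pending" -gt 0 ] || return 1
    pending=$((pending - 1))
    processed=$((processed + 1))
}

# Assumed exit condition: stop after three idle rounds in a row.
should_stop() {
    [ "$idle_rounds" -ge 3 ]
}

# usage: run_worker <number of pending items>
run_worker() {
    pending=$1
    processed=0
    idle_rounds=0
    while true; do
        if should_stop; then
            break                        # leave the loop once the exit condition holds
        fi
        if process_one_item; then
            idle_rounds=0
        else
            idle_rounds=$((idle_rounds + 1))
            sleep "${IDLE_SLEEP:-1}"     # back off when idle instead of looping again at once
        fi
    done
    echo "$processed"
}
```

For example, run_worker 5 handles five items, idles three times with a sleep between checks, then exits the loop instead of spinning forever.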

The hard part is locating the code that has "run away". The steps below show how to track it down.

Find the Process Using Too Much CPU

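Run top and press P (uppercase) to sort by CPU usage, then note the pid of the process using too much CPU.
Run top -Hp <pid> to see the resource usage of each thread in that process; in this mode the PID column shows thread IDs.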
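
When an interactive top session is inconvenient, roughly the same per-thread ranking can be sketched straight from /proc. This is only an illustration, assuming a Linux /proc filesystem; thread_ticks and hot_threads are hypothetical helper names. It reads utime and stime (fields 14 and 15 of /proc/<pid>/task/<tid>/stat, measured in clock ticks) twice and prints the threads that consumed the most ticks in between.

```shell
#!/bin/sh
# Print "<tid> <utime+stime in clock ticks>" for every thread of a process.
thread_ticks() {
    pid=$1
    for stat in /proc/"$pid"/task/*/stat; do
        [ -r "$stat" ] || continue
        read -r line < "$stat" || continue
        tid=${line%% *}
        rest=${line##*) }     # drop "tid (comm) "; comm itself may contain spaces
        set -- $rest          # $1=state ... $12=utime $13=stime
        echo "$tid $(( ${12} + ${13} ))"
    done
}

# usage: hot_threads <pid> [interval-seconds]
# Prints "<tid> <ticks used during the interval>", busiest thread first.
hot_threads() {
    target=$1
    interval=${2:-1}
    before=$(thread_ticks "$target")
    sleep "$interval"
    after=$(thread_ticks "$target")
    printf '%s\n---\n%s\n' "$before" "$after" |
    awk '$1 == "---"  { second = 1; next }
         !second      { start[$1] = $2; next }
         ($1 in start) { print $1, $2 - start[$1] }' |
    sort -k2,2nr
}
```

A call such as hot_threads <pid> 2 lists thread IDs in decimal, which is also how gdb reports them as LWP numbers in the thread headers of its backtrace output.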

Print Call Stacks with the pstack Script

Use pstack to print the call stacks. Usage:

./pstack.sh <pid>

(Note: gdb must be installed first, e.g. sudo apt install gdb.)

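If <pid> is a process ID, the script prints the call stack of every thread the process has created; if it is a thread ID, it prints the call stack of that thread only.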

The pstack.sh script is as follows:

#!/bin/sh

if test $# -ne 1; then
    echo "Usage: `basename $0 .sh` <process-id>" 1>&2
    exit 1
fi

if test ! -r /proc/$1; then
    echo "Process $1 not found." 1>&2
    exit 1
fi

# GDB doesn't allow "thread apply all bt" when the process isn't
# threaded; need to peek at the process to determine if that or the
# simpler "bt" should be used.

backtrace="bt"
if test -d /proc/$1/task ; then
    # Newer kernel; has a task/ directory.
    if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
        backtrace="thread apply all bt"
    fi
elif test -f /proc/$1/maps ; then
    # Older kernel; go by it loading libpthread.
    if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
        backtrace="thread apply all bt"
    fi
fi

GDB=${GDB:-/usr/bin/gdb}

# Run GDB, strip out unwanted noise.
# --readnever is no longer used since .gdb_index is now in use.
$GDB --quiet -nx $GDBARGS /proc/$1/exe $1 <<EOF 2>&1 |
set width 0
set height 0
set pagination no
$backtrace
EOF
/bin/sed -n \
    -e 's/^\((gdb) \)*//' \
    -e '/^#/p' \
    -e '/^Thread/p'

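Run the command above several times and compare the thread call stacks. As a rule of thumb, a function that shows up in nearly every sample, in a thread whose CPU usage is also very high, is usually where the code is stuck in an endless loop. Inspect the code at that spot to investigate further.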
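
The repeated sampling can be sketched as a small script. This is an illustration rather than a finished tool: count_frames and sample_stacks are hypothetical names, pstack.sh is assumed to sit in the current directory, and the parsing assumes gdb's usual "Thread N (... (LWP <tid>) ...)" headers and "#N ... in func (...)" frame lines. Function names that contain spaces, such as some C++ template instantiations, are cut at the first space.

```shell
#!/bin/sh
# Read pstack output on stdin and print how often each function appears.
# usage: count_frames <lwp>    (pass "" to count the frames of all threads)
count_frames() {
    awk -v lwp="$1" '
        BEGIN     { wanted = (lwp == "") }
        /^Thread/ { if (lwp != "") wanted = (index($0, "(LWP " lwp ")") > 0); next }
        /^#/ && wanted {
            # "#0  0x... in func (...)" carries an address; "#1  func (...)" does not
            name = ($3 == "in") ? $4 : $2
            print name
        }
    ' | sort | uniq -c | sort -k1,1nr
}

# usage: sample_stacks <pid> <lwp> [rounds]
# Takes one stack snapshot per second and tallies the frames of thread <lwp>.
sample_stacks() {
    pid=$1
    lwp=$2
    rounds=${3:-10}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        ./pstack.sh "$pid"
        sleep 1
        i=$((i + 1))
    done | count_frames "$lwp"
}
```

Passing the busiest thread ID from top -Hp as <lwp> limits the tally to that thread, so functions from sleeping threads do not crowd the result. A function whose count is close to the number of rounds is the first place to look.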
