Resource-usage report showed ~29 cores of average load coming from i3blocks helper scripts forking awk/tr/grep/bc/sensors/nvidia-smi every tick.

Rewrite all five hot-path scripts to eliminate forks:
- volume.sh: persist mode, blocks on 'pactl subscribe' event stream. No polling, no sleep, no fork per tick.
- gpu_monitor.sh: persist mode, single long-lived 'nvidia-smi --loop=5' feeds a bash 'while read' loop. Falls back to /sys for amdgpu.
- battery_status.sh: reads /sys/class/power_supply/BAT*/ directly. Zero forks; replaces 'acpi | awk' pipeline.
- cpu_monitor.sh: reads /proc/loadavg and k10temp/coretemp /sys/class/hwmon. Zero forks; replaces 'sensors | awk | tr' + bc arithmetic.
- motherboard_temp.sh: reads nct*/it*/f71* Super-I/O hwmon node directly. Zero forks.

Configure volume + gpu_monitor with interval=persist so i3blocks keeps one long-lived producer each instead of forking per tick.

Also add:
- kill_stale_recorders.sh -- kill stray ffmpeg x11grab / dotnet-trace / dotnet-monitor processes left running after sessions.
- monitors.slice -- resource-capped user slice (CPUQuota=50%, MemoryMax=512M, MemorySwapMax=0 for zram safety, TasksMax=256) to bound future monitoring regressions.
- efficient-polling-scripts SKILL -- rules for writing status-bar and polling scripts without forks; fork-pipeline to bash-builtin translation table; verification checklist.

Verified live: strace -c on cpu_monitor.sh shows 1 execve / 0 clones; persist producers (pactl subscribe, nvidia-smi --loop) show 0 CPU ticks over a 3s idle sample. Per-invocation timing 1.6-1.9 ms (was 30-80 ms).
| name | description |
|---|---|
| efficient-polling-scripts | Use BEFORE writing any shell or Python script that runs on a timer, per-tick status bar (i3blocks/waybar/polybar), cron-like loop, or any repeated invocation. Prevents fork-storm anti-patterns that can consume many CPU-hours per day from tiny polling scripts. |
Efficient Polling & Status-Bar Scripts
When this applies
Any script that runs frequently — per second or per few seconds — especially:
- i3blocks / waybar / polybar / xmobar / tmux status-line scripts
- cron / systemd-timer jobs with intervals < 1 min
- watcher loops invoked by another process every tick
- Python CLIs invoked from a shell hot loop
A single fork pipeline running once per second will consume ~30–50 CPU-minutes per day per forked helper. Five such scripts with 3–8 helpers each turn into days of CPU-time lost per day and tens of thousands of forked processes showing up in atop.
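The daily figure is plain arithmetic, sketched below. The per-fork CPU cost of 21 to 35 ms is an assumption back-derived from the 30–50 minute range above, not a measurement, and `cpu_min_per_day` is a hypothetical helper name.

```bash
#!/bin/bash
# cpu_min_per_day INTERVAL_S COST_MS: CPU-minutes burned per day by one helper
# that is forked every INTERVAL_S seconds and costs COST_MS of CPU each time.
# COST_MS is an assumed input; nothing here measures it.
cpu_min_per_day() {
    local interval=$1 cost_ms=$2
    # invocations per day * ms per invocation -> seconds -> minutes
    REPLY=$(( 86400 / interval * cost_ms / 1000 / 60 ))
}

cpu_min_per_day 1 21; printf '1 s interval, 21 ms per fork: %d CPU-min/day\n' "$REPLY"
cpu_min_per_day 1 35; printf '1 s interval, 35 ms per fork: %d CPU-min/day\n' "$REPLY"
cpu_min_per_day 5 35; printf '5 s interval, 35 ms per fork: %d CPU-min/day\n' "$REPLY"
```

Multiplying the result by the number of helpers per script and the number of scripts gives the aggregate cost.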
The rules
R1. Zero forks in the hot path when possible
Every $(...), backtick, and | in a shell script forks a process. Favor bash builtins:
| Instead of | Use |
|---|---|
| `$(cat /proc/loadavg)` | `$(</proc/loadavg)` or `read -r one _ < /proc/loadavg` |
| `echo "$x" \| awk '{print $1}'` | `read -r first _ <<< "$x"` or `arr=($x); first=${arr[0]}` |
| `echo "$x" \| tr -d '%'` | `${x//%/}` |
| `echo "$x" \| grep -Po '\d+%'` | `[[ $x =~ ([0-9]+)% ]] && vol=${BASH_REMATCH[1]}` |
| `echo "$a < $b" \| bc -l` | `(( a_times_100 < b_times_100 ))` (scale decimals to ints) |
| `sensors \| awk ...` | `read -r milli < /sys/class/hwmon/hwmonN/temp1_input` |
| `acpi -b \| awk ...` | `read -r cap < /sys/class/power_supply/BAT0/capacity` |
| `free -h \| awk ...` | parse `/proc/meminfo` with `while read -r` |
| `df -h / \| awk ...` | no builtin equivalent (`stat -f` is not a builtin); use a long-lived reader, or accept one fork at low frequency |
| `lspci \| grep -i nvidia` | check `/sys/bus/pci/devices/*/vendor` (`0x10de` == NVIDIA) |
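A few of these translations, sketched as small functions. The function names and the sample line are made up for illustration. Each helper leaves its result in `REPLY`, so the caller does not need a `$(...)` capture either.

```bash
#!/bin/bash
# echo "$x" | grep -Po '\d+%' | tr -d '%'  ->  one regex match with a capture group
extract_percent() {
    [[ $1 =~ ([0-9]+)% ]] || return 1
    REPLY=${BASH_REMATCH[1]}
}

# echo "$x" | awk '{print $1}'  ->  read splits on whitespace
first_field() { read -r REPLY _ <<< "$1"; }

# free | awk ...  ->  scan a meminfo-format file for MemAvailable (value in kB)
mem_available_kb() {
    local key value _unit
    while read -r key value _unit; do
        [[ $key == MemAvailable: ]] && { REPLY=$value; return 0; }
    done < "${1:-/proc/meminfo}"
    return 1
}

# Hypothetical sample line standing in for mixer output.
sample='Volume: front-left: 29491 /  45% / -20.81 dB'
extract_percent "$sample" && printf 'percent=%s\n' "$REPLY"
first_field "$sample" && printf 'first=%s\n' "$REPLY"
mem_available_kb && printf 'MemAvailable=%s kB\n' "$REPLY"
```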
R2. Read from /sys and /proc directly
The kernel exposes structured data without forking anything. Useful paths:
- CPU load: `/proc/loadavg`
- CPU per-core stat: `/proc/stat`
- Memory: `/proc/meminfo`
- Temps / fans / voltages: `/sys/class/hwmon/hwmon*/`
  - CPU on AMD: `name=k10temp`, `temp1_input` = Tctl (milli-°C, divide by 1000)
  - CPU on Intel: `name=coretemp`
  - Motherboard Super-I/O: `name=nct*` / `it87*` / `f71*`
  - AMD GPU: `name=amdgpu`, plus `/sys/class/drm/card*/device/gpu_busy_percent`
- Battery: `/sys/class/power_supply/BAT*/` (`capacity`, `status`, `energy_now`, `power_now`)
- Backlight: `/sys/class/backlight/*/brightness`
- Network link: `/sys/class/net/*/operstate`, `/sys/class/net/*/statistics/*_bytes`
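Because hwmon numbering is not stable across boots, a script typically has to find the node by its `name` file first. A minimal sketch of that lookup follows. `hwmon_temp_c` is a hypothetical helper, and it assumes the reading of interest is `temp1_input`, which may not hold for every driver.

```bash
#!/bin/bash
# hwmon_temp_c ROOT PATTERN...: find the first hwmon node under ROOT whose
# `name` matches one of the glob PATTERNs and store temp1_input, converted
# from milli-degrees to whole degrees C, in REPLY. ROOT is normally
# /sys/class/hwmon; it is a parameter so a fixture directory can be used.
hwmon_temp_c() {
    local root=$1 dir name want milli
    shift
    for dir in "$root"/hwmon*/; do
        [[ -r ${dir}name && -r ${dir}temp1_input ]] || continue
        read -r name < "${dir}name"
        for want in "$@"; do
            # Unquoted right-hand side: $want is treated as a glob (e.g. nct*).
            [[ $name == $want ]] || continue
            read -r milli < "${dir}temp1_input"
            REPLY=$(( milli / 1000 ))
            return 0
        done
    done
    return 1
}

if hwmon_temp_c /sys/class/hwmon k10temp coretemp; then
    printf 'CPU %d°C\n' "$REPLY"
else
    printf 'CPU n/a\n'
fi
```

Passing `'nct*' 'it87*' 'f71*'` instead would select a Super-I/O node the same way.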
NVIDIA is the unfortunate exception — there is no sysfs utilization interface, so nvidia-smi is required. Mitigate with R4 (long-lived producer).
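One way to consume such a long-lived `nvidia-smi` is to parse its CSV rows in a `while read` loop. The sketch below assumes the query fields `utilization.gpu,temperature.gpu` with `--format=csv,noheader,nounits`. The output format string and the function name are illustrative, and the demo feeds canned rows so it runs without a GPU.

```bash
#!/bin/bash
# format_gpu_rows: read "util, temp" CSV rows on stdin, print one status line each.
format_gpu_rows() {
    local util temp
    # IFS of comma+space splits "45, 62" into exactly two fields.
    while IFS=', ' read -r util temp; do
        # Skip anything that is not two plain integers (e.g. "[N/A]" fields).
        [[ $util =~ ^[0-9]+$ && $temp =~ ^[0-9]+$ ]] || continue
        printf 'GPU %s%% %s°C\n' "$util" "$temp"
    done
}

# Demo with canned rows; in a persist block the producer would be:
#   nvidia-smi --query-gpu=utilization.gpu,temperature.gpu \
#              --format=csv,noheader,nounits --loop=5 | format_gpu_rows
printf '%s\n' '45, 62' '[N/A], 50' '7, 41' | format_gpu_rows
```

On a multi-GPU machine nvidia-smi emits one row per GPU per interval, so a real script would need to pick or aggregate rows.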
R3. Integer arithmetic, never bc in a hot loop
bc forks a process. For decimal comparisons, multiply out:
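# /proc/loadavg always prints two decimals, so dropping the dot scales by 100:
# "1.23" → 123, "0.45" → 45; compare against threshold ×100.
read -r one _ < /proc/loadavg
load_x100=$((10#${one//./}))   # 10# forces base 10 so "045" is not read as octal
(( load_x100 < 150 )) && echo 'normal'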
Bash's ((…)) and [[ … ]] are builtins — free.
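The dot-stripping trick only works when the input always has exactly two decimals. For values with a variable number of decimals, a slightly more general conversion could look like the sketch below. Both helper names are hypothetical, and negative numbers and exponents are deliberately out of scope.

```bash
#!/bin/bash
# dec_to_x100 DECIMAL: store DECIMAL * 100 in REPLY as an integer, truncating
# after two decimals. Handles "1.23", "1.5", "2" and ".5".
dec_to_x100() {
    local int=${1%%.*} frac=
    [[ $1 == *.* ]] && frac=${1#*.}
    frac=${frac}00          # pad so there are always at least two digits
    frac=${frac:0:2}        # keep exactly two
    REPLY=$(( 10#${int:-0} * 100 + 10#$frac ))
}

# milli_to_c MILLI: format a non-negative hwmon milli-degree reading with one
# decimal, using arithmetic expansion only (no subshell).
milli_to_c() {
    REPLY="$(( $1 / 1000 )).$(( $1 % 1000 / 100 ))"
}

dec_to_x100 1.5
(( REPLY < 200 )) && printf 'load 1.5 is below the 2.00 threshold\n'
milli_to_c 45500
printf 'temp %s°C\n' "$REPLY"
```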
R4. Prefer event-driven / long-lived producers over polling + sleep
When an update needs to happen often, replace "poll + sleep + exit" with one of:
- i3blocks `interval=persist`: script runs forever, prints one block per update. Block on an event stream with `read`: no sleep, no busy-wait.
- `pactl subscribe`: event stream for PulseAudio/PipeWire volume/mute changes.
- `udevadm monitor`: hardware / power-supply / backlight events.
- `inotifywait -m`: file/dir changes.
- `dbus-monitor`: session-wide events (network, media keys, NetworkManager).
- `journalctl -f`: new log lines.
- `nvidia-smi --loop=N` / `nvidia-smi dmon -d N`: one long-lived nvidia-smi emitting rows instead of forking every N seconds. Tail its stdout with `while read`.
- `mpstat N`, `iostat N`, `vmstat N`: same pattern for CPU/IO.
Canonical persist skeleton:
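#!/bin/bash
set -u
# Skeleton only: producer_command, initial_value and compute_new_value are
# placeholders. Have the two functions set $value directly; wrapping them in
# $(...) would fork a subshell on every update.
emit() { printf '%s\n' "$1"; }
initial_value; emit "$value"
producer_command | while read -r line; do
    # `read` blocks on I/O: no CPU, no sleep, no poll.
    [[ $line == *relevant-event* ]] || continue   # placeholder match
    compute_new_value; emit "$value"
done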
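As a concrete instance of the skeleton, the sketch below filters `pactl subscribe`-style lines and runs a callback only for sink events. `on_sink_event` and `refresh` are hypothetical names, and this is not necessarily how the repo's volume.sh is written. The demo feeds canned event lines so it terminates instead of blocking forever.

```bash
#!/bin/bash
# on_sink_event CALLBACK: read `pactl subscribe`-style lines on stdin and run
# CALLBACK once for every event that concerns a sink; other lines are skipped.
on_sink_event() {
    local callback=$1 line
    while read -r line; do
        [[ $line == *" on sink #"* ]] || continue
        "$callback"
    done
}

# Hypothetical stand-in for the code that re-reads and prints the volume.
refresh() { printf 'volume changed\n'; }

# Canned events for a dry run; the live form would be:
#   pactl subscribe | on_sink_event refresh
printf '%s\n' "Event 'change' on sink #0" \
              "Event 'new' on client #12" \
              "Event 'change' on sink-input #3" | on_sink_event refresh
```

Only the first canned line triggers the callback; client and sink-input events fall through the filter.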
R5. One-shot scripts must still be cheap
Even with interval=5, 17,280 invocations/day × 3 forks ≈ 52k forks/day. Make the single-invocation path fork-free when possible. Profile with:
strace -f -e trace=%process -c ./myscript.sh
The clone / execve counts are your fork count.
R6. Python called from a hot loop is an anti-pattern
CPython startup is ~50–80 ms on modern hardware. Invoking python my_helper.py once per second = ~5–8% of one core doing nothing but importing stdlib.
If a status-bar value needs Python logic:
- Inline it in bash when possible (the rules above almost always suffice).
- Run a persistent Python daemon that writes to a FIFO / Unix socket / tmpfile; the bash hot-path reads from it with `read` / `$(<file)`.
- Use a compiled helper (Go/Rust/C) if Python startup is the only issue; a static binary's startup is sub-millisecond.
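The bash side of the daemon-plus-tmpfile option can stay fork-free. A minimal sketch, assuming the daemon rewrites a single-line file; the path `/tmp/status-value` and the name `read_cached` are hypothetical:

```bash
#!/bin/bash
# read_cached FILE [FALLBACK]: store the first line of FILE in REPLY, or
# FALLBACK when the file is missing, unreadable or empty. FILE is whatever
# path the long-lived daemon rewrites.
read_cached() {
    local file=$1 fallback=${2:-n/a} value=
    [[ -r $file ]] && read -r value < "$file"
    REPLY=${value:-$fallback}
}

# /tmp/status-value is a made-up path used only for this demo.
read_cached /tmp/status-value 'n/a'
printf '%s\n' "$REPLY"
```

If the daemon writes to a temporary file and renames it over the target, the reader should never observe a half-written value.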
R7. Cap risk with a systemd slice
Even a correct script can regress. Put status-bar / monitoring work in a resource-capped user slice so the blast radius is bounded:
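# ~/.config/systemd/user/monitors.slice
[Slice]
CPUQuota=50%
MemoryMax=512M
# MemorySwapMax=0 is REQUIRED on zram systems (see oom-prevention skill).
# systemd does not support trailing comments, so the note sits on its own line.
MemorySwapMax=0
TasksMax=256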
Launch i3blocks (or individual persist scripts) under that slice, e.g. via a user service with Slice=monitors.slice, so every child inherits the cap.
R8. Measure before and after
For any "fast" shell script, time 10k invocations:
time for _ in {1..10000}; do ./script.sh >/dev/null; done
Target: a 1-Hz script should take < 2 ms per invocation on a modern desktop.
A 5-second-interval script can afford ~20 ms.
If you're over budget, count the execve with strace -c and remove forks.
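For builtins and shell functions, which `time` on an external script cannot isolate, the same measurement can be taken inside bash itself. The sketch below assumes bash 5.0 or newer for `EPOCHREALTIME`; `per_call_us` is a hypothetical helper name.

```bash
#!/bin/bash
# per_call_us N CMD [ARGS...]: run CMD N times and store the mean wall-clock
# time per call, in microseconds, in REPLY. Needs bash >= 5.0 for EPOCHREALTIME.
per_call_us() {
    local n=$1 i start end
    shift
    # EPOCHREALTIME looks like "1700000000.123456" (the separator depends on
    # the locale); removing the separator yields microseconds since the epoch.
    start=${EPOCHREALTIME/[.,]/}
    for (( i = 0; i < n; i++ )); do
        "$@" > /dev/null
    done
    end=${EPOCHREALTIME/[.,]/}
    REPLY=$(( (end - start) / n ))
}

per_call_us 1000 true
printf 'builtin true: %d us per call\n' "$REPLY"
```

Comparing `REPLY` against 2000 (for a 1-Hz script) or 20000 (for a 5-second interval) turns the budgets above into a pass/fail check.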
Python-specific rules (for daemons, not hot-loop callees)
- Use `pathlib.Path.read_text()` / `read_bytes()`: a handful of syscalls, no subprocess.
- Open `/sys` / `/proc` files with the builtin `open()`; they're tiny reads.
- For event loops, use `asyncio` / `selectors` to block on fds (same idea as `read` in bash) instead of `time.sleep()` in a polling loop.
- Don't shell out with `subprocess.run("sensors")` when `/sys/class/hwmon` exists.
- Cache `psutil` objects across ticks; `psutil.cpu_percent(interval=None)` uses deltas and is O(1) after the first call.
Common red flags (search for these in review)
- `while true` / `while :` with a `sleep` and no event source
- `$(…|…|…)` chains with three or more pipes in a status-bar script
- `| awk`, `| grep`, `| tr`, `| cut`, `| sed`, `| head`, `| tail` where bash builtins would do
- `$(cat foo)` anywhere: always replaceable with `$(<foo)`
- `echo … | bc`: replaceable with bash integer math
- `sensors`, `acpi`, `free`, `lspci`, `iwgetid` in a per-second script
- `python …` / `node …` invoked per tick
- No `set -u` (silent typo bugs compound over thousands of ticks)
Verification checklist before shipping
- `shellcheck script.sh`: clean.
- `strace -c -f ./script.sh 2>&1 | grep -E 'execve|clone'`: fork count matches expectation.
- `time for _ in {1..10000}; do ./script.sh >/dev/null; done`: under budget.
- For persist scripts: run for 60 s under `perf stat -p $PID`; CPU time should be near zero when idle.
- Running under the `monitors.slice` unit: verify with `systemctl --user status monitors.slice`.
Reference implementations in this repo
- `linux_configuration/i3-configuration/i3blocks/volume.sh`: persist mode with `pactl subscribe`.
- `linux_configuration/i3-configuration/i3blocks/gpu_monitor.sh`: persist mode with `nvidia-smi --loop`.
- `linux_configuration/i3-configuration/i3blocks/battery_status.sh`: zero-fork via `/sys/class/power_supply`.
- `linux_configuration/i3-configuration/i3blocks/cpu_monitor.sh`: zero-fork via `/proc/loadavg` + `/sys/class/hwmon`.
- `linux_configuration/i3-configuration/i3blocks/motherboard_temp.sh`: zero-fork via `/sys/class/hwmon`.
- `linux_configuration/scripts/system-maintenance/systemd/monitors.slice`: resource-cap slice.