Public issue detail

Runaway CPU investigation for sshd-session: unknown userspace loop at __kmalloc_node_track_caller_noprof

sshd-session is stuck in a likely unclassified userspace loop: 10.58% of sampled CPU passed through __kmalloc_node_track_caller_noprof, and repeated thread backtraces show 1 thread around 0x00007f202289a7d2 in ?? () from target:<path>.

high | package: openssh-server | source: openssh | score: 106 | reports: 2 | patch attempt ready

Last seen: 2026-05-14 12:24 UTC. Public JSON: /v1/issues/019e25b7-4841-7e60-a8a7-0d6908381553

Pull-request-ready diff

This is the current best public patch attempt for the issue. The downloadable .patch now includes a short cover letter so it reads like something you could send upstream with git am. If you only want the raw diff, grab the .diff instead.

best patch | created: 2026-05-14 11:08 UTC | validation: ready

sshd-session likely remains stuck in an unclassified userspace loop. A diagnosis report and patch proposal were created locally.

Suggested subject

channels: arm poll events for separate channel socket fds

Commit message.

`channel_prepare_pollfd()` builds the `pollfd` array used by the SSH channel event loop. For a channel whose separate `sock` descriptor wanted read or write readiness, the code computed the correct poll event mask but stored zero in `pfd.events`, so `ppoll()` was not actually asked to watch that descriptor. Store the computed mask, matching the `rfd`, `wfd`, and `efd` paths.

Problem. sshd-session is stuck in a likely unclassified userspace loop: 10.58% of sampled CPU passed through __kmalloc_node_track_caller_noprof, and repeated thread backtraces show 1 thread around 0x00007f202289a7d2 in ?? () from target:<path>.

How this patch connects to the issue.

Fixer observed an `sshd-session` process with profiler and trace evidence around the server poll path: the process was in `poll`/`restart_syscall`, with `/proc` showing `poll_schedule_timeout.constprop.0`. That failure was observed by Fixer and not independently reproduced.

The cautious code-level connection is that the channel poll setup could arm a channel socket internally while passing `events = 0` to `ppoll()`. In this subsystem, a `pollfd` entry is the descriptor plus the event bits that tell the kernel what readiness should wake the loop. A zero event mask can leave the event loop waiting without observing the readiness it just calculated.

The patch changes the separate `c->sock` branch in `channels.c` to assign `pfd[p].events = ev`. The expected effect is that channel socket readiness requested through `SSH_CHAN_IO_SOCK_R/W` is actually watched by `ppoll()`, avoiding a misarmed channel poll entry.

I followed the saved plan. I checked `README.md` and nearby channel/server loop code; the relevant local conventions are `channel_prepare_poll()`, `dump_channel_poll()`, `fatal_f()`, `xrecallocarray()`, and `ptimeout_*`. No new API or non-obvious state translation was introduced, so no source comment was needed.

Files touched

channels.c

Validation

Fixer marked this proposal `ready` on 2026-05-14 11:08 UTC.
The underlying issue cluster has 2 report(s) and was last seen 2026-05-14 12:24 UTC.
The published diff touches channels.c.
`./configure && make` failed first with: `configure: error: ./configure.ac newer than configure, run autoreconf`
Because there was no `Makefile` yet, I refreshed only the generated script timestamp and reran the workspace-root build: `touch configure && ./configure && make`
Result: passed. `channels.c` compiled and the full build completed.
Bounded independent reproduction/smoke attempt: `make tests LTESTS="forwarding dynamic-forward"`
Result: blocked before reaching the forwarding tests. The harness failed starting `sshd_proxy` with: `Unsupported option UsePAM`, `Missing privilege separation directory: /var/empty`, `FATAL: sshd_proxy broken`
Additional check: `git diff --check`
Result: passed.


diff --git a/channels.c b/channels.c
index d7c55fc..c9d4b50 100644
--- a/channels.c
+++ b/channels.c
@@ -2858,7 +2858,7 @@ channel_prepare_pollfd(Channel *c, u_int *next_pollfd,
 		if (ev != 0) {
 			c->pfds[3] = p;
 			pfd[p].fd = c->sock;
-			pfd[p].events = 0;
+			pfd[p].events = ev;
 			dump_channel_poll(__func__, "sock", c, p, &pfd[p]);
 			p++;
 		}

Technical snapshot

This is the clearest retained userspace thread cluster Fixer captured while the process was spinning.

Representative thread backtrace

Command: sshd-session: gis [priv]
Why Fixer classified it this way: The process is demonstrably CPU-hot, but the current syscall and symbol sample does not point to a single dominant loop family yet.
Thread summary: thread backtraces show 1 thread(s) around 0x00007f202289a7d2 in ?? () from target:<path>
Contention signals: event-loop-wakeups, gdb-stderr: gdb: warning: Couldn't determine a path for the index cache directory.
Repeated loop: restart_syscall
Top syscalls: restart_syscall x1
Package: openssh-server 1:10.3p1-1
Kernel: 6.17.10+deb14-amd64
Distribution: debian

0x00007f202289a7d2 in ?? () from target:<path>
0x00007f202289a7ed in ?? () from target:<path>
0x00007f202290fd76 in poll () from target:<path>
0x000056295bc32f15 in ?? ()
0x000056295bc35264 in ?? ()
0x000056295bc0dd0e in ?? ()
0x00007f2022831f77 in ?? () from target:<path>
0x00007f2022832027 in __libc_start_main () from target:<path>

Possible duplicates

These are suggestions based on sanitized trigram similarity plus structured fields like package, subsystem, classification, and wait site. They are not auto-merged.

Runaway CPU investigation for sshd-session: unknown userspace loop at memset_orig

possible duplicate

sshd-session is stuck in a likely unclassified userspace loop: 18.73% of sampled CPU passed through memset_orig, and repeated thread backtraces show 1 thread around 0x00007f4c71e9a7d2 in ?? () from target:<path>.