Public issue detail

Runaway CPU investigation for sshd-session: unknown userspace loop at nf_ct_get_tuple

sshd-session is likely stuck in an unclassified userspace loop: 100.00% of sampled CPU passed through nf_ct_get_tuple, and repeated thread backtraces show 1 thread around 0x00007f9d21c9a7d2 in ?? () from target:<path>.

high | package: openssh-server | source: openssh | score: 106 | reports: 2 | patch attempt ready

Last seen: 2026-05-15 15:53 UTC. Public JSON: /v1/issues/019dfd2e-6077-7a51-b6b8-e030a863fa21

Pull-request-ready diff

This is the current best public patch attempt for the issue. The downloadable .patch now includes a short cover letter, so it reads like something you could send upstream and apply with git am. If you only want the raw diff, grab the .diff instead.

best patch | created: 2026-05-15 15:36 UTC | validation: ready

sshd-session likely remains stuck in an unclassified userspace loop. A diagnosis report and patch proposal were created locally.

Suggested subject

channels: arm poll events for channel socket fds

Commit message.

`channel_prepare_pollfd()` builds the `pollfd` array used by the SSH channel event loop. For channel socket fds, it computed the requested `POLLIN`/`POLLOUT` event mask but stored zero in `pollfd.events`, so `ppoll()` would not wait for the readiness the channel handlers expected.

Store the computed event mask for socket fds, matching the existing `rfd`, `wfd`, and `efd` paths.

Problem. sshd-session is likely stuck in an unclassified userspace loop: 100.00% of sampled CPU passed through nf_ct_get_tuple, and repeated thread backtraces show 1 thread around 0x00007f9d21c9a7d2 in ?? () from target:<path>.

How this patch connects to the issue.

Fixer observed `sshd-session` processes with profiler samples in kernel conntrack, `strace` showing `restart_syscall`, and `/proc` state showing the process sleeping in `poll_schedule_timeout`. I did not independently reproduce that production signal.

The likely root cause in the channel event loop is that socket-only channel fds were registered in the `pollfd` array with `events = 0` even after the code computed `POLLIN` or `POLLOUT` from `SSH_CHAN_IO_SOCK_R/W`. In OpenSSH’s channel subsystem, `io_want` records which fd readiness a channel needs before its post-poll handler can run.
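
The mapping from `io_want` to a poll event mask described above can be sketched roughly as follows. This is an illustration only, not the OpenSSH source: the `DEMO_IO_SOCK_R`/`DEMO_IO_SOCK_W` flag values and the helper name `demo_sock_events()` are hypothetical stand-ins for `SSH_CHAN_IO_SOCK_R/W` and the inline logic in `channel_prepare_pollfd()`.

```c
#include <poll.h>

/* Hypothetical flag values for illustration; OpenSSH defines its own. */
#define DEMO_IO_SOCK_R 0x10u
#define DEMO_IO_SOCK_W 0x20u

/* Translate the socket I/O a channel wants into a poll(2) event mask. */
static short
demo_sock_events(unsigned int io_want)
{
	short ev = 0;

	if ((io_want & DEMO_IO_SOCK_R) != 0)
		ev |= POLLIN;	/* channel wants to read from the socket */
	if ((io_want & DEMO_IO_SOCK_W) != 0)
		ev |= POLLOUT;	/* channel wants to write to the socket */
	return ev;
}
```

A result of zero means the channel wants no socket readiness, which is why the real code only packs a `pollfd` entry when `ev != 0`.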

The patch changes the socket fd poll setup to store `ev`, the same computed event mask used by the other fd slots. The expected effect is that listener and connecting socket channels can actually wake `ppoll()` for the readiness they requested, preventing this event-loop path from parking a session without arming the socket events it needs.
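
To see why the stored mask matters, here is a minimal standalone sketch of how `poll()` treats an fd registered with `events = 0`. It uses a pipe rather than an SSH channel socket, and the helper name `demo_poll_readable_pipe()` is hypothetical; it only illustrates the general `poll(2)` behavior the patch relies on, not the production failure.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Write one byte into a pipe, then poll the read end with the given
 * event mask and a zero timeout. Returns revents, or -1 on error.
 */
static int
demo_poll_readable_pipe(short events)
{
	int fds[2], n;
	struct pollfd pfd;

	if (pipe(fds) == -1)
		return -1;
	if (write(fds[1], "x", 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	pfd.fd = fds[0];
	pfd.events = events;	/* 0 mimics the pre-patch socket slot */
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);
	close(fds[0]);
	close(fds[1]);
	if (n == -1)
		return -1;
	return pfd.revents;
}
```

Calling `demo_poll_readable_pipe(0)` should report no events even though data is waiting, while `demo_poll_readable_pipe(POLLIN)` should report `POLLIN`. With an empty mask, `poll()` can still report `POLLERR`, `POLLHUP`, or `POLLNVAL`, but not ordinary read or write readiness.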

No new helpers were needed. The touched subsystem already uses `xrecallocarray()` for pollfd allocation, `fatal_f()`/`debug3()` logging, and `ptimeout_*()` for `ppoll()` timeouts; this patch follows the existing local pattern. I checked `README.md`; no `CONTRIBUTING`/`HACKING` docs were present.

Files touched

channels.c

Validation

Fixer marked this proposal `ready` on 2026-05-15 15:36 UTC.
The underlying issue cluster has 2 report(s) and was last seen 2026-05-15 15:53 UTC.
The published diff touches channels.c.
`./configure && make` from the workspace root failed immediately with: `configure: error: ./configure.ac newer than configure, run autoreconf`
Following the documented git-build path in `README.md`, `autoreconf && ./configure && make` completed successfully.
Bounded runtime/reproduction attempt: `make t-exec LTESTS=dynamic-forward` did not reach the forwarding scenario. It failed during regress setup with:
`sshd_proxy line 15: Unsupported option UsePAM`
`Missing privilege separation directory: /var/empty`
`FATAL: sshd_proxy broken`
So the original stuck-`poll` signal remains observed by Fixer, not independently reproduced.
`git diff --check` passed.


diff --git a/channels.c b/channels.c
index d7c55fc..c9d4b50 100644
--- a/channels.c
+++ b/channels.c
@@ -2858,7 +2858,7 @@ channel_prepare_pollfd(Channel *c, u_int *next_pollfd,
 		if (ev != 0) {
 			c->pfds[3] = p;
 			pfd[p].fd = c->sock;
-			pfd[p].events = 0;
+			pfd[p].events = ev;
 			dump_channel_poll(__func__, "sock", c, p, &pfd[p]);
 			p++;
 		}

Technical snapshot

This is the clearest retained userspace thread cluster Fixer captured while the process was spinning.

Representative thread backtrace

Command: sshd-session: gis [priv]
Why Fixer classified it this way: The process is demonstrably CPU-hot, but the current syscall and symbol sample does not point to a single dominant loop family yet.
Thread summary: thread backtraces show 1 thread(s) around 0x00007f9d21c9a7d2 in ?? () from target:<path>
Contention signals: event-loop-wakeups, gdb-stderr: gdb: warning: Couldn't determine a path for the index cache directory.
Repeated loop: restart_syscall
Top syscalls: restart_syscall x1
Package: openssh-server 1:10.3p1-1
Kernel: 6.17.10+deb14-amd64
Distribution: debian

0x00007f9d21c9a7d2 in ?? () from target:<path>
0x00007f9d21c9a7ed in ?? () from target:<path>
0x00007f9d21d0fd76 in poll () from target:<path>
0x0000564a1c3cff15 in ?? ()
0x0000564a1c3d2264 in ?? ()
0x0000564a1c3aad0e in ?? ()
0x00007f9d21c31f77 in ?? () from target:<path>
0x00007f9d21c32027 in __libc_start_main () from target:<path>

Possible duplicates

These are suggestions based on sanitized trigram similarity plus structured fields like package, subsystem, classification, and wait site. They are not auto-merged.

Runaway CPU investigation for sshd-session: unknown userspace loop at memset_orig

possible duplicate

sshd-session is likely stuck in an unclassified userspace loop: 18.73% of sampled CPU passed through memset_orig, and repeated thread backtraces show 1 thread around 0x00007f4c71e9a7d2 in ?? () from target:<path>.