Public issue detail

Runaway CPU investigation for sshd-auth: timer churn at unresolved offset in sshd-auth

sshd-auth is stuck in a likely timer churn loop: 53.23% of sampled CPU passed through an unresolved offset, and repeated thread backtraces show 1 thread around 0x00007f97dda9a7d2 in ?? () from target:<path>.

high | package: openssh-server | source: openssh | score: 106 | reports: 2 | patch attempt ready | related upstream: closed-unmerged

Last seen: 2026-05-13 06:39 UTC. Public JSON: /v1/issues/019dfdca-f6fb-77c3-987d-954b3b228857

Patch related to upstream review

This diff is preserved for inspection, but a related source-family upstream review is the current handoff. Use the retained patch as evidence only; do not open a duplicate review from this artifact.

best patch | created: 2026-05-13 06:10 UTC | validation: ready | related upstream: closed-unmerged

sshd-auth likely remains stuck in a timer churn loop. A diagnosis report and patch proposal were created locally.

Suggested subject

auth2: bound failed-auth delay after slow auth

Commit message.

Failed user authentication applies a small per-user delay to reduce timing leaks. When authentication work already exceeded that delay, `ensure_minimum_time_since()` scaled the delay upward until it exceeded the elapsed time, then slept the difference. Slow monitor or authentication backend work could therefore turn a millisecond-scale delay into a much longer `nanosleep()`.

Keep a bounded per-user delay for slow failed authentications instead of scaling it to match the elapsed backend time. Fast failures still wait until the requested per-user delay has elapsed.

Problem. sshd-auth is stuck in a likely timer churn loop: 53.23% of sampled CPU passed through an unresolved offset, and repeated thread backtraces show 1 thread around 0x00007f97dda9a7d2 in ?? () from target:<path>.

How this patch connects to the issue.

Fixer observed `sshd-auth` sleeping with traces dominated by `read`, `read`, and `clock_nanosleep`; sampled backtraces included `clock_nanosleep()`/`nanosleep()`, and the strace excerpt showed sleeps of about 0.46s and 1.78s after monitor/socket reads. This failure was observed by Fixer and not independently reproduced.

The source-level connection is a cautious match from inspection: `auth2.c` computes a per-user failed-auth delay, and the old slow-auth path repeatedly doubled that delay until it exceeded elapsed authentication work before calling `nanosleep()`. I changed that path to keep one bounded per-user delay when elapsed time already exceeds the target, rather than scaling the delay up to the slow backend time. I also added a short comment because this timing tradeoff is non-obvious.

The expected effect is to reduce long post-authentication sleeps in `sshd-auth` while preserving a per-user delay for failed authentication attempts instead of returning immediately after slow backend work.
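
The difference between the two paths can be sketched in isolation. The block below is a minimal model, not the OpenSSH source: `old_remaining()` and `new_remaining()` are hypothetical names, and the model leaves out the earlier early-return branch visible at the top of the hunk as well as the `nanosleep()` call itself. It only mirrors the arithmetic shown in the diff.

```c
#include <assert.h>

/*
 * Hypothetical model of the pre-patch path: double the target until it
 * exceeds the elapsed time, then return the difference to sleep.
 */
static double
old_remaining(double seconds, double elapsed)
{
	double remain;

	/* A non-positive target would never grow, so the loop would not end. */
	assert(seconds > 0.0);
	while ((remain = seconds - elapsed) < 0.0)
		seconds *= 2;
	return remain;
}

/*
 * Hypothetical model of the patched path: if the target has already
 * passed, sleep one unscaled target delay instead of a scaled one.
 */
static double
new_remaining(double seconds, double elapsed)
{
	double remain;

	if ((remain = seconds - elapsed) <= 0.0)
		remain = seconds;
	return remain;
}
```

With a 0.25 s target and 0.625 s of elapsed work, the old model doubles the target to 1.0 s and returns 0.375 s, while the new model returns 0.25 s. When the work finishes early (0.125 s elapsed), both return the same 0.125 s, so fast failures are unaffected.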

This refinement addresses the review findings by replacing the first pass’s early return with a bounded delay and by describing the `auth2.c` mapping as an inspection-based connection, not a locally reproduced root cause. Among the requested style docs I found only `README.md`; there was no `CONTRIBUTING`, `HACKING`, `README-hacking`, `docs/`, or `dev-docs/`. The patch uses existing OpenSSH conventions in this file: `monotime_double()`, `debug3_f()`, and portable `nanosleep()` support.

Files touched

auth2.c

Validation

Fixer marked this proposal `ready` on 2026-05-13 06:10 UTC.
The underlying issue cluster has 2 report(s) and was last seen 2026-05-13 06:39 UTC.
The published diff touches auth2.c.
`./configure && make` from the workspace root failed before build: `configure: error: ./configure.ac newer than configure, run autoreconf`
To avoid modifying tracked generated files in the proposal workspace, I validated in a temporary copy: `autoreconf && ./configure --with-privsep-path=/tmp/openssh-privsep-90000-r1.o2UxDA && make`
Result: passed.
Bounded independent reproduction attempt: a temporary `sshd` config using the built `sshd`, `sshd-auth`, and `sshd-session` passed `./sshd -t -f <temp config>`, but the runtime trace attempt was blocked before authentication. `strace ... ./sshd -D -e -f <temp config>` failed with `PTRACE_TRACEME: Operation not permitted` / `ptrace(PTRACE_SEIZE, ...): Operation not permitted`, and the local client failed with `socket: Operation not permitted`.
Confidence remains `observed`, not `reproduced`.

Related upstream review

This retained diff belongs to an issue family already tracked upstream as https://github.com/openssh/openssh-portable/pull/663 (closed-unmerged; 6 related rows).

Original harvest blockers

These blockers remain on the preserved local diff, but the related upstream review above is the current handoff.

blocked validation


diff --git a/auth2.c b/auth2.c
index 80f766e..5bb007e 100644
--- a/auth2.c
+++ b/auth2.c
@@ -256,9 +256,14 @@ ensure_minimum_time_since(double start, double seconds)
 		return;
 	}
 
-	/* if we've already passed the requested time, scale up */
-	while ((remain = seconds - elapsed) < 0.0)
-		seconds *= 2;
+	if ((remain = seconds - elapsed) <= 0.0) {
+		/*
+		 * Authentication work already exceeded the target delay.
+		 * Keep a bounded per-user delay without scaling it up to
+		 * match slow backends.
+		 */
+		remain = seconds;
+	}
 
 	ts.tv_sec = remain;
 	ts.tv_nsec = (remain - ts.tv_sec) * 1000000000;
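
The two context lines that close the hunk split the remaining delay into the whole-second and nanosecond fields that `nanosleep()` expects. The following is a minimal sketch of that conversion as a standalone function; `seconds_to_timespec()` is a hypothetical helper introduced for illustration and does not appear in `auth2.c`, where the assignment is done inline.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: split a non-negative delay in seconds into whole
 * seconds and nanoseconds, as the context lines of the diff do inline.
 */
static struct timespec
seconds_to_timespec(double remain)
{
	struct timespec ts;

	/* A negative delay would yield fields that nanosleep() rejects. */
	assert(remain >= 0.0);
	ts.tv_sec = (time_t)remain;	/* truncates toward zero */
	ts.tv_nsec = (long)((remain - (double)ts.tv_sec) * 1000000000.0);
	return ts;
}
```

For a delay of 1.5 s this yields `tv_sec == 1` and `tv_nsec == 500000000`. Because the patched branch always assigns a positive `remain`, the conversion never sees a negative value on that path.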

Technical snapshot

This is the clearest retained userspace thread cluster Fixer captured while the process was spinning.

Representative thread backtrace

Command: sshd-auth: unknown [net]
Why Fixer classified it this way: Timer-related syscalls or symbols dominate the sample, which suggests a wakeup timer is firing too aggressively.
Thread summary: thread backtraces show 1 thread(s) around 0x00007f97dda9a7d2 in ?? () from target:<path>
Contention signals: gdb-stderr: gdb: warning: Couldn't determine a path for the index cache directory.
Repeated loop: read -> read -> clock_nanosleep
Top syscalls: read x4, write x3, clock_nanosleep x2
Package: openssh-server 1:10.3p1-1
Kernel: 6.17.10+deb14-amd64
Distribution: debian

0x00007f97dda9a7d2 in ?? () from target:<path>
0x00007f97ddae7c7a in clock_nanosleep () from target:<path>
0x00007f97ddaf35e3 in nanosleep () from target:<path>
0x00005651d8f442b6 in ?? ()
0x00005651d8f9a4a6 in ?? ()
0x00005651d8f9a59d in ?? ()
0x00005651d8f42b2c in ?? ()
0x00005651d8f311d5 in ?? ()

Possible duplicates

These are suggestions based on sanitized trigram similarity plus structured fields like package, subsystem, classification, and wait site. They are not auto-merged.

Runaway CPU investigation for sshd-auth: timer churn at apparmor_socket_recvmsg

possible duplicate

sshd-auth is stuck in a likely timer churn loop: 100.00% of sampled CPU passed through apparmor_socket_recvmsg, and repeated thread backtraces show 1 thread around 0x00007fa460a9a7d2 in ?? () from target:<path>.

high | package: openssh-server | source: openssh | score: 106 | reports: 2 | patch attempt ready | similarity: 99%