Public issue detail

Runaway CPU investigation for sshd-session: unknown userspace loop at get_pid_task

sshd-session appears to be stuck in an unclassified userspace loop: 20.65% of sampled CPU passed through get_pid_task, and repeated thread backtraces show 1 thread around 0x00007f202289a7d2 in ?? () from target:<path>.

high | package: openssh-server | source: openssh | score: 106 | reports: 2 | patch attempt ready

Last seen: 2026-05-14 21:04 UTC. Public JSON: /v1/issues/019e2683-8ae8-7de2-ac7c-52a3f304c75f

Pull-request-ready diff

This is the current best public patch attempt for the issue. The downloadable .patch now includes a short cover letter so it reads like something you could send upstream with git am. If you only want the raw diff, grab the .diff instead.

best patch | created: 2026-05-14 13:41 UTC | validation: ready

sshd-session likely remains stuck in an unclassified userspace loop. A diagnosis report and patch proposal were created locally.

Suggested subject

auth: avoid duplicate passwd lookup in getpwnamallow

Commit message.

`getpwnamallow()` checked whether a user existed for `Match Invalid-User` by calling `getpwnam(user)`, then immediately called `getpwnam(user)` again for the actual admission decision.

Reuse the first passwd lookup on the normal path and keep the second lookup only for the existing AIX `setauthdb` case, where the lookup is intentionally performed after switching authentication databases.

Problem. sshd-session appears to be stuck in an unclassified userspace loop: 20.65% of sampled CPU passed through get_pid_task, and repeated thread backtraces show 1 thread around 0x00007f202289a7d2 in ?? () from target:<path>.

How this patch connects to the issue.

Fixer observed `sshd-session: gis [priv]` with perf samples in NSS/passwd-related code (`libnss_systemd.so.2` and `_nss_files_parse_pwent`) while `/proc` and strace showed the sampled process sleeping in `poll`/`restart_syscall`. This was observed by Fixer and not independently reproduced as a runaway CPU loop.

The most cautious source-level explanation is redundant NSS work in the server authentication admission path: on non-AIX systems, `getpwnamallow()` performed two unconditional passwd database lookups for the same user. NSS is the Name Service Switch layer that resolves users and groups through files, systemd, LDAP, and similar backends.

The patch reuses the first `getpwnam(user)` result to set `ci->user_invalid` and to continue admission checks. It preserves the platform compatibility convention around `_AIX && HAVE_SETAUTHDB` by keeping the second lookup there. Existing OpenSSH helpers/conventions were followed: `pwcopy()` still owns the returned passwd data, existing logging style is unchanged, and no new file/process/allocation/locking APIs were introduced. No source comment was added because the control flow is direct.

The expected effect is to remove one unconditional NSS passwd lookup per user admission on the common path, reducing avoidable work in the same subsystem implicated by the collected perf evidence.
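
The single-lookup pattern described above can be sketched in isolation. This is a minimal illustration, not OpenSSH code: `struct conn_info`, `lookup_user_once()`, `counting_lookup()`, and the user `alice` are hypothetical names introduced here, and the lookup is injected as a callback so the number of calls can be counted without touching NSS.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the connection info that Match evaluation reads. */
struct conn_info {
	const char *user;
	int user_invalid;
};

/* Lookup callback type, so the sketch can count calls without using NSS. */
typedef struct passwd *(*lookup_fn)(const char *);

static int lookup_calls;

/* Fake passwd lookup that knows a single hypothetical user, "alice". */
static struct passwd *
counting_lookup(const char *user)
{
	static char name[] = "alice";
	static struct passwd alice;

	lookup_calls++;
	if (strcmp(user, name) != 0)
		return NULL;
	alice.pw_name = name;
	alice.pw_uid = 1000;
	return &alice;
}

/*
 * Perform one lookup and reuse its result both for the invalid-user flag
 * and for the later admission decision (the non-AIX path of the patch).
 */
static struct passwd *
lookup_user_once(const char *user, struct conn_info *ci, lookup_fn lookup)
{
	struct passwd *pw;

	assert(user != NULL && ci != NULL && lookup != NULL);
	ci->user = user;
	pw = lookup(user);
	ci->user_invalid = (pw == NULL);
	/* Match-block evaluation would consult ci->user_invalid here. */
	return pw;
}
```

Calling `lookup_user_once("alice", &ci, counting_lookup)` increments `lookup_calls` exactly once per admission. With the real `getpwnam()`, the returned pointer may refer to static storage that later `getpw*` calls overwrite, so reusing it this way assumes that no code between the lookup and its use performs another passwd lookup.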

Files touched

auth.c

Validation

Fixer marked this proposal `ready` on 2026-05-14 13:41 UTC.
The underlying issue cluster has 2 report(s) and was last seen 2026-05-14 21:04 UTC.
The published diff touches auth.c.
`./configure && make` failed immediately with `configure: error: ./configure.ac newer than configure, run autoreconf`. Following the project README’s generated-file path, `autoreconf && ./configure && make` completed successfully. The build emitted unrelated existing warnings in other files, but produced `sshd-session` and the rest of the tree. `git diff --check -- auth.c` passed.
Independent reproduction attempt: `./sshd-session -T -C user=$(id -un),host=127.0.0.1,addr=127.0.0.1,laddr=127.0.0.1,lport=22` could not exercise the path because `sshd-session` reports `test/dump modes not supported`. The raw investigation directory named in `evidence.json` was also unavailable locally, so the runtime failure signal was not independently reproduced.
A bounded before/after source check against the original snapshot confirmed that the duplicate `getpwnam(user)` call was removed on the normal non-AIX path.
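
The report does not say how the before/after source check was performed. One simple way to do such a check, shown here purely as an assumption, is to count call sites in the relevant excerpt. The helper `count_occurrences()` and the two excerpt strings are hypothetical. The excerpts are condensed from the diff as seen on a non-AIX build, with the preprocessor-guarded lines dropped.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack. */
static size_t
count_occurrences(const char *haystack, const char *needle)
{
	size_t count = 0;
	size_t step = strlen(needle);
	const char *p = haystack;

	if (step == 0)
		return 0;
	while ((p = strstr(p, needle)) != NULL) {
		count++;
		p += step;
	}
	return count;
}

/* Condensed excerpt of getpwnamallow() before the patch (non-AIX view). */
static const char before_excerpt[] =
    "ci->user_invalid = getpwnam(user) == NULL;\n"
    "pw = getpwnam(user);\n";

/* Condensed excerpt of getpwnamallow() after the patch (non-AIX view). */
static const char after_excerpt[] =
    "pw = getpwnam(user);\n"
    "ci->user_invalid = pw == NULL;\n";
```

Counting `getpwnam(user)` in `before_excerpt` and `after_excerpt` gives two call sites and one call site respectively. A textual count like this only checks the source. It says nothing about runtime CPU behaviour.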


diff --git a/auth.c b/auth.c
index 2a9f3b8..cc8482b 100644
--- a/auth.c
+++ b/auth.c
@@ -474,7 +474,8 @@ getpwnamallow(struct ssh *ssh, const char *user)
 
 	ci = server_get_connection_info(ssh, 1, options.use_dns);
 	ci->user = user;
-	ci->user_invalid = getpwnam(user) == NULL;
+	pw = getpwnam(user);
+	ci->user_invalid = pw == NULL;
 	parse_server_match_config(&options, &includes, ci);
 	log_change_level(options.log_level);
 	log_verbose_reset();
@@ -484,11 +485,9 @@ getpwnamallow(struct ssh *ssh, const char *user)
 
 #if defined(_AIX) && defined(HAVE_SETAUTHDB)
 	aix_setauthdb(user);
-#endif
 
 	pw = getpwnam(user);
 
-#if defined(_AIX) && defined(HAVE_SETAUTHDB)
 	aix_restoreauthdb();
 #endif
 	if (pw == NULL) {

Technical snapshot

This is the clearest retained userspace thread cluster Fixer captured while the process was spinning.

Representative thread backtrace

Command: sshd-session: gis [priv]
Why Fixer classified it this way: The process is demonstrably CPU-hot, but the current syscall and symbol sample does not point to a single dominant loop family yet.
Thread summary: thread backtraces show 1 thread(s) around 0x00007f202289a7d2 in ?? () from target:<path>
Contention signals: event-loop-wakeups, gdb-stderr: gdb: warning: Couldn't determine a path for the index cache directory.
Repeated loop: restart_syscall
Top syscalls: restart_syscall x1
Package: openssh-server 1:10.3p1-1
Kernel: 6.17.10+deb14-amd64
Distribution: debian

0x00007f202289a7d2 in ?? () from target:<path>
0x00007f202289a7ed in ?? () from target:<path>
0x00007f202290fd76 in poll () from target:<path>
0x000056295bc32f15 in ?? ()
0x000056295bc35264 in ?? ()
0x000056295bc0dd0e in ?? ()
0x00007f2022831f77 in ?? () from target:<path>
0x00007f2022832027 in __libc_start_main () from target:<path>

Possible duplicates

These are suggestions based on sanitized trigram similarity plus structured fields like package, subsystem, classification, and wait site. They are not auto-merged.

Runaway CPU investigation for sshd-session: unknown userspace loop at memset_orig

possible duplicate

sshd-session appears to be stuck in an unclassified userspace loop: 18.73% of sampled CPU passed through memset_orig, and repeated thread backtraces show 1 thread around 0x00007f4c71e9a7d2 in ?? () from target:<path>.