Public issue detail

Runaway CPU investigation for python3.13: busy poll at _PyEval_EvalFrameDefault

python3.13 is stuck in a likely busy-poll loop: 100.00% of sampled CPU passed through _PyEval_EvalFrameDefault, and repeated thread backtraces show 1 thread around 0x00007f1c6e7efe92 in pthread_attr_destroy () from target:<path>.

high | package: python3.13-minimal | source: python3.13 | score: 106 | reports: 2 | patch attempt ready

Last seen: 2026-05-10 06:16 UTC. Public JSON: /v1/issues/019dda71-4897-7d62-853d-31f4899d9f8e

Pull-request-ready diff

This is the current best public patch attempt for the issue. The downloadable .patch now includes a short cover letter so it reads like something you could send upstream with git am. If you only want the raw diff, grab the .diff instead.

best patch | created: 2026-04-30 22:10 UTC | validation: ready

python3.13 likely remains stuck in a busy-poll loop. A diagnosis report and patch proposal were created locally.

Suggested subject

subprocess: keep clamped pidfd waits looping

Commit message.

Use pidfd polling for POSIX `Popen.wait(timeout=...)` when available, and keep very large or infinite Python timeouts compatible by clamping each `poll()` call while continuing to recompute the Python deadline.

Problem. python3.13 is stuck in a likely busy-poll loop: 100.00% of sampled CPU passed through _PyEval_EvalFrameDefault, and repeated thread backtraces show 1 thread around 0x00007f1c6e7efe92 in pthread_attr_destroy () from target:<path>.

How this patch connects to the issue.

The user-visible symptom was a `python3.13` process repeatedly waking in `poll` and `wait4`, with CPU samples in Python frame evaluation. A plausible code-level cause is the POSIX `subprocess.Popen._wait(timeout)` loop, which uses repeated nonblocking `waitpid(..., WNOHANG)` calls with short sleeps.
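The legacy loop's wakeup rate follows from its backoff schedule. As a rough model, assuming the child never exits and ignoring the cost of each `waitpid()` call, the sketch below replays the delay update `delay = min(delay * 2, remaining, .05)` that CPython 3.13 uses in `_wait` (the diff context only shows its starting value, `delay = 0.0005`). `legacy_sleep_schedule` is a hypothetical helper, not part of `subprocess`.

```python
def legacy_sleep_schedule(timeout, initial=0.0005, cap=0.05):
    """Return the sleep intervals of a doubling-backoff waitpid loop.

    Models a child that never exits: every interval is followed by
    another waitpid(WNOHANG) wakeup until `timeout` seconds have elapsed.
    """
    delays = []
    delay = initial
    remaining = timeout
    while remaining > 0:
        # Double the delay, but never exceed the time left or the cap.
        delay = min(delay * 2, remaining, cap)
        delays.append(delay)
        remaining -= delay
    return delays


schedule = legacy_sleep_schedule(10.0)
print(schedule[:7])
print(len(schedule), "sleeps in a 10 s wait")
```

For a 10-second timeout this yields roughly 200 sleeps: six doubling steps, then about 20 wakeups per second at the 50 ms cap, each followed by another `waitpid(..., WNOHANG)` call.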

The patch adds a Linux pidfd-backed wait path for timeout waits. This refinement addresses the review finding by treating an empty result from a clamped `poll()` as one elapsed kernel-sized wait, then continuing the loop while the Python timeout still has time remaining. The regression test now covers the case where the first clamped poll returns no events and a later poll observes process exit.
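A standalone sketch of the same idea, outside `Popen`, may make the control flow easier to follow. It assumes Linux with `os.pidfd_open()` (Python 3.9+) and a caller that owns the child. `poll_timeout_ms` and `wait_pidfd` are hypothetical names, and unlike the patch it reaps with a blocking `waitpid()` once the pidfd is readable instead of going through `_waitpid_lock`, and it rejects NaN instead of falling back.

```python
import math
import os
import select
import time

# poll() takes its timeout in milliseconds as a C int (INT_MAX on Linux).
MAX_POLL_MS = 2_147_483_647


def poll_timeout_ms(remaining):
    """Return (milliseconds, clamped) for a single poll() call."""
    if remaining >= MAX_POLL_MS / 1000:
        # Too long for one poll(); the caller must loop and recheck.
        return MAX_POLL_MS, True
    # Round up so an empty poll() result means the deadline really passed.
    return max(1, int(remaining * 1000 + 0.999)), False


def wait_pidfd(pid, timeout):
    """Wait for child `pid` to exit by sleeping in poll() on a pidfd.

    Returns the exit code, or None when pidfds are unavailable (a caller
    would then fall back to a waitpid(WNOHANG) loop).  Raises
    TimeoutError if the child is still running after `timeout` seconds.
    """
    if math.isnan(timeout):
        raise ValueError("timeout must not be NaN")
    if not hasattr(os, "pidfd_open"):
        return None
    try:
        pidfd = os.pidfd_open(pid)
    except OSError:
        return None
    try:
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"pid {pid} still running")
            ms, clamped = poll_timeout_ms(remaining)
            if poller.poll(ms):
                break  # pidfd is readable: the child has exited.
            if not clamped:
                raise TimeoutError(f"pid {pid} still running")
            # A clamped slice elapsed with no event: recompute and poll again.
        # The child is now a zombie, so this waitpid() does not block.
        _, status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(status)
    finally:
        os.close(pidfd)
```

The design point mirrors the patch: an empty `poll()` result only means the timeout expired when the call was not clamped. After a clamped slice the loop recomputes the remaining time and polls again, so `float("inf")` and very large timeouts keep working.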

The expected effect is fewer avoidable userspace wakeups for normal timeout waits on Linux with pidfds, while preserving behavior for very large and infinite timeouts. The code includes short comments for the non-obvious timeout clamping and pidfd/readiness remapping logic.
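Callers should not need to change anything. As a sketch, assuming the patch is applied, the usual kill-on-timeout pattern below would spend its wait blocked in `poll()` on a pidfd on Linux rather than cycling through `waitpid(..., WNOHANG)` and short sleeps; on other platforms, or when `pidfd_open()` fails, it takes the legacy path. `run_with_deadline` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_deadline(argv, timeout):
    """Run argv and return its exit code, killing it if it overruns."""
    proc = subprocess.Popen(argv)
    try:
        # With the patch, this is where the pidfd/poll() path would run.
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # Reap the killed child so it does not linger.
        raise


print(run_with_deadline([sys.executable, "-c", "raise SystemExit(4)"], 30))
```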

Files touched

Lib/subprocess.py
Lib/test/test_subprocess.py

Validation

Fixer marked this proposal `ready` on 2026-04-30 22:10 UTC.
The underlying issue cluster has 2 report(s) and was last seen 2026-05-10 06:16 UTC.
The published diff touches Lib/subprocess.py, Lib/test/test_subprocess.py.
`TMPDIR=/tmp/fixer-86731 PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=Lib /usr/bin/python3.13 -m test test_subprocess -m 'test_wait_timeout_pidfd_clamps_large_timeouts'` passed.
`TMPDIR=/tmp/fixer-86731 PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=Lib /usr/bin/python3.13 -m test test_subprocess -m 'test_wait*'` passed.
`git diff --check` passed.
`git status --short` shows only `Lib/subprocess.py`, `Lib/test/test_subprocess.py`, and an untracked `.codex` workspace mount. I tried to remove `.codex`, but `unlink` reported `Device or resource busy`; it is not part of the patch.


diff --git a/Lib/subprocess.py b/Lib/subprocess.py
index 3a8c743..b41920b 100644
--- a/Lib/subprocess.py
+++ b/Lib/subprocess.py
@@ -2049,6 +2049,74 @@ class Popen:
                 sts = 0
             return (pid, sts)
 
+        def _wait_pidfd(self, endtime, orig_timeout):
+            """Wait for process exit using a pidfd."""
+            if not hasattr(os, "pidfd_open") or not hasattr(select, "poll"):
+                return False
+
+            if self._waitpid_lock.acquire(False):
+                try:
+                    if self.returncode is not None:
+                        return True  # Another thread waited.
+                    (pid, sts) = self._try_wait(os.WNOHANG)
+                    assert pid == self.pid or pid == 0
+                    if pid == self.pid:
+                        self._handle_exitstatus(sts)
+                        return True
+                finally:
+                    self._waitpid_lock.release()
+            else:
+                return False
+
+            try:
+                pidfd = os.pidfd_open(self.pid)
+            except OSError:
+                return False
+
+            try:
+                poller = select.poll()
+                poller.register(pidfd, select.POLLIN)
+                while self.returncode is None:
+                    remaining = self._remaining_time(endtime)
+                    if remaining <= 0:
+                        raise TimeoutExpired(self.args, orig_timeout)
+                    # Preserve the legacy timeout loop's handling of NaN.
+                    if remaining != remaining:
+                        return False
+
+                    # select.poll() accepts a signed int millisecond
+                    # timeout.  Clamp longer waits and recheck the Python
+                    # deadline after each poll returns.
+                    max_timeout = 2_147_483_647
+                    clamped = remaining >= max_timeout / 1000
+                    if clamped:
+                        timeout = max_timeout
+                    else:
+                        timeout = max(1, int(remaining * 1000 + 0.999))
+                    if not poller.poll(timeout):
+                        if clamped:
+                            continue
+                        raise TimeoutExpired(self.args, orig_timeout)
+
+                    if self._waitpid_lock.acquire(False):
+                        try:
+                            if self.returncode is not None:
+                                return True  # Another thread waited.
+                            # A readable pidfd means the same child should now
+                            # be waitable; keep that pidfd/waitpid identity
+                            # check intact.
+                            (pid, sts) = self._try_wait(os.WNOHANG)
+                            assert pid == self.pid or pid == 0
+                            if pid == self.pid:
+                                self._handle_exitstatus(sts)
+                                return True
+                        finally:
+                            self._waitpid_lock.release()
+                    return False
+                return True
+            finally:
+                os.close(pidfd)
+
 
         def _wait(self, timeout):
             """Internal implementation of wait() on POSIX."""
@@ -2057,6 +2125,9 @@ class Popen:
 
             if timeout is not None:
                 endtime = _time() + timeout
+                if self._wait_pidfd(endtime, timeout):
+                    return self.returncode
+
                 # Enter a busy loop if we have a timeout.  This busy loop was
                 # cribbed from Lib/threading.py in Thread.wait() at r71065.
                 delay = 0.0005 # 500 us -> initial delay of 1 ms
diff --git a/Lib/test/test_subprocess.py b/Lib/test/test_subprocess.py
index a9f0ce2..3b9e9e2 100644
--- a/Lib/test/test_subprocess.py
+++ b/Lib/test/test_subprocess.py
@@ -2070,6 +2070,117 @@ class POSIXProcessTestCase(BaseTestCase):
         def __del__(self):
             pass
 
+    def _make_wait_test_popen(self):
+        proc = subprocess.Popen.__new__(subprocess.Popen)
+        proc.args = ["test"]
+        proc.pid = 12345
+        proc.returncode = None
+        proc._waitpid_lock = threading.Lock()
+        return proc
+
+    @unittest.skipUnless(hasattr(select, "poll"), "select.poll required")
+    def test_wait_timeout_uses_pidfd(self):
+        proc = self._make_wait_test_popen()
+        pidfd = 10
+        poller = mock.Mock()
+        poller.poll.return_value = [(pidfd, select.POLLIN)]
+
+        with mock.patch("subprocess.os.pidfd_open", return_value=pidfd,
+                        create=True) as pidfd_open, \
+             mock.patch("subprocess.select.poll",
+                        return_value=poller) as poll, \
+             mock.patch("subprocess.os.waitpid",
+                        side_effect=[(0, 0), (proc.pid, 0)]) as waitpid, \
+             mock.patch("subprocess.os.close") as close:
+            self.assertEqual(proc._wait(timeout=10), 0)
+
+        pidfd_open.assert_called_once_with(proc.pid)
+        poll.assert_called_once_with()
+        poller.register.assert_called_once_with(pidfd, select.POLLIN)
+        self.assertEqual(waitpid.call_args_list,
+                         [mock.call(proc.pid, os.WNOHANG),
+                          mock.call(proc.pid, os.WNOHANG)])
+        close.assert_called_once_with(pidfd)
+
+    @unittest.skipUnless(hasattr(select, "poll"), "select.poll required")
+    def test_wait_timeout_pidfd_clamps_large_timeouts(self):
+        for timeout in (float("inf"), 10**20):
+            with self.subTest(timeout=timeout):
+                proc = self._make_wait_test_popen()
+                pidfd = 10
+                poller = mock.Mock()
+                poller.poll.side_effect = [[], [(pidfd, select.POLLIN)]]
+
+                with mock.patch("subprocess.os.pidfd_open",
+                                return_value=pidfd, create=True), \
+                     mock.patch("subprocess.select.poll",
+                                return_value=poller), \
+                     mock.patch("subprocess.os.waitpid",
+                                side_effect=[(0, 0), (proc.pid, 0)]), \
+                     mock.patch("subprocess._time",
+                                side_effect=[100.0, 100.0, 200.0]), \
+                     mock.patch("subprocess.os.close"):
+                    self.assertEqual(proc._wait(timeout=timeout), 0)
+
+                self.assertEqual(poller.poll.call_args_list,
+                                 [mock.call(2_147_483_647),
+                                  mock.call(2_147_483_647)])
+
+    @unittest.skipUnless(hasattr(select, "poll"), "select.poll required")
+    def test_wait_timeout_pidfd_timeout(self):
+        proc = self._make_wait_test_popen()
+        pidfd = 10
+        poller = mock.Mock()
+        poller.poll.return_value = []
+
+        with mock.patch("subprocess.os.pidfd_open", return_value=pidfd,
+                        create=True), \
+             mock.patch("subprocess.select.poll", return_value=poller), \
+             mock.patch("subprocess.os.waitpid",
+                        return_value=(0, 0)) as waitpid, \
+             mock.patch("subprocess.os.close") as close:
+            with self.assertRaises(subprocess.TimeoutExpired):
+                proc._wait(timeout=10)
+
+        waitpid.assert_called_once_with(proc.pid, os.WNOHANG)
+        close.assert_called_once_with(pidfd)
+
+    def test_wait_timeout_pidfd_open_fallback(self):
+        proc = self._make_wait_test_popen()
+
+        with mock.patch("subprocess.os.pidfd_open",
+                        side_effect=OSError(errno.ENOSYS, "pidfd_open"),
+                        create=True) as pidfd_open, \
+             mock.patch("subprocess.os.waitpid",
+                        side_effect=[(0, 0), (0, 0),
+                                     (proc.pid, 0)]) as waitpid, \
+             mock.patch("subprocess.time.sleep") as sleep:
+            self.assertEqual(proc._wait(timeout=10), 0)
+
+        pidfd_open.assert_called_once_with(proc.pid)
+        self.assertEqual(waitpid.call_args_list,
+                         [mock.call(proc.pid, os.WNOHANG),
+                          mock.call(proc.pid, os.WNOHANG),
+                          mock.call(proc.pid, os.WNOHANG)])
+        sleep.assert_called()
+
+    def test_wait_timeout_does_not_block_on_waitpid_lock(self):
+        proc = self._make_wait_test_popen()
+
+        class BusyWaitpidLock:
+            def acquire(self, blocking=True):
+                if blocking:
+                    raise AssertionError("_wait() must not block on timeout")
+                return False
+
+            def release(self):
+                raise AssertionError("lock was not acquired")
+
+        proc._waitpid_lock = BusyWaitpidLock()
+        with mock.patch("subprocess._time", side_effect=[10.0, 10.2]):
+            with self.assertRaises(subprocess.TimeoutExpired):
+                proc._wait(timeout=0.1)
+
     @mock.patch("subprocess._fork_exec")
     def test_exception_errpipe_normal(self, fork_exec):
         """Test error passing done through errpipe_write in the good case"""

Technical snapshot

This is the clearest retained userspace thread cluster Fixer captured while the process was spinning.

Representative thread backtrace

Command: /usr/bin/python3 /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
Why Fixer classified it this way: The trace repeatedly returns to a poll-family syscall without meaningful blocking, which suggests a busy event-loop wakeup.
Thread summary: thread backtraces show 1 thread(s) around 0x00007f1c6e7efe92 in pthread_attr_destroy () from target:<path>
Contention signals: gdb-stderr: gdb: warning: Couldn't determine a path for the index cache directory.
Repeated loop: poll -> wait4 -> poll
Top syscalls: poll x5, wait4 x5, restart_syscall x1
Package: python3.13-minimal 3.13.12-1
Kernel: 6.17.10+deb14-amd64
Distribution: debian

0x00007f1c6e7efe92 in pthread_attr_destroy () from target:<path>
0x0000000000bb4cd4 in ?? ()
0x00000000160e950c in ?? ()
0x0000000000a56c88 in _PyRuntime ()
0x00007f1c6e7efead in ?? () from target:<path>
0x0000000000000007 in ?? ()
0x000000000054aeb0 in PyMem_Realloc ()
0x000000000067ca5b in ?? ()

Possible duplicates

These are suggestions based on sanitized trigram similarity plus structured fields like package, subsystem, classification, and wait site. They are not auto-merged.

Runaway CPU investigation for python3.13: busy poll at idle_cpu

possible duplicate

python3.13 is stuck in a likely busy-poll loop: 40.65% of sampled CPU passed through idle_cpu, and repeated thread backtraces show 1 thread around 0x00007f1c6e7efe92 in pthread_attr_destroy () from target:<path>.

high | package: python3.13-minimal | source: python3.13 | score: 106 | reports: 2 | similarity: 99%