
gh-129205: Use os.readinto() in subprocess errpipe_read #129498

Closed · wants to merge 2 commits

Conversation

@cmaloney (Contributor) commented Jan 31, 2025:

Read into a pre-allocated fixed size buffer.

With the previous code, the buffer could actually grow to 100_000 bytes in two reads (the first read returns 50_000, then a second pass through the loop appends another 50_000), so this does change behavior. I think a fixed limit of 50_000 was the intention, though. This path is used to pass exception information that occurs during _fork_exec from the child to the parent process.
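For illustration, a minimal sketch of the proposed pattern, assuming the os.readinto() from gh-129205 and the 50_000 fail-safe bound discussed below; the loop shape here is a paraphrase, not the PR's verbatim diff, and errpipe_read stands in for the pipe fd set up by _execute_child:

    import os

    limit = 50_000                     # fail-safe bound from the existing code
    buf = bytearray(limit)             # pre-allocated fixed-size buffer
    view = memoryview(buf)
    total = 0
    while total < limit:
        # errpipe_read: read end of the error pipe (assumed from the discussion)
        n = os.readinto(errpipe_read, view[total:])
        if n == 0:                     # EOF: pipe closed (exec succeeded or child exited)
            break
        total += n
    errpipe_data = bytes(buf[:total])  # keep only the bytes actually read

Because the buffer is fixed at limit bytes up front, the two-reads-of-50_000 growth described above can no longer happen.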

@@ -1921,12 +1921,13 @@ def _execute_child(self, args, executable, preexec_fn, close_fds,

# Wait for exec to fail or succeed; possibly raising an
# exception (limited in size)
errpipe_data = bytearray()
Member commented:

There was nothing special about the 50_000 in the existing code. I put it in there in order to be fail-safe and have some limit on this pipe in case something weird happened, as unbounded reads are unwise. In reality the data coming through this error pipe will always be tiny in normally operating processes on normally operating machines. This code is not trying to prevent allocations or limit allocation to a specific size. It's just a failsafe.

This PR would cause it to always allocate and zero-fill excessive memory during normal operation.

The norm is that the exception info sent through the pipe, if any, will be smaller than PIPE_BUF (512) and show up in its entirety in the first read call. So this loop isn't expected to do much; the second +=, if there was an error to report, will be a += b'' no-op before it breaks out of the loop.

If the existing bounded os.read calls happen to pre-allocate 50k each, they at least should not be zero-filling those. We could change the number to 3500 and be fine if that allocation size actually matters to anyone. I don't think it does.

Member commented:

Basically: I think readinto is a regression in this use case. It is normally 0 data read, and always tiny data when there is something to read. The limit is merely a fail-safe. If you have identified performance issues due to the current values, propose lowering them. But let's not complicate things with a pre-allocated zero-filled bytearray and readinto. The normal pattern of system calls will be either:

read(fd) -> 0  (closed)

or with a very rare error to be reported:

read(fd) -> 20-60ish bytes.
read(fd) -> 0 (closed)

The loop is a fail-safe because that's how we're supposed to use read APIs on pipes, rather than relying on the OS mapping a single small <PIPE_BUF write to a single small <PIPE_BUF read on the other end, even though in practice that is what'll happen.
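For reference, the existing fail-safe loop being described looks roughly like this (a paraphrase of the current subprocess code, matching the errpipe_data = bytearray() context in the diff above):

    # Paraphrase of the existing bounded-read loop in _execute_child.
    errpipe_data = bytearray()
    while True:
        part = os.read(errpipe_read, 50000)       # bounded read; normally returns b'' right away
        errpipe_data += part                      # usually a += b'' no-op, as noted above
        if not part or len(errpipe_data) > 50000:
            break

In the common case the first os.read() returns b'' and the loop exits immediately; the size check only matters if something writes unexpectedly large data into the pipe.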

@cmaloney (Contributor, Author) commented Feb 2, 2025:

Neither bytearray nor os.read forces a zero fill currently; they just do PyMem_Malloc. bytearray does set the byte past the end to zero to guarantee the byte array is null-terminated, but it doesn't zero-set the whole array.

bytearray construction: https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L136-L152

os.read:
os_read_impl: https://github.com/python/cpython/blob/main/Modules/posixmodule.c#L11436-L11463
PyBytes_FromStringandSize: https://github.com/python/cpython/blob/main/Objects/bytesobject.c#L136-L161
_PyBytes_FromSize: https://github.com/python/cpython/blob/main/Objects/bytesobject.c#L119-L133
_PyObject_InitVar: https://github.com/python/cpython/blob/main/Include/internal/pycore_object.h#L430-L447

@cmaloney (Contributor, Author) commented:

Can definitely lower the initial allocation from 50_000; the os.read currently always allocates that much as well (then downsizes to the actual read size). I'll have a look at making and testing a better errpipe upper-bound size.
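For illustration, the Python-visible half of that behavior: os.read() is given the full bound, but the bytes object it returns holds only what was actually read (the over-allocation and shrink happen inside the C functions linked above):

    import os

    r, w = os.pipe()
    os.write(w, b"example error payload")  # stand-in for real exception data
    os.close(w)
    data = os.read(r, 50_000)   # requests up to 50_000 bytes
    os.close(r)
    print(len(data))            # 21: sized to the data read, not the 50_000 request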

@gpshead closed this Feb 2, 2025
@cmaloney deleted the subprocess_readinto branch February 2, 2025 21:26