Forced rechunking #32
Comments
Ok, edit, after a bit of reading...
We should return |
To Arrow or from Arrow? 🙂 (i.e. IntoPy or FromPyObject?) If I'm reading it right, the chunked-array API is not part of Arrow's stable C API; is that part of the problem here? // Yeah, in some cases this kind of rechunking can be catastrophic: if your dataframes are 50-100 GB, a rechunk is the last thing you want happening behind the scenes...
But we could return a list of Arrow arrays. 🤔 And then even use that to create a pyarrow ChunkedArray.
Yeah, I think that should work. There's also the question of whether the single-chunk case should be special-cased: should it yield a list of one and produce a chunked array with a single batch, or a plain array?
I think we can add a |
That sounds reasonable. The default being no rechunking?
Yes. Default to not exploding your memory. 🙈
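To make the proposal above concrete, here is a minimal, library-free sketch of the "list of arrays, rechunk only on request" idea. It is not pyo3-polars code: `export_chunks` and its `rechunk` flag are hypothetical names, plain Python lists stand in for Arrow arrays, and always returning a list (even for a single chunk) is just one possible answer to the special-casing question, assumed here for illustration.

```python
from typing import List, Sequence


def export_chunks(chunks: Sequence[List[int]], rechunk: bool = False) -> List[List[int]]:
    """Hypothetical export helper: return one list per chunk.

    Plain lists stand in for Arrow arrays; this is an illustration only.
    """
    if rechunk:
        # Opt-in path: concatenate everything into one contiguous chunk.
        # This copies all values, which is what hurts on very large frames.
        merged = [value for chunk in chunks for value in chunk]
        return [merged]
    # Default path: hand back the existing chunks untouched (no data copy).
    return list(chunks)


chunks = [[1, 2, 3], [4, 5], [6]]
as_is = export_chunks(chunks)                 # chunk layout preserved
merged = export_chunks(chunks, rechunk=True)  # single chunk, data copied
single = export_chunks([[7, 8]])              # still a list of one chunk
```

With this shape the caller decides when to pay for the copy, and the single-chunk case needs no special handling on the receiving side.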
This took me a while to figure out (since this was the last place I'd expect a forced rechunk to happen): while passing huge frames from Python to Rust and back, I noticed that they arrive in a single chunk even if they were originally multi-chunked.
Is there any reason not to leave rechunking to the end user, since in some cases it can be very detrimental?
pyo3-polars/pyo3-polars/src/lib.rs
Line 121 in 0165cb4
... and also this:
pyo3-polars/pyo3-polars/src/lib.rs
Line 163 in 0165cb4