It'd be great if there was an error-handling mechanism for pl.collect_all such that the whole operation doesn't fail if a subset of the LazyFrames fail.
This currently raises an error and causes the whole operation to fail.
Traceback (most recent call last):
  File "script.py", line 26, in <module>
    df.explain(optimized=False)
  File "/.pixi/envs/default/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 1124, in explain
    return self._ldf.describe_plan()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.NoDataError: empty CSV
It'd be great if I'd at least get a result for the first lazy frame.
One other issue is that the polars.exceptions.NoDataError: empty CSV message doesn't show which file caused the error. For long lists of LazyFrames, that can make debugging rather painful.
In the meantime, I'm looping over the list and calling collect on each LazyFrame individually, wrapping each call in a try/except block. I'd love any guidance on the most efficient/performant way to do this, since the built-in collect_all doesn't work for this case right now. Should I use a thread pool or multiprocessing? Polars seems to use all cores even for a single file, so I'm curious how to handle parallelism when collecting multiple LazyFrames and collect_all is not an option.
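For reference, the per-frame workaround described above could look roughly like the sketch below. It is only an illustration, not an existing Polars API: `collect_each` and its `max_workers` parameter are hypothetical names, and the helper assumes nothing more than that each object has a `.collect()` method (as a Polars LazyFrame does). Keying the input by a label such as the file path also makes it possible to see which input failed.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_each(lazy_frames, max_workers=1):
    """Collect every labeled lazy frame independently.

    `lazy_frames` maps a label (for example a file path) to any object
    with a `.collect()` method, such as a polars LazyFrame.
    Returns (results, errors), both dicts keyed by label.
    `collect_each` is a hypothetical helper, not part of Polars.
    """
    results, errors = {}, {}

    def _collect(item):
        label, lf = item
        try:
            return label, lf.collect(), None
        except Exception as exc:  # e.g. polars.exceptions.NoDataError
            return label, None, exc

    # max_workers=1 behaves like a plain loop; larger values run the
    # collects in threads (an assumption to benchmark, not a recommendation).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for label, frame, exc in pool.map(_collect, lazy_frames.items()):
            if exc is None:
                results[label] = frame
            else:
                errors[label] = exc
    return results, errors


# Usage with Polars (requires `import polars as pl`; paths are placeholders):
#   frames = {path: pl.scan_csv(path) for path in ["a.csv", "b.csv"]}
#   results, errors = collect_each(frames)
#   for path, exc in errors.items():
#       print(f"{path}: {exc!r}")
```

Whether raising `max_workers` actually helps is an open question here: since Polars already appears to use all cores for a single query, extra threads may add little, so it is worth measuring before committing to either threads or multiprocessing.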