All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- Pyarrow >=19 required
- Python >=3.10 required
1.8 - 2024-11-01
- Pyarrow >=18 required
- isodate dependency for durations
- Acero engine used for scanning
- Grouping defaults to parallelized but unordered
- Partitioning supports arbitrary functions
- group optimized for dictionary arrays
- rank optimized for out-of-core
1.7 - 2024-07-19
- Pyarrow >=17 required
- Partitioning supports original indices
- Acero engine declaration
- Duration format improvements
- Strawberry >=0.236 compatible
1.6 - 2024-04-30
- Pyarrow >=16 required
- group optimized for datasets
- Duration scalar
- Interval type
1.5 - 2024-01-24
- Pyarrow >=15 required
- Strawberry >=0.212 compatible
- Starlette >=0.36 compatible
1.4 - 2023-11-05
- Pyarrow >=14 required
- Python >=3.9 required
- group optimized for memory
- fragments replaced by group
- min and max replaced by rank
- partition replaced by runs
- list aggregation must be explicit
- group list functions are in apply
1.3 - 2023-08-25
- Pyarrow >=13 required
- List filtering and sorting moved to functions and optimized
- Dataset filtering, grouping, and sorting on fragments optimized
- group can aggregate entire table
- flatten field for list columns
- rank field for min and max filtering
- Schema extensions for metrics and deprecations
- optional field for partial query results
- dropNull, fillNull, and size fields
- Command-line utilities
- Allow datasets with invalid field names
- fragments field deprecated and functionality moved to group field
- Implicit list aggregation on group deprecated
- partition field deprecated and renamed to runs
1.2 - 2023-05-07
- Pyarrow >=12 required
- Grouping fragments optimized
- Group by empty columns
- Batch sorting and grouping into lists
1.1 - 2023-01-29
- Pyarrow >=11 required
- Python >=3.8 required
- Scannable functions added
- List aggregations deprecated
- Group by fragments
- Month day nano interval array
- min and max fields memory optimized
1.0 - 2022-10-28
- Pyarrow >=10 required
- Dataset schema introspection
- Dataset scanning with selection and projection
- Binary search on sorted columns
- List aggregation, filtering, and sorting optimizations
- Compute functions generalized
- Multiple datasets and federation
- Provisional dataset join and take
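The 1.0 entry for binary search on sorted columns depends on the column already being ordered. Here is a minimal sketch of the general idea, not this project's implementation. It uses Python's standard bisect module on a plain list standing in for an Arrow column, and the helper name equal_range is hypothetical.

```python
import bisect
from typing import Sequence, Tuple


def equal_range(column: Sequence[int], value: int) -> Tuple[int, int]:
    """Hypothetical helper: find rows equal to value in an ascending column.

    Returns the half-open row range [start, stop). An empty range
    (start == stop) means the value is absent, and start is then the
    position where it would be inserted to keep the column sorted.
    """
    # Leftmost position where value could be inserted without breaking order.
    start = bisect.bisect_left(column, value)
    # Rightmost such position, i.e. one past the last matching row.
    stop = bisect.bisect_right(column, value)
    return start, stop
```

For example, equal_range([1, 2, 2, 2, 5, 7], 2) gives (1, 4), so rows 1 through 3 match. A missing value such as 3 gives the empty range (4, 4). Each lookup costs O(log n) comparisons instead of a full scan, which is why the column must be sorted first.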
0.9 - 2022-08-04
- Pyarrow >=9 required
- Multi-directional sorting
- Removed unnecessary interfaces
- Filtering has stricter typing
0.8 - 2022-05-08
- Pyarrow >=8 required
- Grouping and aggregation integrated
- AbstractTable interface renamed to Dataset
- Binary scalar renamed to Base64
0.7 - 2022-02-04
- Pyarrow >=7 required
- FILTERS use query syntax and trigger reading the dataset
- FEDERATED field configuration
- List columns support sorting and filtering
- Group by and aggregate optimizations
- Dataset scanning
0.6 - 2021-10-28
- Pyarrow >=6 required
- Group by optimized and replaced unique field
- Dictionary related optimizations
- Null consistency with arrow count functions
0.5 - 2021-08-06
- Pyarrow >=5 required
- Stricter validation of inputs
- Columns can be cast to another arrow data type
- Grouping uses large list arrays with 64-bit counts
- Datasets are read on-demand or optionally at startup
0.4 - 2021-05-16
- Pyarrow >=4 required
- sort updated to use new native routines
- partition tables by adjacent values and differences
- filter supports unknown column types using tagged union pattern
- Groups replaced with Table.tables and Table.aggregate fields
- Tagged unions used for filter, apply, and partition functions
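The 0.4 entry describes partitioning tables by adjacent values and differences. Here is a minimal sketch of that general idea in plain Python, not this project's implementation. The helper name runs and its tolerance parameter are hypothetical, and a plain list stands in for an Arrow column.

```python
from typing import List, Optional, Sequence


def runs(values: Sequence[float], tolerance: Optional[float] = None) -> List[List[int]]:
    """Hypothetical helper: split row indices into runs of adjacent rows.

    Without a tolerance, a new run starts whenever a value differs from its
    predecessor. With a tolerance, a new run starts only when the absolute
    difference between adjacent values exceeds it.
    """
    groups: List[List[int]] = []
    for index, value in enumerate(values):
        if index == 0:
            # The first row always opens the first run.
            groups.append([index])
            continue
        previous = values[index - 1]
        if tolerance is None:
            boundary = value != previous
        else:
            boundary = abs(value - previous) > tolerance
        if boundary:
            groups.append([index])  # start a new run
        else:
            groups[-1].append(index)  # extend the current run
    return groups
```

For example, runs([1, 1, 2, 2, 2, 1]) gives [[0, 1], [2, 3, 4], [5]]. Unlike a group by, the trailing 1 forms its own run because only adjacent rows are compared. With a tolerance, runs([1.0, 1.5, 4.0, 4.2], tolerance=1.0) gives [[0, 1], [2, 3]].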
0.3 - 2021-01-31
- Pyarrow >=3 required
- any and all fields
- String column split field
0.2 - 2020-11-26
- Pyarrow >=2 required
- ListColumn and StructColumn types
- Groups type with aggregate field
- group and unique optimized
- Statistical fields: mode, stddev, variance
- is_in, min, and max optimized