feat: Add linear_spaces #20941

Open: wants to merge 2 commits into main


mcrumiller (Contributor) commented Jan 28, 2025

Closes #20922.

Examples

import polars as pl
pl.Config.set_fmt_table_cell_list_len(5)

df = pl.DataFrame({
    "start": [1, -1],
    "end": [3, 2],
    "num_samples": [4, 5],
})

df.with_columns(ls=pl.linear_spaces("start", "end", "num_samples"))
# shape: (2, 4)
# ┌───────┬─────┬─────────────┬────────────────────────────────┐
# │ start ┆ end ┆ num_samples ┆ ls                             │
# │ ---   ┆ --- ┆ ---         ┆ ---                            │
# │ i64   ┆ i64 ┆ i64         ┆ list[f64]                      │
# ╞═══════╪═════╪═════════════╪════════════════════════════════╡
# │ 1     ┆ 3   ┆ 4           ┆ [1.0, 1.666667, 2.333333, 3.0] │
# │ -1    ┆ 2   ┆ 5           ┆ [-1.0, -0.25, 0.5, 1.25, 2.0]  │
# └───────┴─────┴─────────────┴────────────────────────────────┘
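With closed="both" (the default), each row's samples follow start + i * (end - start) / (num_samples - 1) for i = 0 .. num_samples - 1. The plain-Python sketch below reproduces the ls column above from that formula. It is not the Polars implementation: `linear_space_both` is a hypothetical helper, and returning [start] for a single sample is an assumption.

```python
def linear_space_both(start: float, end: float, num_samples: int) -> list[float]:
    """Evenly spaced samples over [start, end] with both endpoints included."""
    if num_samples == 0:
        return []
    if num_samples == 1:
        # Assumption: a single sample collapses to the start value.
        return [float(start)]
    step = (end - start) / (num_samples - 1)
    return [start + i * step for i in range(num_samples)]


# Same (start, end, num_samples) rows as the DataFrame above.
for start, end, n in [(1, 3, 4), (-1, 2, 5)]:
    print([round(x, 6) for x in linear_space_both(start, end, n)])
# [1.0, 1.666667, 2.333333, 3.0]
# [-1.0, -0.25, 0.5, 1.25, 2.0]
```

Each row gets its own step size, which is why num_samples can vary per row when the result is a list.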

Passing as_array=True returns a pl.Array instead of a pl.List, but it can only be used with a constant num_samples (a Python int or an integer literal):

df.with_columns(ls=pl.linear_spaces("start", "end", 3, as_array=True))
# shape: (2, 4)
# ┌───────┬─────┬─────────────┬──────────────────┐
# │ start ┆ end ┆ num_samples ┆ ls               │
# │ ---   ┆ --- ┆ ---         ┆ ---              │
# │ i64   ┆ i64 ┆ i64         ┆ array[f64, 3]    │
# ╞═══════╪═════╪═════════════╪══════════════════╡
# │ 1     ┆ 3   ┆ 4           ┆ [1.0, 2.0, 3.0]  │
# │ -1    ┆ 2   ┆ 5           ┆ [-1.0, 0.5, 2.0] │
# └───────┴─────┴─────────────┴──────────────────┘

Temporal types work as well, and the closed parameter controls which endpoints of the interval are included: "left", "right", "none", or "both" (the default):

import polars as pl
from datetime import date
pl.Config.set_fmt_table_cell_list_len(4)
pl.Config.set_fmt_str_lengths(99)

df = pl.DataFrame({
    "start": [date(2025, 1, 1), date(2025, 1, 2)],
    "end": [date(2025, 2, 1), date(2025, 2, 2)],
    "num_samples": [3, 4],
})
df.with_columns(ls=pl.linear_spaces("start", "end", 3, as_array=True, closed="left"))
# shape: (2, 4)
# ┌────────────┬────────────┬─────────────┬─────────────────────────────────────────────────────────────────┐
# │ start      ┆ end        ┆ num_samples ┆ ls                                                              │
# │ ---        ┆ ---        ┆ ---         ┆ ---                                                             │
# │ date       ┆ date       ┆ i64         ┆ array[datetime[ms], 3]                                          │
# ╞════════════╪════════════╪═════════════╪═════════════════════════════════════════════════════════════════╡
# │ 2025-01-01 ┆ 2025-02-01 ┆ 3           ┆ [2025-01-01 00:00:00, 2025-01-11 08:00:00, 2025-01-21 16:00:00] │
# │ 2025-01-02 ┆ 2025-02-02 ┆ 4           ┆ [2025-01-02 00:00:00, 2025-01-12 08:00:00, 2025-01-22 16:00:00] │
# └────────────┴────────────┴─────────────┴─────────────────────────────────────────────────────────────────┘
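The closed="left" output above is consistent with a step of (end - start) / num_samples starting at start, which for a 31-day span and 3 samples gives 10 days 8 hours. The sketch below models the four closed options with plain Python datetime arithmetic. The "left" branch matches the table; the "right", "none", and "both" step/offset conventions are assumptions about the intended semantics, and `linear_space_closed` is a hypothetical helper, not Polars code. Dates are promoted to midnight datetimes here, mirroring the datetime[ms] output dtype shown above.

```python
from datetime import datetime


def linear_space_closed(start, end, num_samples, closed="both"):
    """Evenly spaced samples between start and end, honoring `closed`.

    Only subtraction, division of the span by an int, and addition are used,
    so this works for floats and for datetimes (the span is then a timedelta).
    """
    if num_samples == 0:
        return []
    span = end - start
    if closed == "both":
        if num_samples == 1:
            return [start]  # assumption: a single sample is the start point
        step, offset = span / (num_samples - 1), 0
    elif closed == "left":
        step, offset = span / num_samples, 0  # includes start, excludes end
    elif closed == "right":
        step, offset = span / num_samples, 1  # excludes start, includes end
    elif closed == "none":
        step, offset = span / (num_samples + 1), 1  # excludes both endpoints
    else:
        raise ValueError(f"unknown closed={closed!r}")
    return [start + step * (i + offset) for i in range(num_samples)]


# First row of the date example, with dates promoted to midnight datetimes.
for ts in linear_space_closed(datetime(2025, 1, 1), datetime(2025, 2, 1), 3, closed="left"):
    print(ts)
# 2025-01-01 00:00:00
# 2025-01-11 08:00:00
# 2025-01-21 16:00:00
```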

github-actions bot added the enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) labels on Jan 28, 2025

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 90.88235% with 31 lines in your changes missing coverage. Please review.

Project coverage is 79.29%. Comparing base (ea1ea5a) to head (eeb70dc).
Report is 16 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...s/polars-plan/src/dsl/function_expr/range/utils.rs | 89.43% | 13 Missing ⚠️ |
| crates/polars-plan/src/dsl/functions/range.rs | 78.26% | 10 Missing ⚠️ |
| ...s-plan/src/dsl/function_expr/range/linear_space.rs | 93.69% | 7 Missing ⚠️ |
| ...tes/polars-plan/src/dsl/function_expr/range/mod.rs | 96.55% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20941      +/-   ##
==========================================
+ Coverage   79.20%   79.29%   +0.08%     
==========================================
  Files        1583     1583              
  Lines      225109   225858     +749     
  Branches     2581     2587       +6     
==========================================
+ Hits       178303   179091     +788     
+ Misses      46216    46177      -39     
  Partials      590      590              


mcrumiller force-pushed the linear_spaces branch 2 times, most recently from 3a61c92 to f9feac6 on February 1, 2025 18:00
mcrumiller (Contributor, Author) commented Feb 1, 2025

@coastalwhite I've added the as_array parameter; let me know if you think the implementation is reasonable. A pl.Array dtype carries its width in the schema, so the width has to be known before any data is processed. That means as_array=True requires num_samples to be a constant: either a Python int or pl.lit(x, dtype=<integer dtype>). We check that the num_samples expr is a constant literal and pass array_width: Option<usize> down to the DSL. The implementation then checks whether this is Some(width) and, if so, builds the array.
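To make the role of array_width: Option<usize> concrete, here is a toy Python model of the idea, not the actual Rust code. `Lit`, `Col`, `resolve_array_width`, and `output_dtype` are hypothetical names, Python's None stands in for Rust's None, and the dtype strings are simplified to f64. The point it illustrates: once the width is resolved from a constant, the output schema can be decided without looking at any rows.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Lit:
    """Hypothetical stand-in for an integer literal expression."""
    value: int


@dataclass
class Col:
    """Hypothetical stand-in for a column reference (not a constant)."""
    name: str


def resolve_array_width(num_samples: Union[int, Lit, Col], as_array: bool) -> Optional[int]:
    """Mirror of array_width: Option<usize>; None means "emit a list"."""
    if not as_array:
        return None
    if isinstance(num_samples, int):
        return num_samples
    if isinstance(num_samples, Lit):
        return num_samples.value
    # Per-row num_samples cannot give one fixed width for the schema.
    raise ValueError("as_array=True requires a constant integer num_samples")


def output_dtype(array_width: Optional[int]) -> str:
    """Decide the output schema from the width alone, before seeing any rows."""
    if array_width is None:
        return "list[f64]"
    return f"array[f64, {array_width}]"


print(output_dtype(resolve_array_width(3, as_array=True)))  # array[f64, 3]
print(output_dtype(resolve_array_width(Col("num_samples"), as_array=False)))  # list[f64]
```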

Side note: to determine whether the input Expr is a constant literal, we check for an integer Expr::Literal or an Expr::Cast. pl.lit(x, dtype=<integer dtype>) ends up as a Cast expression, so when we see an Expr::Cast, we inspect its Cast{expr} component and check whether that is an integer Expr::Literal. I'm not sure whether there is a better way to do this, but it seems to work well.
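The literal-detection rule above can be sketched on a toy expression tree in Python. `Literal`, `Column`, `Cast`, `INTEGER_DTYPES`, and `constant_int` are hypothetical stand-ins for the Rust Expr variants, not Polars APIs. The one-level unwrap of Cast follows the comment; checking that the cast target is an integer dtype is an extra guard added in this sketch, not something the comment describes.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical set of integer dtype names used only by this sketch.
INTEGER_DTYPES = {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}


@dataclass
class Literal:
    value: object


@dataclass
class Column:
    name: str


@dataclass
class Cast:
    expr: "Expr"
    dtype: str


Expr = Union[Literal, Column, Cast]


def constant_int(expr: Expr) -> Optional[int]:
    """Return the integer value if `expr` is a constant integer, else None."""
    if isinstance(expr, Cast):
        # pl.lit(x, dtype=...) arrives as a Cast wrapping a Literal, so look inside.
        # Extra guard in this sketch: only accept casts to integer dtypes.
        if expr.dtype not in INTEGER_DTYPES:
            return None
        expr = expr.expr
    # type(...) is int excludes bool, which is a subclass of int in Python.
    if isinstance(expr, Literal) and type(expr.value) is int:
        return expr.value
    return None


print(constant_int(Cast(Literal(3), "Int64")))  # 3
print(constant_int(Column("num_samples")))  # None
```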

@mcrumiller mcrumiller marked this pull request as ready for review February 1, 2025 18:59

Successfully merging this pull request may close these issues.

Add linear_spaces (being to linear_space what int_ranges is to int_range)