Right now, DBMS-style DataSources allow for querying using parameters provided through args_dict. The most common case is probably providing a value to check equality with (e.g. an ID for an item to fetch).
Although DataSources claim to be isomorphic towards model code, in the case of the FileDataSource, one would have to write specific code to select the desired record from the loaded CSV.
To make this claim somewhat more true (and usage more consistent between kinds of DataSources, at least for the equality case), I propose to add functionality to FileDataSource.get_dataframe to use the args_dict parameter for selection. args_dict would be a dictionary of column key(s) with values to equality-test. get_dataframe would return a subset of the originally loaded DataFrame.
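A minimal sketch of what the proposed equality selection could look like, assuming the CSV has already been loaded into a pandas DataFrame. The helper name `select_by_equality` is hypothetical and not part of the existing FileDataSource API; only `args_dict` comes from the proposal itself.

```python
import pandas as pd


def select_by_equality(df, args_dict=None):
    """Return the rows of df where every column named in args_dict equals its value.

    Hypothetical helper illustrating the proposal; not existing library code.
    """
    if not args_dict:
        # No selection requested: return the full DataFrame unchanged.
        return df
    missing = [col for col in args_dict if col not in df.columns]
    if missing:
        raise KeyError(f"Unknown column(s) in args_dict: {missing}")
    # Start with an all-True mask and AND in one equality test per column.
    mask = pd.Series(True, index=df.index)
    for column, value in args_dict.items():
        mask &= df[column] == value
    return df[mask]


# Example: fetch the record with a given ID.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
result = select_by_equality(df, {"id": 2})
```

Multiple keys in `args_dict` are combined with a logical AND here, which mirrors the usual `WHERE a = :a AND b = :b` pattern in a parameterized SQL query; whether that is the desired semantics is an open design choice.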
A mapping between parameter name and column name would also be nice. Otherwise, if one wants to keep lookup behaviour consistent between a database and a CSV source, one would have to change the parameter name in the SQL to the corresponding column name.
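One way such a mapping could work, sketched under the assumption that it is supplied as a plain dictionary. Both `map_params_to_columns` and `param_to_column` are hypothetical names used for illustration only.

```python
def map_params_to_columns(args_dict, param_to_column=None):
    """Translate query parameter names into CSV column names.

    Parameters without an entry in param_to_column are passed through unchanged.
    Hypothetical helper; the names are illustrative only.
    """
    param_to_column = param_to_column or {}
    return {param_to_column.get(name, name): value for name, value in args_dict.items()}


# The SQL query might use :item_id while the CSV column is called "id".
column_args = map_params_to_columns({"item_id": 7}, {"item_id": "id"})
```

With this approach the model code could pass the same `args_dict` to either kind of DataSource, and only the file-based source would need the extra mapping.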
Other comments
If there is a clean, reasonably fast, pandas-supported way to do this on CSVs without loading them into memory first, this would be preferable to first loading all data, then filtering. Maybe this is relevant: https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function
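One possible approach, sketched here as an assumption rather than a recommendation, is to read the CSV in chunks with the `chunksize` argument of `pandas.read_csv` and keep only the matching rows from each chunk. This still parses every row, but it never holds the whole file in memory at once. The function name `read_csv_filtered` is hypothetical.

```python
import io

import pandas as pd


def read_csv_filtered(path_or_buffer, args_dict, chunksize=10_000):
    """Read a CSV chunk by chunk, keeping only rows that match args_dict.

    Hypothetical helper; only the matching rows are kept in memory.
    """
    matches = []
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize):
        # Build an equality mask for this chunk only.
        mask = pd.Series(True, index=chunk.index)
        for column, value in args_dict.items():
            mask &= chunk[column] == value
        matches.append(chunk[mask])
    if not matches:
        # No chunks were produced at all.
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)


# Example with an in-memory CSV and a deliberately small chunk size.
csv_data = io.StringIO("id,name\n1,a\n2,b\n3,c\n2,d\n")
out = read_csv_filtered(csv_data, {"id": 2}, chunksize=2)
```

Whether this is faster than loading everything and filtering afterwards depends on file size and selectivity, so it would need to be measured before being adopted.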