-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: load datasets with repeated items #92
base: master
Are you sure you want to change the base?
fix: load datasets with repeated items #92
Conversation
Hmm, could you describe a little better what you're trying to do? You're saying the input has two line items that are labelled the same but have different values? What are you suggesting that If you think this is a common enough problem, we could consider configuration to throw a warning and take the first row rather than erroring out. |
@@ -74,7 +74,7 @@ def from_series( | |||
if item_config.key in data_dict: | |||
# Multiple matches for data item. | |||
# First see if data is the same, then just skip | |||
if for_lookup[name] == data_dict[item_config.key]: | |||
if (for_lookup[name] == data_dict[item_config.key]).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickderobertis
the situation I had was a data set that had the same data item twice.
for example, the query I ran from a DB had two line items for Inventory
(arguably a mistake), the numbers were the same. It was just a duplicate line item, but I think this problem exists even if the duplicate line item has different data.
Here on line 77 I got an error. I think we need to add .all()
to check if all the items in the data series are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, yeah the code was definitely intended to handle duplicate items by skipping so if it is not doing that without this patch then we definitely need to get it in.
If you look at the test results it seems like this line change is causing an issue though. I'm also confused about why we would need this here, because for_lookup
is a series and so for_lookup[name]
should be a single value rather than a series. That's why the tests are erroring saying bool
has no attribute all
.
What error did you encounter with loading data with duplicate items? Perhaps a best next step on this is to revert this change and add two unit tests for loading a statement with duplicate items: one with the same numbers and one with different numbers. Then we will be able to see the error on CI and once all tests pass we will know this works properly for every case.
Came across this interesting scenario which I think can be fixed.
If the data we are trying to load has a repeated line item (e.g., two SALE items) the code is checking to see if all the data is the same but before this fix, it throws an error.
We could also add some tests for this.
What do you think?