Receive "ArgumentError: Data is not in feather format" when reading dataframe written from Python #139

Open
def-mycroft opened this issue May 17, 2020 · 3 comments


@def-mycroft

Hello, apologies in advance if I'm missing something simple here.

I want to write a dataframe to feather using Python and then load it into Julia. When I attempt this, I receive the error "ArgumentError: Data is not in feather format".

So, to provide a reproducible example, when I write out a dataframe in Python like this:

import pandas as pd
import feather
df = pd.read_json('{"open":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.8},"high":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.98},"low":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.8},"close":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.98},"volume":{"0":436,"1":264,"2":1122,"3":202,"4":3202}}')
feather.write_dataframe(df, 'from-py.feather')

...and then try to load it into Julia:

using Feather
df = Feather.read("from-py.feather")

...I receive:

ERROR: ArgumentError: Data is not in feather format: header = UInt8[0x41, 0x52, 0x52, 0x4f], footer = UInt8[0x52, 0x4f, 0x57, 0x31].
Stacktrace:
 [1] validatedata(::Array{UInt8,1}) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:11
 [2] #loaddata#6 at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:17 [inlined]
 [3] #loaddata at ./none:0 [inlined]
 [4] #Source#7(::Bool, ::Type, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:17
 [5] Type at ./none:0 [inlined]
 [6] #read#10(::Bool, ::Function, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [7] read(::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [8] top-level scope at none:0

Package versions etc:

  • Ubuntu 18.04.4 LTS
  • Julia Version 1.0.5
  • Julia Feather 0.5.6
  • Python 3.8.2
  • Python Feather 0.4.1
  • Python pyarrow 0.17.0
@ExpandingMan
Collaborator

This is because pyarrow now writes Feather V2 by default, which is just the Arrow IPC format written to disk (i.e. the metadata is completely different from Feather V1).
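
For anyone who wants to check which kind of file they have before handing it to Julia, here is a minimal sketch that looks at the same four leading and trailing bytes the error message reports. It assumes Feather V1 files start and end with the magic bytes "FEA1" and Arrow IPC files start and end with "ARROW1", which matches the header "ARRO" and footer "ROW1" shown in the error above. The function names and return labels are made up for illustration and are not part of any library.

```python
import os

# Magic bytes, as 4-byte head / 4-byte tail slices (the same window the
# Feather.jl error message prints).
FEATHER_V1_HEAD = b"FEA1"
FEATHER_V1_TAIL = b"FEA1"
ARROW_IPC_HEAD = b"ARROW1"[:4]   # b"ARRO"
ARROW_IPC_TAIL = b"ARROW1"[-4:]  # b"ROW1"


def classify_magic(head, tail):
    """Classify a file from its first 4 and last 4 bytes (hypothetical helper)."""
    if head == FEATHER_V1_HEAD and tail == FEATHER_V1_TAIL:
        return "feather-v1"
    if head == ARROW_IPC_HEAD and tail == ARROW_IPC_TAIL:
        return "feather-v2"  # i.e. an Arrow IPC file on disk
    return "unknown"


def sniff_file(path):
    """Read the first and last 4 bytes of a file and classify them."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(size - 4, 0))
        tail = f.read(4)
    return classify_magic(head, tail)


# The bytes from the error message in this issue:
header = bytes([0x41, 0x52, 0x52, 0x4F])
footer = bytes([0x52, 0x4F, 0x57, 0x31])
print(classify_magic(header, footer))
```

Running `sniff_file("from-py.feather")` on the file from the original report should give the same classification as the bytes in the error message.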

I am now deep into a complete rewrite of the Arrow.jl package, which will support reading and writing Feather V2. This package will likely be moved into legacy mode and support only reading and writing Feather V1.
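
In the meantime, one possible workaround on the Python side is to ask pyarrow for the old format explicitly. This is only a sketch: it assumes a pyarrow version (0.17.0 or later) whose `pyarrow.feather.write_feather` accepts a `version` argument, and the wrapper name `write_feather_v1` is hypothetical. I have not checked the result against every Feather.jl release.

```python
def write_feather_v1(df, path):
    """Write a pandas DataFrame as Feather V1 (hypothetical wrapper).

    pyarrow is imported here rather than at module level so that this
    definition loads even where pyarrow is not installed.
    """
    import pyarrow.feather as pa_feather

    # version=1 requests the legacy Feather V1 layout instead of the
    # default V2 (Arrow IPC) layout.
    pa_feather.write_feather(df, path, version=1)
```

With the dataframe from the original report this would be called as `write_feather_v1(df, 'from-py.feather')` in place of `feather.write_dataframe(df, 'from-py.feather')`.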

I have added a note to the README regarding this. I will change this when Arrow.jl is complete. I'll also make a post on the Julia discourse. It'll probably be another few weeks before I have unit tests and all and am ready for a release, but keep an eye out if you're still interested. I won't support everything in the arrow standard right out of the gate (it's quite extensive by now), but certainly simple dataframes like you show here will be supported initially.

@def-mycroft
Author

Thanks for the note and the work, @ExpandingMan.

I'll leave this issue open for now so it is visible to others while the rewrite of Arrow.jl is in progress.

@chrizMM

chrizMM commented Jan 7, 2022

Any news on this issue?


3 participants