Feather file causes segfault in R #117

Open
bjornerstedt opened this issue Mar 14, 2019 · 4 comments

Comments

@bjornerstedt

Saving a DataFrame with Feather causes R to crash when reading the file. I am using Feather 0.5.1 with Julia 1.1. If I create a simple feather file with

using DataFrames, Feather

df = DataFrame(A = 1:8)
Feather.write("df.feather", df)

I get the following crash in R 3.5.1 with the package feather 0.3.2:

> library(feather)
Warning message:
package ‘feather’ was built under R version 3.5.2 
> ir = read_feather("df.feather")

 *** caught segfault ***
address 0x10ee5c0b0, cause 'memory not mapped'

Traceback:
 1: openFeather(path)
 2: feather(path)
 3: read_feather("df.feather")
@Rudi79

Rudi79 commented Apr 5, 2019

The problem seems to be JuliaData/FlatBuffers.jl#38
One way to work around this is to pin the FlatBuffers package at version 0.4.0.
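
For anyone wanting to try the pin, here is a minimal sketch using Julia's built-in package manager. It assumes FlatBuffers 0.4.0 is still resolvable alongside the installed Feather version in the active environment, which has not been verified here.

```julia
using Pkg

# Install FlatBuffers at exactly version 0.4.0 in the active environment.
Pkg.add(PackageSpec(name = "FlatBuffers", version = "0.4.0"))

# Pin it so later `Pkg.update()` calls do not move it to a newer release.
Pkg.pin("FlatBuffers")
```

Running `Pkg.free("FlatBuffers")` later removes the pin once a fixed release is available.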

@danielfm123

Feather files created with Julia are bigger than files created from the same dataset with R.
The Feather format matters because it lets multiple tools share data in one analytics workflow.

@ExpandingMan
Collaborator

I'm not too surprised that the files created in Julia are bigger. As I recall, we've seen examples of some of the other writers automatically deciding to write dictionary-encoded (i.e. compressed) columns. In this package we only do this if the original column is a CategoricalArray (i.e. already dictionary-encoded). In some cases the resulting difference in file size can be very large.

To do the same, we'd need some sort of heuristic for deciding when to automatically use dictionary-encoded columns.
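
As an illustration only, one possible heuristic is a simple uniqueness ratio: dictionary-encode a column when its distinct values make up a small enough share of its length. The function name `should_dictionary_encode` and the `max_ratio` threshold below are hypothetical; they are not part of Feather.jl, and other writers may use different criteria.

```julia
# Hypothetical helper, not part of Feather.jl: returns true when the column
# has few enough distinct values relative to its length that dictionary
# encoding is likely to save space.
function should_dictionary_encode(col::AbstractVector; max_ratio::Real = 0.5)
    n = length(col)
    n == 0 && return false                      # nothing to gain on an empty column
    return length(unique(col)) / n <= max_ratio # share of distinct values
end

# A repetitive string column qualifies; a column of all-distinct values does not.
labels = repeat(["low", "high"], 4)
ids = collect(1:8)
println(should_dictionary_encode(labels))
println(should_dictionary_encode(ids))
```

A column that passes such a check could then be converted with `categorical` from CategoricalArrays.jl before writing, since that is the case this package already dictionary-encodes.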

@nalimilan
Member

You could also support PooledArrays, and expect people to use them when they want to save memory.
