Support for streaming? #27
Need to work on that.
When you're getting the batches and merging them [0], you could instead yield each batch result individually. This feature is essential, since handling large data sets (which the Bulk API is supposed to be good at) becomes very slow or outright impossible. For reference, when I pull down all of my Accounts (~50,000 records) with all of my fields, the process eats up all 16 GB of my memory and I have to kill it. Now imagine if I wanted to pull down my Tasks (~700,000 records), and there are much larger orgs out there too.

[0] - https://github.com/yatish27/salesforce_bulk_api/blob/master/lib/salesforce_bulk_api/job.rb#L197

Apparently Salesforce doesn't expose batches for bulk queries. According to the documentation, Salesforce will return up to 15 result files of up to 1 GiB apiece, so 1 GiB is effectively our batch size. What I mentioned earlier still applies, but is far less useful at 1 GiB per batch, which would overwhelm just about any machine in use today. I think streaming from HTTP into an XML parser should still be possible, though more tedious, since the batching is now left to the library to implement. There is documentation on how to stream HTTP responses in Ruby, and there are gems that do XML stream parsing. Another option for HTTP streaming is downloading individual byte ranges using the HTTP `Range` header and then yielding each chunk until the end. I believe that combining HTTP streaming with XML stream parsing will accomplish what this feature request is asking for.
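To make that concrete, here is a sketch of the XML-stream-parsing half using REXML (Ruby's stdlib SAX-style parser). The `each_record` helper and the `records` tag name are assumptions about the result shape, not this gem's actual API; with `Net::HTTP` the `io` would be the streamed response body rather than a `StringIO`.

```ruby
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
require 'stringio'

# SAX-style listener that yields one record hash at a time instead of
# building the whole result set in memory.
class RecordListener
  include REXML::StreamListener

  def initialize(record_tag, &block)
    @record_tag = record_tag
    @block = block
    @record = nil
    @field = nil
  end

  def tag_start(name, _attrs)
    if name == @record_tag
      @record = {}            # start collecting a new record
    elsif @record
      @field = name           # field element inside a record
      @record[@field] = ''
    end
  end

  def text(value)
    @record[@field] << value if @record && @field
  end

  def tag_end(name)
    if name == @record_tag
      @block.call(@record)    # hand the finished record to the caller
      @record = nil
    else
      @field = nil
    end
  end
end

# Hypothetical helper: streams records out of any IO (an HTTP response
# body, a file on disk, ...) without materializing the full document.
def each_record(io, record_tag: 'records', &block)
  listener = RecordListener.new(record_tag, &block)
  REXML::Parsers::StreamParser.new(io, listener).parse
end

# Usage with a canned body shaped roughly like a bulk query result:
xml = <<~XML
  <queryResult>
    <records><Id>001A</Id><Name>Acme</Name></records>
    <records><Id>001B</Id><Name>Globex</Name></records>
  </queryResult>
XML

each_record(StringIO.new(xml)) { |rec| puts rec['Name'] }
```

Because only the current record is held in memory, peak usage stays flat regardless of how many records the batch contains.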
Just to add to my previous comment: you could also download the entire batch file to disk and then stream the file into an XML stream parser.
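A minimal sketch of that two-phase approach, assuming a chunked copy loop. `spool_to_file` is a hypothetical helper and the `io` stands in for an HTTP response body; with `Net::HTTP` you would write each piece from the `read_body { |chunk| ... }` block instead.

```ruby
require 'tempfile'
require 'stringio'

# Spool a response body to disk in fixed-size chunks so the full body
# never sits in memory; the file can then be fed to a stream parser
# such as REXML::Parsers::StreamParser.
def spool_to_file(io, path, chunk_size: 64 * 1024)
  File.open(path, 'wb') do |file|
    while (chunk = io.read(chunk_size))   # read(n) returns nil at EOF
      file.write(chunk)
    end
  end
end

xml = '<queryResult><records><Id>001A</Id></records></queryResult>'
tmp = Tempfile.new(['batch', '.xml'])
spool_to_file(StringIO.new(xml), tmp.path)
```

The trade-off versus parsing the HTTP stream directly is disk space for simplicity: the download and the parse become two independent, restartable steps.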
@gostrc Can you send a branch with proposed changes?
From the examples in the README, it looks like any bulk query you submit must fit into memory as a Ruby array of hashes. Would it be possible to stream the results to a file for queries that pull back a lot of data?
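A yield-based interface would make that straightforward. This is a hypothetical sketch (`each_query_result` and its input are illustrative stand-ins, not the gem's actual API) showing how a caller could stream results straight into a CSV file instead of holding an array of hashes:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical API: yield records one at a time rather than returning
# one big Array. In the real gem this loop would wrap a streaming
# HTTP download and XML parse, so only one record is live at a time.
def each_query_result(records)
  return to_enum(:each_query_result, records) unless block_given?
  records.each { |rec| yield rec }
end

# Caller writes each record to disk as it arrives:
out = Tempfile.new(['accounts', '.csv'])
CSV.open(out.path, 'w') do |csv|
  csv << %w[Id Name]
  each_query_result([{ 'Id' => '001A', 'Name' => 'Acme' },
                     { 'Id' => '001B', 'Name' => 'Globex' }]) do |rec|
    csv << [rec['Id'], rec['Name']]
  end
end
```

Returning an `Enumerator` when no block is given keeps the array-style usage available (`each_query_result(data).to_a`) while letting large queries stream.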