Remove dependency on dfs-datastores #1473

Merged: 4 commits merged into twitter:develop on Jan 20, 2016

@rubanm (Contributor) commented Jan 14, 2016

This change copies the VersionedStore code over from dfs-datastores and drops our dependency on dfs-datastores. That removes some of the dependencies standing in the way of upgrading to Cascading 3.

https://github.com/nathanmarz/dfs-datastores/blob/master/dfs-datastores/src/main/java/com/backtype/hadoop/datastores/VersionedStore.java

Pending:
Test runs on some real jobs using VersionedKeyValSource.

Open question:

This change currently drops the PailSource.scala file altogether. Should we:

  1. copy the PailTap code from dfs-datastores into a scalding-pail module? That means copying over and owning quite a few files just to support a single PailSource class in Scalding.
  2. drop PailSource altogether? It has no current usages at Twitter, but it is likely used externally. There is also an open PR, "Allow constructing a PailSource from a PailSpec" (#1424).

Option 2 would be ideal, but we'll need to give users a good way to migrate away.

@johnynek (Collaborator) commented

+1 to this.

Pail fans can import the code into their own project or migrate to PartitionedTextLine: https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/PartitionedTextLine.scala (we assume there are few such users, if any).

@ianoc (Collaborator) commented Jan 15, 2016

This looks pretty good to me. It would be great to "Scala-ize" it, since it would be shorter and easier to read, but I think we can leave that for future cleanup work, really as a newbie issue.

@rubanm changed the title from "Remove dependency on dfs-datastores" to "[do not merge] Remove dependency on dfs-datastores" on Jan 15, 2016
@rubanm changed the title from "[do not merge] Remove dependency on dfs-datastores" back to "Remove dependency on dfs-datastores" on Jan 18, 2016

@rubanm (Contributor, Author) commented Jan 18, 2016

Test runs are done. I also fixed a bug in the store's cleanup method: it was adding a dataset twice when that dataset had both a version-suffix file and a success file.
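The cleanup bug mentioned in that comment can be illustrated with a small, self-contained Java sketch. This is not the code from this PR: it models the store root as plain strings instead of Hadoop paths, and the names `VersionListingSketch`, `completedVersions`, and `versionsToDelete` are hypothetical. It assumes a layout modeled on dfs-datastores, where each version is a timestamp-named directory marked complete either by a sibling `<version>.version` file or by a `_SUCCESS` file inside it. If the listing adds a version once per marker, a version carrying both markers is counted twice, and a "keep the newest N" cleanup then keeps fewer than N distinct versions. Collecting into a sorted set counts each version once.

```java
import java.util.*;

public class VersionListingSketch {
    // Assumed marker suffix, modeled on the dfs-datastores convention.
    static final String FINISHED_VERSION_SUFFIX = ".version";

    /**
     * entries: names directly under the (hypothetical) store root, e.g. "1000", "1000.version".
     * dirsWithSuccess: version directories that contain a _SUCCESS file.
     * Returns each completed version exactly once, newest first.
     */
    static List<Long> completedVersions(List<String> entries, Set<String> dirsWithSuccess) {
        // A TreeSet collapses a version marked complete both ways
        // (suffix file AND _SUCCESS file) into a single entry.
        SortedSet<Long> versions = new TreeSet<>(Comparator.reverseOrder());
        for (String name : entries) {
            if (name.endsWith(FINISHED_VERSION_SUFFIX)) {
                String stem = name.substring(0, name.length() - FINISHED_VERSION_SUFFIX.length());
                versions.add(Long.parseLong(stem));
            } else if (dirsWithSuccess.contains(name)) {
                versions.add(Long.parseLong(name));
            }
            // Directories with neither marker are incomplete and are skipped.
        }
        return new ArrayList<>(versions);
    }

    /** Versions a cleanup pass would delete when keeping only the newest `keep`. */
    static List<Long> versionsToDelete(List<String> entries, Set<String> dirsWithSuccess, int keep) {
        List<Long> all = completedVersions(entries, dirsWithSuccess);
        if (all.size() <= keep) {
            return Collections.emptyList();
        }
        return new ArrayList<>(all.subList(keep, all.size()));
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("1000", "1000.version", "2000", "3000", "3000.version");
        Set<String> success = new HashSet<>(Arrays.asList("1000", "2000", "3000"));
        System.out.println(completedVersions(entries, success)); // [3000, 2000, 1000]
        System.out.println(versionsToDelete(entries, success, 2)); // [1000]
    }
}
```

With the sample entries in `main`, the set-based listing keeps the two newest distinct versions and deletes only 1000. A plain list would have held 3000 and 1000 twice, so keeping "two" entries would retain only 3000 and also delete 2000.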

@johnynek (Collaborator) commented

👍 Will wait for Ian to merge.

@ianoc added a commit that referenced this pull request on Jan 20, 2016: "Remove dependency on dfs-datastores"
@ianoc merged commit 34872e9 into twitter:develop on Jan 20, 2016
@rubanm deleted the rubanm/dfs_datastores branch on Jan 21, 2016, 00:56