Skip to content
This repository has been archived by the owner on Oct 14, 2018. It is now read-only.

Online fit - WIP #45

Open
wants to merge 41 commits into
base: master
Choose a base branch
from
Open

Online fit - WIP #45

wants to merge 41 commits into from

Conversation

thomasgreg
Copy link

@thomasgreg thomasgreg commented May 16, 2017

This is an attempt at issue #32.

The following WIP:
- removes TokenIterator and main_token which are dependent on parameters and their ordering
- constructs tokens for the fit names based on parameters uniquely without depending on a mapping; the dask graph is queried directly for previously encountered tasks

The current approach in part evolved out of becoming familiar with the assumptions of the existing codebase so I ended up being strict about keys and defensive in graph updates (see update_dsk). Passing around and managing a global seen mapping with dsk may achieve the same effect with minimal code change.

Have commited a simpler solution which avoids a major refactoring

Todo:

  • cleaner example using a class derived from DaskBaseSearchCV instead of an unwieldy function
  • tests around the uniqueness of keys (where dsk is updated directly instead of using seen)
  • ... general cleanup (ParamTokenIterator, example, ...)

thomasgreg added 14 commits May 15, 2017 03:31
Remaining failing tests involve:
- tighter hashing of values that are estimators
- cache (the amount of lookups in the graph / the new size of the graph)
Changes in `online-fit` branch now confined to
`online_model_selection.py` and `test_online_model_selection.py`
Asynchronous search WIP now involves only minor changes to the existing
code. Updated example to work with this.
@jcrist
Copy link
Member

jcrist commented May 19, 2017

Apologies for letting this sit so long - I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :).

@thomasgreg
Copy link
Author

No worries :) .. just found a bug so working on that and cleaning the example

@TomAugspurger
Copy link
Member

Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml

dask/dask-ml#221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components / structure from there. LMK if you want help with rebasing this on top of dask-ml.

@stsievert
Copy link
Member

I'm not sure how much you'll be able to reuse from dask/dask-ml#221 – most the framework there is with _partial_fit_and_score, not with the adaptive framework spelled out in scikit-learn/scikit-learn#9599

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants