Online fit - WIP #45
base: master
Remaining failing tests involve:
- tighter hashing of values that are estimators
- the cache (the number of lookups in the graph / the new size of the graph)
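A minimal sketch of what tighter hashing of estimator values could look like, assuming estimators follow the scikit-learn `get_params()` convention. The hypothetical `normalize` and `token` helpers reduce an estimator to its class name plus sorted parameters before hashing, so two identically configured instances get the same token. `DummyEstimator` is a stand-in, not a real class. Values that are neither primitives, containers, nor estimators are not handled, and the branch itself may hash differently (for example via dask's own tokenization).

```python
import hashlib


def normalize(value):
    """Convert a value into a hashable, order-independent representation.

    Objects exposing get_params() are reduced to their class name plus
    sorted parameters, so equally configured estimators normalize alike.
    """
    if hasattr(value, "get_params"):
        params = value.get_params()
        return (
            type(value).__qualname__,
            tuple(sorted((k, normalize(v)) for k, v in params.items())),
        )
    if isinstance(value, dict):
        # Sort by key so dict insertion order does not affect the hash.
        return tuple(sorted((k, normalize(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(normalize(v) for v in value)
    return value


def token(value):
    """Hash the normalized representation into a short hex token."""
    return hashlib.sha1(repr(normalize(value)).encode()).hexdigest()[:12]


class DummyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}


# Two separate but identically configured instances share a token.
assert token(DummyEstimator(alpha=1.0)) == token(DummyEstimator(alpha=1.0))
assert token(DummyEstimator(alpha=1.0)) != token(DummyEstimator(alpha=2.0))
```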
Changes in `online-fit` branch now confined to `online_model_selection.py` and `test_online_model_selection.py`
Asynchronous search WIP now involves only minor changes to the existing code. Updated example to work with this.
Apologies for letting this sit so long; I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :)
No worries :) Just found a bug, so I'm working on that and cleaning up the example.
I.e., the graph should be independent of the number and order of updates via `update_graph`.
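A sketch of that property, assuming task keys are derived only from parameter values. `fit_key`, `apply_update` and `fit` are hypothetical names playing roughly the role of `update_graph`; they are not the branch's code. Because a key never depends on when a candidate arrives, applying the same batches in any order, or re-submitting a candidate, produces an identical graph.

```python
import hashlib
import itertools


def fit(params):
    """Placeholder for a real fit task (hypothetical)."""
    return params


def fit_key(params):
    """Derive a task key purely from the parameter values."""
    canonical = repr(tuple(sorted(params.items())))
    return "fit-" + hashlib.sha1(canonical.encode()).hexdigest()[:12]


def apply_update(dsk, candidates):
    """Add one fit task per candidate; keys already in dsk are left untouched."""
    for params in candidates:
        key = fit_key(params)
        if key not in dsk:
            # dask-style task tuple: (callable, *args)
            dsk[key] = (fit, tuple(sorted(params.items())))
    return dsk


batches = [[{"C": 1}], [{"C": 10}, {"C": 1}], [{"C": 100}]]

graphs = []
for order in itertools.permutations(batches):
    dsk = {}
    for batch in order:
        apply_update(dsk, batch)
    graphs.append(dsk)

# Every ordering, including the repeated {"C": 1} candidate, yields the same graph.
assert all(g == graphs[0] for g in graphs)
assert len(graphs[0]) == 3
```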
Conflicts: `dask_searchcv/tests/test_model_selection.py`
Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml. dask/dask-ml#221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components/structure from there. LMK if you want help with rebasing this on top of dask-ml.
I'm not sure how much you'll be able to reuse from dask/dask-ml#221; most of the framework there is with
This is an attempt at issue #32.
The following WIP:
- removes `TokenIterator` and `main_token`, which are dependent on parameters and their ordering
- constructs tokens for the fit names uniquely from the parameters, without depending on a mapping; the dask graph is queried directly for previously encountered tasks

The current approach evolved partly out of becoming familiar with the assumptions of the existing codebase, so I ended up being strict about keys and defensive in graph updates (see `update_dsk`). Passing around and managing a global `seen` mapping with `dsk` may achieve the same effect with minimal code change.

Have committed a simpler solution which avoids a major refactoring.
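The contrast between position-dependent names and parameter-derived keys can be sketched as follows. `counter_keys` is a hypothetical stand-in for an ordering-dependent naming scheme (not the actual `TokenIterator`), and `add_fits` is a hypothetical helper that checks `key in dsk` directly instead of consulting a separate `seen` mapping. Tasks here are plain tuples rather than real dask tasks.

```python
import hashlib
import itertools


def counter_keys(candidates):
    """Hypothetical order-dependent naming: a key depends on position."""
    counter = itertools.count()
    return {f"fit-{next(counter)}": tuple(sorted(p.items())) for p in candidates}


def param_key(params):
    """Key derived only from the parameter values themselves."""
    canonical = repr(tuple(sorted(params.items())))
    return "fit-" + hashlib.sha1(canonical.encode()).hexdigest()[:12]


def add_fits(dsk, candidates):
    """Insert missing fit tasks, querying dsk directly (no 'seen' mapping).

    Returns the number of candidates whose task was already present.
    """
    reused = 0
    for params in candidates:
        key = param_key(params)
        if key in dsk:
            reused += 1
        else:
            dsk[key] = tuple(sorted(params.items()))
    return reused


a = [{"C": 1}, {"C": 10}]
b = list(reversed(a))

# Counter-based keys map to different parameters when the order changes.
assert counter_keys(a) != counter_keys(b)

# Parameter-based keys do not, so a second pass reuses every existing task.
dsk = {}
assert add_fits(dsk, a) == 0
assert add_fits(dsk, b) == 2
assert len(dsk) == 2
```

Keeping `dsk` as the single record of which fits exist avoids keeping two structures in sync; the trade-off is that keys must be fully deterministic, which matches the strictness about keys described above.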
Todo:
- `DaskBaseSearchCV` instead of an unwieldy function (`dsk` is updated directly instead of using `seen`)