[Callbacks] Consolidate Saving Methods #1168
base: main
Conversation
Signed-off-by: Kyle Sayers <[email protected]>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
```python
__all__ = ["modify_save_pretrained", "modify_fsdp_model_save_pretrained"]


def modify_fsdp_model_save_pretrained(trainer, processor: Processor):
```
does this break terribly from the other changes you've made? Wondering why we want to remove code only to add it back later on
@brian-dellabetta I'm essentially removing this function because I think its approach is likely wrong. Instead of creating a separate save_pretrained function for FSDP models (and having to maintain two functions which are mostly identical), we should simply use a context manager.

Something like

```python
with maybe_unwrap_fsdp(model):
    original_save_pretrained(model, ...)

# alternatively
with unwrap_fsdp(model) if is_fsdp(model) else nullcontext():
    original_save_pretrained(model, ...)
```
This consists of using the code from `unwrap_and_export_model` and turning it into a context manager.
ok, as long as we're not inserting a bunch of breaking changes into common user pathways.

side note: i've seen this `maybe_do_x` naming convention elsewhere in the code. this can be very confusing to a reader (myself included). `unwrap_if_fsdp` would be a clearer name, so the reader knows it's deterministic and what the condition is.
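For illustration, a minimal sketch of such a context manager under the suggested name (the `is_fsdp` helper and the use of `FSDP.summon_full_params` are assumptions here, not code from this PR):

```python
from contextlib import contextmanager, nullcontext

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def is_fsdp(model) -> bool:
    # hypothetical helper: treat any FSDP-wrapped module as an FSDP model
    return isinstance(model, FSDP)


@contextmanager
def unwrap_if_fsdp(model):
    # gather the full, unsharded parameters for the duration of the block;
    # for non-FSDP models this is a no-op
    ctx = FSDP.summon_full_params(model) if is_fsdp(model) else nullcontext()
    with ctx:
        yield model
```

Saving then looks the same for sharded and unsharded models: `with unwrap_if_fsdp(model): original_save_pretrained(model, ...)`.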
Having one central location to carry out saving logic sounds great!

Could you map out when the current saving logic is carried out and via which pathway, and how the new changes take over the saving logic? E.g., when does each of the different saving pathways now get carried out?

For FSDP, we do support it currently. Once the stage runner is removed, the assumption that no oneshot pathway has FSDP support will be valid.
```diff
@@ -418,7 +418,10 @@ def main(

     # wrap model.save_pretrained
     if is_fsdp_model(model):
-        modify_fsdp_model_save_pretrained(trainer, processor)
+        raise NotImplementedError(
```
We should put this in after the oneshot / stage runner refactor. Currently this is an ok pipeline.

@horheynm All the answers to your questions are in the PR description. W.r.t. FSDP, it is not supported now but will be at a later date (soon).
Purpose

- Consolidate saving methods into a single `save_pretrained` function

Background

All the things needed to be done during saving:

After these changes, (1, 2, 3, 4) will be done within the `save_pretrained` function, and (5) will be the responsibility of the caller. (3) will be implemented in the next PR so as not to conflict with existing logic in `pre_init`.

All of the places where a model is saved are:

After these changes, all of these will be replaced by a single `save_checkpoint` function which calls `save_pretrained` to do all the necessary things.
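A rough sketch of what that consolidated entrypoint could look like (the signature, parameter names, and helper behavior are assumptions, not this PR's actual code):

```python
def save_checkpoint(model, save_directory: str, processor=None, save_safetensors: bool = True):
    # hypothetical sketch: the wrapped save_pretrained is assumed to handle
    # weights, compression config, and recipe serialization in one place
    model.save_pretrained(save_directory, safe_serialization=save_safetensors)

    # save the processor/tokenizer alongside the model so the checkpoint
    # is loadable on its own
    if processor is not None:
        processor.save_pretrained(save_directory)
```

Callers would then replace the scattered `save_pretrained` / `save_model_and_recipe` call sites with this single function.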
Changes

- Removed `save_model_and_recipe`
- Consolidated saving logic into the wrapped `save_pretrained` function
- Replaced the existing save call sites with a single `save_checkpoint` function
- Removed `modify_fsdp_model_save_pretrained` and `unwrap_and_export_model`, to be added back in a future release