-
-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] added PyTorch Profiler #2315
base: master
Are you sure you want to change the base?
Conversation
@Ishan-Kumar2 Thank you ! It looks great, I will play with the handler asap ! |
@sdesrozis @Ishan-Kumar2 can we move on with this PR ? |
Hi @vfdev-5, sorry for the delay. I'll have a look at it today :) |
@Ishan-Kumar2 Sorry for the delay on my side. I will use this handler in some own codes to see whether it works and give a feedback asap. |
9186ff1
to
333057e
Compare
@sdesrozis I tested the code on my local example, with this additional code # Define a PT Profiler
pt_profiler = PyTorchProfiler(on_trace_ready="tensorboard", output_path="./logs/train")
pt_profiler.attach(trainer) it produces 3 json files (non empty), but I am unable to load them on tensorboard using tensorboard --logdir=./logs it shows "No dashboards are active for the current data set." I think as long as json are being produced it should be correct since that's done by PyTorch Profiler, not changed by me. There must be some issue with my opening it. |
It works for me but don't forget to install the dedicated tensorboard pluggin
I left a few comments. |
@Ishan-Kumar2 it looks good. I think the next step now is about the tests. |
@sdesrozis great, will start working on the tests. |
@sdesrozis, Added some tests for the profiler. I have not added checks for the output of the profiler since I believe that is already done by PyTorch. |
return dummy_trainer | ||
|
||
|
||
def test_get_results(tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should test firstly when the profiler is not attached to an engine. Secondly, you should test the presence and the absence of the expected keys.
pt_profiler.get_results(sort_key="cpu_times") | ||
|
||
|
||
def test_write_results(tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test the files generated on more than one epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this
} | ||
|
||
def _profiler_create(self): | ||
self._profiler = torch.profiler.profile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should check the PyTorch version and provide a clear error message if version < 1.8 ?
And this check would be associated to a specific test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get how I should do this. In case the PyTorch version is <1.8 then I want all the tests to not run right?
So should I add a @pytest.mark.skipif
in all the tests?
You can get inspiration from these tests tests/ignite/contrib/handlers/test_tensorboard_logger.py. The tests need to be improved to check what is going on with the different backends (tpu, nccl, etc). |
@sdesrozis I have incorporated most of your suggestions. I am still working on the distributed tests will add those soon too. |
Fixes #1917
Description:
Added a minimum implementation of PyTorch profiler as a handler with engine. A lot of other features can be added, please let me know if you have any suggestions. Also I haven't added tests yet, if the initial code looks good, I can get started on those :)
Check list: