Can the Matbench submission process be handled entirely within a Google Colab notebook? #196

Open
sgbaird opened this issue Oct 7, 2022 · 2 comments

@sgbaird
Contributor

sgbaird commented Oct 7, 2022

I'm not suggesting we change how the Matbench submission system works (I like the thoroughness and provenance). Rather, I'm asking whether Google Colab has what's needed to programmatically follow the three submission steps:

  1. "Create 3 required files"
  2. "Put files in appropriate folder"
  3. "Create a PR to the Matbench repository"
@sgbaird
Contributor Author

sgbaird commented Oct 7, 2022

Save the notebook programmatically to hard disk:

Write info.json file to temporary Colab storage:

Put this in a directory structure using os or similar

Make pull request via API or CLI:
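As a rough illustration of the "write info.json" and "directory structure" steps, here is a minimal sketch using only the standard library. The function name `write_submission_skeleton`, the folder naming pattern, and the metadata keys are assumptions made for illustration; the Matbench submission instructions remain the authority on the required file names and schema.

```python
import json
import tempfile
from pathlib import Path


def write_submission_skeleton(root, algorithm_name, info):
    """Create a submission folder under `root` and write info.json into it."""
    # Assumed folder naming; check the Matbench docs for the required pattern.
    submission_dir = Path(root) / f"matbench_v0.1_{algorithm_name}"
    submission_dir.mkdir(parents=True, exist_ok=True)

    info_path = submission_dir / "info.json"
    with open(info_path, "w") as f:
        json.dump(info, f, indent=2)
    return submission_dir


# Placeholder metadata; these keys are assumptions, not a verified schema.
example_info = {
    "authors": "Jane Doe",
    "algorithm": "dummy",
    "algorithm_long": "A dummy model used to test the submission workflow.",
    "notes": "Generated from a Colab notebook.",
}

# A temporary directory stands in for Colab's ephemeral storage.
submission_dir = write_submission_skeleton(tempfile.mkdtemp(), "dummy", example_info)
print(sorted(p.name for p in submission_dir.iterdir()))
```

The results file and the notebook itself would still need to be copied into the same folder before opening a PR.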

Short answer seems to be yes.

This could lower the barrier to getting an initial "win" (e.g. verifying that it works for a dummy model), and it also keeps the barrier low for bigger models if someone uses Colab Pro or Pro+. It would also make it easier for someone to mock up a notebook with some dummy data and then ask someone with a Colab Pro account to click run. Anyway, just some thoughts based on a recent internal discussion with a colleague about Matbench.

@sgbaird
Contributor Author

sgbaird commented Oct 8, 2022

The tough part is programmatically downloading a snapshot of the current notebook with outputs. This is even harder if this requires the Colab notebook to be the active window on the user's machine.

In the end, I think the user will have to "Save a copy to Drive" after clicking an "Open in Colab" link. From there, it should be possible to programmatically extract the file ID and download the (hopefully recently autosaved) notebook. It's probably good to also save an extra notebook with the input history.
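For the "extract the file ID" part, a small sketch, assuming the notebook's share URL is already available (for example, pasted in by the user) and follows the `.../drive/<file_id>` pattern that Colab uses for Drive-hosted notebooks. The helper name `colab_file_id` and the example ID are hypothetical, and actually downloading the file from Drive is not shown.

```python
from urllib.parse import urlparse


def colab_file_id(url):
    """Return the Drive file ID from a Colab URL of the form .../drive/<file_id>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect a path like ["drive", "<file_id>"]; the query string is ignored.
    if len(parts) >= 2 and parts[0] == "drive":
        return parts[1]
    raise ValueError(f"No Drive file ID found in: {url}")


# Placeholder URL with a made-up ID, used only to show the parsing.
example_url = "https://colab.research.google.com/drive/abc123XYZ?usp=sharing"
print(colab_file_id(example_url))
```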

Some of the relevant code:

from IPython.display import display, Javascript
display(Javascript('IPython.notebook.save_checkpoint();')) # save, but probably only if window is active
%notebook -e notebook.ipynb

Based on some modifications I'm making for xtal2png + imagen-pytorch:

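import time
from os import makedirs, path
from IPython.display import display, Javascript

display(Javascript('IPython.notebook.save_checkpoint();')) # save, but probably only if window is active
timestr = time.strftime("%Y%m%d-%H%M%S") # timestamp; not used in the filename below
results_folder = "results" # placeholder; set this to your own output directory
makedirs(results_folder, exist_ok=True)
notebook_savepath = path.join(results_folder, "notebook.ipynb")
print(notebook_savepath)
# export the input history only (no output cells), then move it into place
%notebook -e notebook-input-history.ipynb
!mv notebook-input-history.ipynb $notebook_savepath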

WIP at https://colab.research.google.com/drive/15YLOWHB_NkIIqKLO0ik784fsK2xJD08l?usp=sharing

Adding an "Open in Colab" badge could be done ad hoc in the Matbench actions. Still, maybe it's better to ask the user to download the notebook and manually add a version with outputs, markdown cells, and the "Open in Colab" badge (which the Colab UI makes very straightforward).

Haven't fleshed out the details of creating the GitHub PR. Not sure if authentication will cause more problems than just following normal instructions.
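One possible shape for the PR step, sketched with the standard library against GitHub's REST "create a pull request" endpoint (`POST /repos/{owner}/{repo}/pulls`). This only builds the request and does not send it. The helper name `build_pull_request`, the token placeholder, the branch names, and the `main` base branch are assumptions, and it presumes the submission files have already been pushed to a branch on the user's fork.

```python
import json
import urllib.request


def build_pull_request(token, head, title, body="", base="main",
                       repo="materialsproject/matbench"):
    """Build (but do not send) a request that would open a pull request."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# All values below are placeholders for illustration.
request = build_pull_request(
    token="<personal-access-token>",
    head="my-username:my-submission-branch",  # "<fork owner>:<branch>"
    title="Add dummy model submission",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Whether entering a token inside a shared Colab notebook is acceptable is a separate question; the authentication concern raised above still applies.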

Once it's ready, I plan to share the notebook with @jae3goals #141
