Encoding and decoding objects such as ProvenanceDoc-s via e.g. as_dict() and from_dict() #615

Open
sgbaird opened this issue Jun 4, 2022 · 8 comments

Comments

@sgbaird

sgbaird commented Jun 4, 2022

I've been having a hard time saving a DataFrame to a JSON file (or a jsonpickle JSON file) when it includes ProvenanceDoc objects. My current workaround is to extract some minimal data from each document, such as references and material_id. Do you have any suggestions?

I'm trying to follow the style of Matbench/Matminer in having my own benchmark dataset stored on figshare and encoding/decoding it. Maybe I'm too hung up on saving a ProvenanceDoc and should stick with extracting what I can easily/manually.
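
For readers following along, the workaround described above (keeping only a few plain fields from each document) might look roughly like the sketch below. `StandInProvenanceDoc` and `minimal_record` are hypothetical names introduced only for illustration, the field values are made up, and the real ProvenanceDoc has many more fields than shown here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StandInProvenanceDoc:
    """Hypothetical stand-in; not the real emmet-core ProvenanceDoc."""

    material_id: str
    references: list[str] = field(default_factory=list)


def minimal_record(doc) -> dict:
    """Keep only plain, JSON-friendly fields from a document (hypothetical helper)."""
    return {
        "material_id": str(doc.material_id),
        "references": [str(ref) for ref in doc.references],
    }


# Made-up example values, only to exercise the round trip.
docs = [
    StandInProvenanceDoc("mp-123", ["@article{example_a}"]),
    StandInProvenanceDoc("mp-456", []),
]

records = [minimal_record(doc) for doc in docs]
encoded = json.dumps(records)  # plain dicts and strings serialize without a custom encoder
decoded = json.loads(encoded)
```

The trade-off is that the decoded records are plain dictionaries rather than documents, so anything not copied into `minimal_record` is lost.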

@sgbaird
Author

sgbaird commented Jun 4, 2022

Object of type CrystalSystem is not JSON serializable
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 321, in default
    d = o.as_dict()

During handling of the above exception, another exception occurred:

  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 336, in default
    return json.JSONEncoder.default(self, o)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\io\json\_json.py", line 172, in write
    return dumps(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\io\json\_json.py", line 110, in to_json
    s = writer(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\pandas\core\generic.py", line 2621, in to_json
    return json.to_json(
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\site-packages\monty\json.py", line 301, in default
    "data": o.to_json(default_handler=MontyEncoder().encode),
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\sterg\miniconda3\envs\mp-time-split\Lib\json\__init__.py", line 234, in dumps
    return cls(
  File "C:\Users\sterg\Documents\GitHub\sparks-baird\mp-time-split\scripts\data_snapshot.py", line 20, in <module>
    json.dumps(dummy_expt_df, cls=MontyEncoder)
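
Reading the traceback, it looks as if the encoder tries `o.as_dict()` on a `CrystalSystem` value, that call fails, and the stock `json` encoder then raises the `TypeError`. The sketch below reproduces that kind of failure with the standard library only, assuming `CrystalSystem` behaves like an enum without an `as_dict()` method. `StandInCrystalSystem` and `enum_aware_default` are hypothetical names for illustration, not part of monty or emmet-core.

```python
import json
from enum import Enum


class StandInCrystalSystem(Enum):
    """Hypothetical stand-in for an enum-valued field such as CrystalSystem."""

    cubic = "Cubic"
    hexagonal = "Hexagonal"


def enum_aware_default(obj):
    """Fallback for json.dumps: try as_dict(), then Enum.value, else fail loudly."""
    if hasattr(obj, "as_dict"):
        return obj.as_dict()
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Made-up row, only to trigger the error.
row = {"material_id": "mp-123", "crystal_system": StandInCrystalSystem.cubic}

try:
    json.dumps(row)  # the plain encoder does not know how to handle the enum
    failed = False
except TypeError:
    failed = True

# With the fallback, the enum is written out as its string value.
encoded = json.dumps(row, default=enum_aware_default)
```

Note that this only covers encoding: the decoded value is a plain string, so restoring the enum would need a separate step on load.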

@munrojm
Member

munrojm commented Jun 4, 2022

This is something I should be able to fix on my end in emmet-core. I'll report back once I have made the fix and released a patch.

@janosh
Member

janosh commented Aug 13, 2022

@sgbaird You may know this already, but what I tend to do in this case is pass a custom handler to DataFrame.to_json().

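from __future__ import annotations

from typing import Any

from emmet.core.provenance import ProvenanceDoc


def as_dict_handler(obj: object) -> dict[str, Any] | None:
    """Use as the default_handler kwarg of DataFrame.to_json() (or default= for json.dump())."""
    try:
        return obj.as_dict()  # all MSONable objects implement as_dict()
    except AttributeError:
        if isinstance(obj, ProvenanceDoc):
            # "foo" and "bar" are placeholders; list the attributes you actually need
            needed_attrs = ("foo", "bar")
            # pydantic models are generally not subscriptable, so read attributes with getattr
            return {k: getattr(obj, k) for k in needed_attrs}

        return None  # replace unhandled objects with None in serialized data


# df is the DataFrame that contains the ProvenanceDoc objects
df.to_json("some-data.json.gz", default_handler=as_dict_handler)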

@sgbaird
Author

sgbaird commented Aug 13, 2022

@janosh, interesting. That's new to me. Thanks for the tip!

@mkhorton
Member

@munrojm just wondering if there was an update on this issue?

I believe monty dumpfn/loadfn can serialize and de-serialize both pandas DataFrames and pydantic models, but I haven't actually verified both simultaneously. Seems like it'd be a common use case however.

@tschaume
Member

@sgbaird Is this still an issue with the latest version of mp-api? If so, could you post a short snippet for me to reproduce? Thanks!

@sgbaird
Author

sgbaird commented Jan 6, 2025

@tschaume Unfortunately, I'm not sure, and given time constraints I probably won't be able to put a reproducer script together. This is the context in which I was using it: https://github.com/search?q=repo%3Asparks-baird%2Fmatbench-genmetrics+MPRester&type=code
