Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding options to log addition additional metadata into the lockfile #204

Merged
merged 63 commits into from
Nov 22, 2022

Conversation

NatPRoach
Copy link
Contributor

@NatPRoach NatPRoach commented Jun 30, 2022

Hello! thank you for making conda-lock, it's a great tool.

This PR would add the ability to specify extra comment lines into the lockfile as well as info about git repository etc. I added these metadata lines as a series of new flags:

    --metadata-jsons
    --metadata-yamls
    --metadata {"timestamp" "git_sha" "git_user_name" "git_user_email" "input_md5" "input_sha"}

--metadata can be specified multiple times to request any of the individual metadata tags, as requested here:
#204 (comment)

@netlify
Copy link

netlify bot commented Jun 30, 2022

Deploy Preview for conda-lock ready!

Name Link
🔨 Latest commit 16e32c8
🔍 Latest deploy log https://app.netlify.com/sites/conda-lock/deploys/6377aa7aab825b000b655f24
😎 Deploy Preview https://deploy-preview-204--conda-lock.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@NatPRoach NatPRoach changed the title Adding options to log metadata about lockfile in comments to the lockfile Adding options to log metadata about lockfile in comments in the lockfile Jun 30, 2022
@NatPRoach NatPRoach marked this pull request as draft June 30, 2022 18:29
@mariusvniekerk
Copy link
Collaborator

Yeah having the metadata exist as a nice structured pydantic model with the various fields being Optional sounds great to me. When converting the model to raw python dictionary you can easily just exclude the fields with None values

@NatPRoach
Copy link
Contributor Author

Not sure what to do about these timeouts in the tests, but it seems like the ubuntu test that failed in f904e79 passed in 90875c3, and the mac os test that failed in 90875c3 passed in f904e79, despite there being no diffs between those commits. Happy to modify the timeout limit if that's what you'd recommend.

@NatPRoach NatPRoach marked this pull request as ready for review July 5, 2022 22:10
@mariusvniekerk
Copy link
Collaborator

Can you add some tests to assert that the kind of metadata output we would want remains in the files. These kinds of regression tests are invaluable to ensure that conda-lock has minimal regressions over time.

@NatPRoach
Copy link
Contributor Author

Absolutely. Just wanted to make sure that the approach looked reasonable to you before moving forward on writing tests. I'll try to get that done ASAP.

@NatPRoach
Copy link
Contributor Author

I'm not really sure what's going on with the test that's failing. I didn't change anything related to the code for updating dependencies, and that test is failing for me locally when run on main as well.

@NatPRoach NatPRoach force-pushed the add_lockfile_metadata branch from 70173a3 to e7c613c Compare July 14, 2022 16:03
@NatPRoach
Copy link
Contributor Author

@mariusvniekerk I think this is passing all CI except the test that seems to be failing on main / other branches as well. Happy to add more tests if you'd like, but if you wouldn't mind taking another look I'd really appreciate it.

@mariusvniekerk
Copy link
Collaborator

I landed #226 which should hopefully make the test suite here a bit more reliable

@mariusvniekerk
Copy link
Collaborator

@wolfv Would this change impact micromamba in any way?

@wolfv
Copy link

wolfv commented Jul 29, 2022

Let's ask @Klaim who implemented the lock-file handling in micromamba!

Copy link
Member

@jezdez jezdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits with the formats and using md5 alone for the inputs metadata.

def create(cls) -> "TimeMeta":
import time

return cls(created_at=f"{time.asctime(time.gmtime(time.time()))}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use an ISO 8601 formatted value here instead of asctime which is awkward to parse programmatically?

conda_lock/src_parser/__init__.py Outdated Show resolved Hide resolved
@Klaim
Copy link

Klaim commented Jul 29, 2022

Hello!

@wolfv Would this change impact micromamba in any way?

New fields in "metadata" will be ignored by micromamba (assuming I didn't miss a bug), see details of the parsing in that function. It will only look for specific fields it needs.

osx-64: d01c1f5433f30bdbcd3bdad8d9b096774ab55f1210c094acdc61a35b32b28d67
win-64: 310b23581083bfb983927c40d3bdc86162192d7b26ffd7bffc385f627c155697
inputs_metadata:
/Users/c13371nroach/Documents/repos/conda-lock/tests/zlib/environment.yml:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a relative path not an absolute one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (and I will x2 check this) That it uses a relative path if provided a relative path when conda-lock is invoked, and an absolute one if passed an absolute path. Happy to make the path relative (perhaps to the lockfile?) if you think that's always preferable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is generally preferable. Leaking in usernames etc into the lockfiles seems undesirable

conda_lock/src_parser/__init__.py Outdated Show resolved Hide resolved
conda_lock/src_parser/__init__.py Outdated Show resolved Hide resolved
conda_lock/conda_lock.py Outdated Show resolved Hide resolved
@NatPRoach NatPRoach changed the title Adding options to log metadata about lockfile in comments in the lockfile Adding options to log addition additional metadata into the lockfile Nov 12, 2022
@NatPRoach
Copy link
Contributor Author

NatPRoach commented Nov 12, 2022

@mariusvniekerk @maresb
I believe I've addressed the remaining comments you had. Sorry for the delay on that. Please let me know if there's anything else you'd like to see to get this merged.

@NatPRoach NatPRoach requested review from mariusvniekerk and removed request for jezdez November 12, 2022 21:28
@mariusvniekerk
Copy link
Collaborator

Thanks! Let me review this quickly

@mariusvniekerk
Copy link
Collaborator

We probably don't need --metadata-json and --metadata-yaml as json is a subset of yaml

default=None,
description="Metadata dealing with the input files used to create the lockfile",
)
custom_metadata: Optional[Dict[str, str]] = Field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose I want to include something like environment.yml which contains lists. Shall we make this a fully general dict?

(Needs from typing import Any)

Suggested change
custom_metadata: Optional[Dict[str, str]] = Field(
custom_metadata: Optional[Dict[Any, Any]] = Field(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally not. It places a lot of extra strain on other implemations that make use of the file format like micromamba. Any is FAR too broad a type. I can potentially see value in a Union[str, list[str]]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Ideally it would be some sort of recursive type like T: Union[int, str, list[T], dict[str,T]], but Pydantic chokes on that already.

Note how requirements must be at least list[Union[str, dict[str, list[str]]]] due to ["pip", {"pip": ["pypi-dep"]}]. Thus to parse most requirements.txt we'd need at least something like

dict[str, Union[str, list[Union[str, dict[str, list[str]]]]]]

(assuming I didn't make any mistakes).

Given the complexity, I've convinced myself that this should be a separate feature. But in this case, we should probably make it clear in the --help that metadata-jsons/metadata-yamls contents must be dict[str,str].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support dict[str, Union[str, int, float]]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm getting at here is, if we have

# md.yaml
x: a
y: 1
z: 1.5

is it correct to have the current result of conda-lock --metadata-yaml md.yaml be as follows?

  custom_metadata:
    x: a
    y: '1'
    z: '1.5'

Perhaps that's the simplest since strings faithfully encode most reasonable types, and it prevents other potential headaches.

conda_lock/conda_lock.py Outdated Show resolved Hide resolved
@maresb
Copy link
Contributor

maresb commented Nov 18, 2022

In the near future someone may propose a command-line option to add a key-value pair to the custom metadata.

Perhaps it should be called something like --add-metadata which could be used like

conda-lock --metadata git_sha --add-metadata short_git_sha=$(git rev-parse --short HEAD)

Are we satisfied with using --metadata to select these pre-defined fields defined in this PR? Or would it be better to rename --metadata to something more specific like --metadata-opt?

@mariusvniekerk
Copy link
Collaborator

probably sufficient as the first draft. I want to make some tweaks to the models a little bit in a subsequent pr. And we can get the cli cleaned up in another one.

@mariusvniekerk mariusvniekerk merged commit e1feddd into conda:main Nov 22, 2022
riccardoporreca added a commit to riccardoporreca/conda-lock that referenced this pull request Feb 4, 2023
* Fixes conda#229, relying on what introduced in conda#204 for the input metadata, by explicitly handling only the case of different drives for keeping absolute paths, addressing conda#175.
maresb pushed a commit that referenced this pull request Feb 25, 2023
* Fixes #229, relying on what introduced in #204 for the input metadata, by explicitly handling only the case of different drives for keeping absolute paths, addressing #175.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants