Adding options to log additional metadata into the lockfile #204

Merged 63 commits on Nov 22, 2022
Changes from 10 commits
Commits
d9b34a7
Adding options to log metadata about lockfile in comments to the lock…
Jun 30, 2022
cfda7bd
Address failing lints
Jun 30, 2022
604b9fe
Structuring metadata as pydantic models on the LockMeta class
Jul 5, 2022
f448f05
Restructuring pydantic a bit to have less redundancy
Jul 5, 2022
90875c3
Fixing new lints
Jul 5, 2022
f904e79
trying to kickoff tests again after timeout last time
Jul 5, 2022
e7c613c
Adding regression tests for new metadata fields
NatPRoach Jul 13, 2022
dd97f51
Merge branch 'main' into add_lockfile_metadata
mariusvniekerk Jul 28, 2022
811662b
Merge branch 'main' into add_lockfile_metadata
mariusvniekerk Jul 29, 2022
8f4c6af
Update conda_lock/src_parser/__init__.py
NatPRoach Jul 29, 2022
fe54b6b
Update conda_lock/src_parser/__init__.py
NatPRoach Jul 29, 2022
3820802
re-adding tests, fixing comments from conda-lock team
NatPRoach Aug 10, 2022
1fafc94
Getting changes passing precommit checks
NatPRoach Aug 10, 2022
67cf13b
Attempting to fix git test
NatPRoach Aug 11, 2022
842702a
Fixing lint
NatPRoach Aug 11, 2022
11dfd80
Fixing changing dir into non-existent dir
NatPRoach Aug 11, 2022
6f4b57a
Applying suggestion from code review
NatPRoach Aug 11, 2022
e1cd6fa
Fixing precommits after applying suggestions
NatPRoach Aug 11, 2022
7bb79ae
Reverting to relative paths
NatPRoach Aug 11, 2022
376b5d4
Fixing lints
NatPRoach Aug 11, 2022
81a9e9c
Reverting suggestion from code review
NatPRoach Aug 12, 2022
d9ea458
Fixing problem of always calculating inputs metadata breaking tests
Aug 14, 2022
0e3e51e
Fixing precommit checks
Aug 14, 2022
d91c42a
Actually fixing precommit checks
Aug 14, 2022
3f1ffc1
Merge branch 'main' into add_lockfile_metadata
mariusvniekerk Oct 10, 2022
b346860
Adding options to log metadata about lockfile in comments to the lock…
Jun 30, 2022
086d836
Address failing lints
Jun 30, 2022
07f03f8
Structuring metadata as pydantic models on the LockMeta class
Jul 5, 2022
1afe3c8
Restructuring pydantic a bit to have less redundancy
Jul 5, 2022
f3f7170
Fixing new lints
Jul 5, 2022
cddab4e
trying to kickoff tests again after timeout last time
Jul 5, 2022
5277b03
Adding regression tests for new metadata fields
NatPRoach Jul 13, 2022
d356a05
Update conda_lock/src_parser/__init__.py
NatPRoach Jul 29, 2022
4769460
Update conda_lock/src_parser/__init__.py
NatPRoach Jul 29, 2022
a0bfe28
re-adding tests, fixing comments from conda-lock team
NatPRoach Aug 10, 2022
93a6f3e
Getting changes passing precommit checks
NatPRoach Aug 10, 2022
9b7928c
Attempting to fix git test
NatPRoach Aug 11, 2022
27da398
Fixing lint
NatPRoach Aug 11, 2022
8ce4b30
Fixing changing dir into non-existent dir
NatPRoach Aug 11, 2022
b1361d7
Applying suggestion from code review
NatPRoach Aug 11, 2022
2917b0b
Fixing precommits after applying suggestions
NatPRoach Aug 11, 2022
6897d20
Reverting to relative paths
NatPRoach Aug 11, 2022
8833927
Fixing lints
NatPRoach Aug 11, 2022
7fafd4a
Reverting suggestion from code review
NatPRoach Aug 12, 2022
8cb4f44
Changing git commit hash to commit hash of most recently modified inp…
NatPRoach Nov 10, 2022
2ef27bc
Addressing PR comments, fixing associated tests
NatPRoach Nov 12, 2022
d003e43
Merge branch 'add_lockfile_metadata3' into add_lockfile_metadata
NatPRoach Nov 12, 2022
3426fcf
Fixing tests after merge
NatPRoach Nov 12, 2022
c1e36ba
Removing redundant gitpython req
NatPRoach Nov 12, 2022
107508f
Removing unused and duplicate click fields that got readded in merge
NatPRoach Nov 12, 2022
d871556
fixing path resolution when files are on different drives
NatPRoach Nov 12, 2022
ef3cd98
reworking git metadata check to work in abscence of pre-existing conf…
NatPRoach Nov 12, 2022
f5761de
Fixing precommit lints
NatPRoach Nov 12, 2022
a833d13
attempting to fix git metadata issue
NatPRoach Nov 12, 2022
4bee557
Move to the git directory to address the relative path issue
NatPRoach Nov 12, 2022
da86fac
address user section not existing on github actions
NatPRoach Nov 12, 2022
ae256e3
Fixing mistype on addressing user section not existing in github actions
NatPRoach Nov 12, 2022
941f0d8
Addressing releasing lock of config writer
NatPRoach Nov 12, 2022
1e910d6
Cleanup type annotations a bit
mariusvniekerk Nov 15, 2022
645512d
Fix potential pydantic pyright type inference issues
mariusvniekerk Nov 15, 2022
19199f1
Fix windows skip
mariusvniekerk Nov 15, 2022
9a5292a
Remove some unused imports
mariusvniekerk Nov 15, 2022
16e32c8
Update conda_lock/conda_lock.py
mariusvniekerk Nov 18, 2022
140 changes: 140 additions & 0 deletions conda_lock/conda_lock.py
@@ -65,10 +65,13 @@
from conda_lock.lookup import set_lookup_location
from conda_lock.src_parser import (
Dependency,
GitMeta,
InputMeta,
LockedDependency,
Lockfile,
LockMeta,
LockSpecification,
TimeMeta,
UpdateSpecification,
aggregate_lock_specs,
)
@@ -283,6 +286,11 @@ def make_lock_files(
filter_categories: bool = True,
extras: Optional[AbstractSet[str]] = None,
check_input_hash: bool = False,
add_inputs_metadata: bool = False,
add_git_metadata: bool = False,
add_time_metadata: bool = False,
metadata_jsons: Optional[List[pathlib.Path]] = None,
metadata_yamls: Optional[List[pathlib.Path]] = None,
) -> None:
"""
Generate a lock file from the src files provided
@@ -316,6 +324,16 @@
Filter out unused categories prior to solving
check_input_hash :
Do not re-solve for each target platform for which specifications are unchanged
add_inputs_metadata:
If true add extra metadata to lockfile comments about the input files provided.
add_git_metadata:
If true add extra metadata to lockfile comments about the git-repo.
add_time_metadata:
If true add extra metadata to lockfile comments about the time of lockfile creation.
metadata_jsons:
JSON file(s) containing structured metadata to add to metadata section of the lockfile.
metadata_yamls:
YAML file(s) containing structured metadata to add to metadata section of the lockfile.
"""

# initialize virtual package fake
@@ -392,6 +410,11 @@
platforms=platforms_to_lock,
lockfile_path=lockfile_path,
update_spec=update_spec,
add_git_metadata=add_git_metadata,
add_time_metadata=add_time_metadata,
src_files=src_files if add_inputs_metadata else None,
metadata_jsons=metadata_jsons,
metadata_yamls=metadata_yamls,
)

if "lock" in kinds:
@@ -716,13 +739,58 @@ def _solve_for_arch(
return list(conda_deps.values()) + list(pip_deps.values())


def convert_structured_metadata_yaml(in_path: pathlib.Path) -> Dict[str, Any]:
with in_path.open("r") as infile:
metadata = yaml.safe_load(infile)
return metadata


def convert_structured_metadata_json(in_path: pathlib.Path) -> Dict[str, Any]:
import json

with in_path.open("r") as infile:
metadata = json.load(infile)
return metadata


def update_metadata(to_change: Dict[str, Any], change_source: Dict[str, Any]) -> None:
for key in change_source:
if key in to_change:
logger.warning(
f"Custom metadata field {key} provided twice, overwriting value "
+ f"{to_change[key]} with {change_source[key]}"
)
to_change.update(change_source)


def get_custom_metadata(
metadata_jsons: Optional[List[pathlib.Path]] = None,
metadata_yamls: Optional[List[pathlib.Path]] = None,
) -> Dict[str, str]:
custom_metadata_dict: Dict[str, Any] = {}
if metadata_jsons is not None:
for json_path in metadata_jsons:
new_metadata = convert_structured_metadata_json(json_path)
update_metadata(custom_metadata_dict, new_metadata)
if metadata_yamls is not None:
for yaml_path in metadata_yamls:
new_metadata = convert_structured_metadata_yaml(yaml_path)
update_metadata(custom_metadata_dict, new_metadata)
return custom_metadata_dict
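
For illustration, a rough, untested sketch of how these helpers combine multiple metadata files; the file names and contents below are made up, and it assumes this branch of conda-lock is importable:

import json
import pathlib
import tempfile

import yaml

from conda_lock.conda_lock import get_custom_metadata

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "build.json").write_text(json.dumps({"pipeline": "nightly", "owner": "data-eng"}))
(tmp / "extra.yaml").write_text(yaml.safe_dump({"owner": "platform-team", "ticket": "ENG-123"}))

# JSON files are read first, then YAML files; the duplicated "owner" key
# triggers the update_metadata warning and the YAML value wins.
merged = get_custom_metadata(
    metadata_jsons=[tmp / "build.json"],
    metadata_yamls=[tmp / "extra.yaml"],
)
print(merged)  # {'pipeline': 'nightly', 'owner': 'platform-team', 'ticket': 'ENG-123'}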


def create_lockfile_from_spec(
*,
conda: PathLike,
spec: LockSpecification,
platforms: List[str] = [],
lockfile_path: pathlib.Path,
update_spec: Optional[UpdateSpecification] = None,
add_git_metadata: bool = False,
add_time_metadata: bool = False,
metadata_jsons: Optional[List[pathlib.Path]] = None,
metadata_yamls: Optional[List[pathlib.Path]] = None,
src_files: Optional[List[pathlib.Path]] = None,
) -> Lockfile:
"""
Solve or update specification
@@ -745,13 +813,32 @@ def create_lockfile_from_spec(
for dep in deps:
locked[(dep.manager, dep.name, dep.platform)] = dep

git_metadata = GitMeta.create() if add_git_metadata else None
time_metadata = TimeMeta.create() if add_time_metadata else None
inputs_metadata: Optional[Dict[str, InputMeta]] = (
{str(src_file): InputMeta.create(src_file=src_file) for src_file in src_files}
if src_files is not None
else None
)
custom_metadata: Optional[Dict[str, str]] = (
get_custom_metadata(
metadata_jsons=metadata_jsons, metadata_yamls=metadata_yamls
)
if metadata_jsons is not None or metadata_yamls is not None
else None
)

return Lockfile(
package=[locked[k] for k in locked],
metadata=LockMeta(
content_hash=spec.content_hash(),
channels=[c for c in spec.channels],
platforms=spec.platforms,
sources=[str(source.resolve()) for source in spec.sources],
git_metadata=git_metadata,
time_metadata=time_metadata,
inputs_metadata=inputs_metadata,
custom_metadata=custom_metadata,
),
)

@@ -916,6 +1003,11 @@ def run_lock(
virtual_package_spec: Optional[pathlib.Path] = None,
update: Optional[List[str]] = None,
filter_categories: bool = False,
add_inputs_metadata: bool = False,
add_git_metadata: bool = False,
add_time_metadata: bool = False,
metadata_jsons: Optional[List[pathlib.Path]] = None,
metadata_yamls: Optional[List[pathlib.Path]] = None,
) -> None:
if environment_files == DEFAULT_FILES:
if lockfile_path.exists():
@@ -960,6 +1052,11 @@
extras=extras,
check_input_hash=check_input_hash,
filter_categories=filter_categories,
add_inputs_metadata=add_inputs_metadata,
add_git_metadata=add_git_metadata,
add_time_metadata=add_time_metadata,
metadata_jsons=metadata_jsons,
metadata_yamls=metadata_yamls,
)


@@ -1085,6 +1182,35 @@ def main() -> None:
multiple=True,
help="Packages to update to their latest versions. If empty, update all.",
)
@click.option(
"--add-inputs-metadata",
is_flag=True,
help="If true add extra metadata to lockfile comments about the input files provided.",
)
@click.option(
"--add-git-metadata",
is_flag=True,
help="If true add extra metadata to lockfile comments about the git-repo.",
)
@click.option(
"--add-time-metadata",
is_flag=True,
help="If true add extra metadata to lockfile comments about the time of lockfile creation.",
)
@click.option(
"--metadata-jsons",
default=None,
multiple=True,
type=click.Path(),
help="JSON file(s) containing structured metadata to add to metadata section of the lockfile.",
)
@click.option(
"--metadata-yamls",
default=None,
multiple=True,
type=click.Path(),
help="YAML file(s) containing structured metadata to add to metadata section of the lockfile.",
)
@click.option(
"--pypi_to_conda_lookup_file",
type=str,
@@ -1112,6 +1238,11 @@ def lock(
virtual_package_spec: Optional[PathLike],
pypi_to_conda_lookup_file: Optional[str],
update: Optional[List[str]] = None,
add_inputs_metadata: bool = False,
add_git_metadata: bool = False,
add_time_metadata: bool = False,
metadata_jsons: Optional[List[pathlib.Path]] = None,
metadata_yamls: Optional[List[pathlib.Path]] = None,
) -> None:
"""Generate fully reproducible lock files for conda environments.

@@ -1127,6 +1258,10 @@
"""
logging.basicConfig(level=log_level)

if metadata_jsons is not None:
metadata_jsons = [pathlib.Path(path) for path in metadata_jsons]
if metadata_yamls is not None:
metadata_yamls = [pathlib.Path(path) for path in metadata_yamls]
# Set Pypi <--> Conda lookup file location
if pypi_to_conda_lookup_file:
set_lookup_location(pypi_to_conda_lookup_file)
@@ -1175,6 +1310,11 @@
virtual_package_spec=virtual_package_spec,
update=update,
filter_categories=filter_categories,
add_inputs_metadata=add_inputs_metadata,
add_git_metadata=add_git_metadata,
add_time_metadata=add_time_metadata,
metadata_jsons=metadata_jsons,
metadata_yamls=metadata_yamls,
)
if strip_auth:
with tempfile.TemporaryDirectory() as tempdir:
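Putting the new flags together, a typical invocation on this branch might look roughly like the following (the environment and metadata file names are hypothetical):

conda-lock lock -f environment.yml --add-git-metadata --add-time-metadata --add-inputs-metadata --metadata-yamls extra-metadata.yaml --metadata-jsons build-info.json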
85 changes: 85 additions & 0 deletions conda_lock/src_parser/__init__.py
@@ -1,5 +1,6 @@
import hashlib
import json
import logging
import pathlib
import typing

@@ -16,6 +17,9 @@
from conda_lock.virtual_package import FakeRepoData


logger = logging.getLogger(__name__)


class StrictModel(BaseModel):
class Config:
extra = "forbid"
@@ -101,6 +105,66 @@ def validate_hash(cls, v: HashModel, values: Dict[str, typing.Any]) -> HashModel
return v


class TimeMeta(StrictModel):
"""Stores information about when the lockfile was generated."""

created_at: str = Field(..., description="Time stamp of lock-file creation time")
Contributor:

Out-of-scope proposal:

I wanted to throw this idea out in case anyone else is interested in helping me tackle it in a subsequent PR... (I'm fully in favor of the current created_at implementation.)

The disadvantage to created_at is that if you run conda-lock again 1 minute later, then probably the only change in the lockfile will be to the created_at field. Practically, knowing the exact minute that the lock was performed is superfluous, and it breaks idempotence.

In order to restore idempotence, you could do (pseudocode)

lock_time = max(pkg.upload_date for pkg in locked_versions)

and this lock_time represents the earliest possible instant when the lockfile could have theoretically been generated. Indeed, it depends only on the metadata of the locked packages themselves, so we could even consider enabling this by default.


@classmethod
def create(cls) -> "TimeMeta":
import time

return cls(created_at=datetime.datetime.utcnow().isoformat() + 'Z')


class GitMeta(StrictModel):
"""
Stores information about the git repo the lockfile is being generated in (if applicable) and
the git user generating the file.
"""

git_user_name: str = Field(..., description="Git user.name field of global config")
git_user_email: str = Field(
..., description="Git user.email field of global config"
)
git_sha: Optional[str] = Field(
default=None, description="sha256 hash of the most recent git commit"
Contributor suggested change (existing line, then proposed replacement):
default=None, description="sha256 hash of the most recent git commit"
default=None, description="sha256 hash of the current git commit"

@maresb (Contributor), Oct 12, 2022:

Addressing this comment, since the hash is already computed with repo.head.object.hexsha, I think it's just the description which needs to be fixed.

)

@classmethod
def create(cls) -> "GitMeta":
Collaborator:

Do we want to have the git metadata associated with the HEAD commit for the relevant repo, or should it be associated with the most recent commit for the input files?

The latter would mean that if the components of a lock don't change then the git_meta is unaffected

import git

git_sha: Optional[str]
try:
repo = git.Repo(search_parent_directories=True)
git_sha = f"{repo.head.object.hexsha}{'-dirty' if repo.is_dirty() else ''}"
except git.exc.InvalidGitRepositoryError:
git_sha = None
return cls(
git_user_name=git.Git()().config("user.name"),
git_user_email=git.Git()().config("user.email"),
git_sha=git_sha,
)
Contributor:

It would be nice if there were some way to include the SHA without also including my Git name/email.

I anticipate more metadata flags, so I'm wondering if we can make this more fine-grained and modular.

I have a similar but failed PR in which I implemented metadata selection via a single CSV argument like --metadata=created_by,comment,command,timestamp. I'm no design expert, so I'm not sure if this would be an improvement, but it may help with preventing a flood of options when running conda-lock lock --help.

Collaborator:

Probably the metadata selector fields should behave similarly to how channels work. That jibes with how the rest of the ecosystem functions. So something like

conda-lock --md created-by --md comment --md command ...

That fits most cleanly and works easily with click/typer, which is what we're using for the CLI.
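
A minimal standalone sketch of that selector style with click; the option name and choices are illustrative only and not part of this PR:

import click

@click.command()
@click.option(
    "--md",
    "metadata_selectors",
    multiple=True,
    type=click.Choice(["created-by", "comment", "command", "timestamp"]),
    help="Metadata fields to embed in the lockfile (repeatable, like --channel).",
)
def lock(metadata_selectors: tuple) -> None:
    # e.g. `lock --md created-by --md timestamp` -> ('created-by', 'timestamp')
    click.echo(f"selected metadata: {sorted(metadata_selectors)}")

if __name__ == "__main__":
    lock()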



class InputMeta(StrictModel):
"""Stores information about an input provided to generate the lockfile."""

md5: str = Field(..., description="md5 checksum for an input file")

@classmethod
def create(cls, src_file: pathlib.Path) -> "InputMeta":
return cls(md5=f"{cls.get_input_md5(src_file=src_file)}")

@staticmethod
def get_input_md5(src_file: pathlib.Path) -> str:
hasher = hashlib.md5()
with src_file.open("r") as infile:
hasher.update(infile.read().encode("utf-8"))
return hasher.hexdigest()


class LockMeta(StrictModel):
content_hash: Dict[str, str] = Field(
..., description="Hash of dependencies for each target platform"
@@ -113,6 +177,23 @@ class LockMeta(StrictModel):
...,
description="paths to source files, relative to the parent directory of the lockfile",
)
time_metadata: Optional[TimeMeta] = Field(
None, description="Metadata dealing with the time lockfile was created"
)
git_metadata: Optional[GitMeta] = Field(
None,
description=(
"Metadata dealing with the git repo the lockfile was created in and the user that created it"
),
)
inputs_metadata: Optional[Dict[str, InputMeta]] = Field(
None,
description="Metadata dealing with the input files used to create the lockfile",
)
custom_metadata: Optional[Dict[str, str]] = Field(
Contributor:

Suppose I want to include something like environment.yml which contains lists. Shall we make this a fully general dict?

(Needs from typing import Any)

Suggested change (existing line, then proposed replacement):
custom_metadata: Optional[Dict[str, str]] = Field(
custom_metadata: Optional[Dict[Any, Any]] = Field(

Collaborator:

Ideally not. It places a lot of extra strain on other implementations that make use of the file format, like micromamba. Any is FAR too broad a type. I can potentially see value in a Union[str, list[str]].

Contributor:

Good point. Ideally it would be some sort of recursive type like T: Union[int, str, list[T], dict[str,T]], but Pydantic chokes on that already.

Note how requirements must be at least list[Union[str, dict[str, list[str]]]] due to ["pip", {"pip": ["pypi-dep"]}]. Thus to parse most requirements.txt we'd need at least something like

dict[str, Union[str, list[Union[str, dict[str, list[str]]]]]]

(assuming I didn't make any mistakes).

Given the complexity, I've convinced myself that this should be a separate feature. But in this case, we should probably make it clear in the --help that metadata-jsons/metadata-yamls contents must be dict[str,str].

Contributor:

Do we want to support dict[str, Union[str, int, float]]?

Contributor:

What I'm getting at here is, if we have

# md.yaml
x: a
y: 1
z: 1.5

is it correct to have the current result of conda-lock --metadata-yaml md.yaml be as follows?

  custom_metadata:
    x: a
    y: '1'
    z: '1.5'

Perhaps that's the simplest since strings faithfully encode most reasonable types, and it prevents other potential headaches.

None,
description="Custom metadata provided by the user to be added to the lockfile",
)

def __or__(self, other: "LockMeta") -> "LockMeta":
"""merge other into self"""
@@ -126,6 +207,10 @@ def __or__(self, other: "LockMeta") -> "LockMeta":
channels=self.channels,
platforms=sorted(set(self.platforms).union(other.platforms)),
sources=ordered_union([self.sources, other.sources]),
time_metadata=other.time_metadata,
git_metadata=other.git_metadata,
inputs_metadata=other.inputs_metadata,
custom_metadata=other.custom_metadata,
)

@validator("channels", pre=True, always=True)
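To make the new fields concrete, here is an untested sketch that builds a LockMeta with every new section populated; every value is invented for illustration, and it assumes the classes are importable from conda_lock.src_parser as in this PR:

from conda_lock.src_parser import GitMeta, InputMeta, LockMeta, TimeMeta

meta = LockMeta(
    content_hash={"linux-64": "abc123"},
    channels=["conda-forge"],  # the channels validator accepts plain strings
    platforms=["linux-64"],
    sources=["environment.yml"],
    time_metadata=TimeMeta(created_at="2022-11-22T12:00:00Z"),
    git_metadata=GitMeta(
        git_user_name="Jane Developer",
        git_user_email="jane@example.com",
        git_sha="0123456789abcdef0123456789abcdef01234567",
    ),
    inputs_metadata={
        "environment.yml": InputMeta(md5="d41d8cd98f00b204e9800998ecf8427e")
    },
    custom_metadata={"pipeline": "nightly"},
)
print(meta.json(exclude_none=True, indent=2))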
4 changes: 3 additions & 1 deletion conda_lock/src_parser/lockfile.py
@@ -77,7 +77,9 @@ def write_section(text: str) -> None:
yaml.dump(
{
"version": Lockfile.version,
**json.loads(content.json(by_alias=True, exclude_unset=True)),
**json.loads(
content.json(by_alias=True, exclude_unset=True, exclude_none=True)
),
},
f,
)
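
The added exclude_none=True matters because create_lockfile_from_spec passes the new metadata fields explicitly even when they are None, and exclude_unset alone would still render them as null. A small self-contained pydantic illustration (the model name is made up):

from typing import Optional

from pydantic import BaseModel

class Meta(BaseModel):
    content_hash: str
    time_metadata: Optional[str] = None

m = Meta(content_hash="abc", time_metadata=None)
print(m.json(exclude_unset=True))                     # keeps "time_metadata": null
print(m.json(exclude_unset=True, exclude_none=True))  # drops the null field entirely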
1 change: 1 addition & 0 deletions requirements.txt
@@ -9,3 +9,4 @@ pydantic
poetry
ruamel.yaml
typing-extensions
gitpython