Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Robust Lifecycle Scripts #437

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions accepted/0000-robust-lifecycle-scripts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# {{TITLE: a human-readable title for this RFC!}}

## Summary

Lifecycle scripts need to have more context passed to them about why an event is being invoked. Due to the way packages are deduped and hoisted, not all use cases are properly covered by simply event names.

## Motivation

Lifecycle scripts such as `preinstall` `install` and `postinstall` are very commonly used, but difficult to reason about due to the way package trees are installed and updated. In the case of `install`, a package could be directly installed as a dependency by a user, installed as a global, indirectly installed as a dependency, or added when dependency versions changed to the point where a dependency could no longer be shared between two packages as a single version. Worst still, `uninstall` could be fired when the package won't be removed at all due to one dependant no longer requiring it, but another still needing to keep it.

An `uninstall` script could assume that the directory will be removed or not used in the future, and put the package into a bad state if it was merely deduped, for example. By providing scripts with more context, like whether a new directory is being made or destroyed, which depenants caused the installation, etc, we could properly cover all use-cases and remove confusion that would lead to destroyed data, bugs, and frustration.
Copy link
Contributor

@isaacs isaacs Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
An `uninstall` script could assume that the directory will be removed or not used in the future, and put the package into a bad state if it was merely deduped, for example. By providing scripts with more context, like whether a new directory is being made or destroyed, which depenants caused the installation, etc, we could properly cover all use-cases and remove confusion that would lead to destroyed data, bugs, and frustration.
An `uninstall` script could assume that the directory will be removed or not used in the future, and put the package into a bad state if it was merely deduped, for example. By providing scripts with more context, like whether a new directory is being made or destroyed, which dependants caused the installation, etc, we could properly cover all use-cases and remove confusion that would lead to destroyed data, bugs, and frustration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use the American spellings of things, so dependent (for both noun and adjective) rather than dependant. Sorry, total nitpick, I know. (Still reading, will provide more substantive feedback as full review.)


Future versions of npm could use different reify engines and methods, potentially leading to more needed context for these scripts.

The primary reason for this RFC is the lack of `uninstall` lifecycle scripts in npm v7, and how difficult it would be to re-add them in a clear way.

See issue: #3042
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
See issue: #3042
See: [#3042](https://github.com/npm/cli/issues/3042)


## Detailed Explanation

Lifecycle scripts need to be re-engineered and the old approach should be deprecated.

`package.json` will include a new index called `"lifecycle"`

```js
{
"lifecycle": {
"install": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having discrete points in the reify process is better than trying to support the pre/post paradigm like this. It was an inadequate model and confuses people a lot (i.e. "pre" means "it runs before the x scripts" not "it runs before x")

I'd prefer not having pre/post be baked in like this. If we have a need for something to run at a time, we put an event there and name it accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Also, I think it would be good to first make the list of the "event moments" we want to expose, and identify where each current scripts entry gets run. It would also give us a way to call out where the "no userland scripts allowed" phase of the process starts and ends (ie, everything between _loadTrees and _reifyPackages in lib/arborist/reify.js).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo "it runs before x" is the semantic every single person wants, and the other semantic is "silly sugar for &&".

"pre" "bin/life_install.sh --phase=pre ${dependant} ${depenant_path} ${reify-reason}"
"pre" "bin/life_install.sh --phase=current ${dependant} ${reify-reason}"
"pre" "bin/life_install.sh --phase=post ${dependant} ${reify-reason}"
Comment on lines +29 to +31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"pre" "bin/life_install.sh --phase=pre ${dependant} ${depenant_path} ${reify-reason}"
"pre" "bin/life_install.sh --phase=current ${dependant} ${reify-reason}"
"pre" "bin/life_install.sh --phase=post ${dependant} ${reify-reason}"
"pre": "bin/life_install.sh --phase=pre ${dependent} ${dependent_path} ${reify_reason}",
"current": "bin/life_install.sh --phase=current ${dependent} ${reify_reason}",
"post": "bin/life_install.sh --phase=post ${dependent} ${reify_reason}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is running during the reserved phase, then we're in trouble.

Before diffing, we don't know what will actually be done. Before unpacking, we don't have the scripts on disk to run. After unpacking, it's arguably not "pre"-install. But a userland script can modify the folder tree (or even the dependency graph, if it writes to package.json files), so it invalidates the work already done. We'd have to keep reloading the trees and recalculating the diff if any changes were made. If we don't do that, then we have security problems. Better to just keep the "pre" to "here's what we were asked to do, we are about to start getting ready to figure out how to do it", and the "post" is "here's what we did, and it's all the way done now".

So something like:

{
  "lifecycle": {
    "beforeTreeChange": "_loadTrees has not yet been called",
    "afterTreeChange": "_reifyPackages has completed",
    "beforeSave": "have not yet written to package.json and package-lock.json",
    "afterSave": "completely wrote lockfile and manifest"
  }
}

If beforeTreeChange and afterTreeChange exposed the various arguments that we currently pass into arborist.reify() (add, rm, update, etc.) then people could use that to decide whether this is an uninstall they care about. If afterTreeChange also had a way to get the diff somehow (though I'm not sure the best way to do this), then they could likewise pick the replacement for scripts.uninstall that most suits their need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we'd be exposing implementation pretty directly, which is subject to change. For example, a module-map based solution might not use trees at all. Is there a way we could be more future-proof and still be accurate about the lifecycle? Do package maintainers need all of that granularity?

},
"publish": {
"pre": "prepublish" // default
}
}
}
```

By default, the lifecycle script sections will reference the old reserved scripts, maintaining compatibility.

## Rationale and Alternatives

We could solve this problem by simply adding more lifecycle scripts, to an extent.

We could solve this problem by clearly documenting exactly when when a lifecycle script happens, and dramatically limiting their scope.

We could have one lifecycle script that we pass a full arborist diff to and let the package maintainer deal with the changes.

This RFC provides a good balance of flexibility, allows the old lifecycle scripts to continue to exist, and solves the problem of lack of context.

## Implementation

### TODO:
* list all lifecycles
* provide lists of context keys
* describe how it could be expanded in the future
* describe defaults and default behavior
* describe effect on older packages
* describe future changes to arborists impact

## Prior Art

### TODO:
* is there prior art? What other build systems deal with lifecycles in this way?
* are there lessons to be learned from the likes of grunt, etc?

## Unresolved Questions and Bikeshedding

* will future changes to arborists or additional reify engines change things?
* can we provide a stable base for future needs?