- Supply-chain attacks are becoming a significant threat as more and more critical infrastructure depends on open-source code. The XZ Utils backdoor is a recent example of a very sophisticated trojan-horse injection attempt that could very well have succeeded.
- This is an even more critical issue in the context of decentralised blockchains and cryptocurrencies: compromising any piece of software could lead to stolen keys and therefore stolen money. As the name "supply chain" implies, it's not only, and not mainly, the software we produce which is at risk, but also the software we depend on, i.e. the hundreds of dependencies we pull in to build our stuff.
- And this risk does not lie only within the core components that run the network (e.g. cardano-node) but also in every other tool that manipulates critical information (keys): wallets, command-line tools, monitoring systems, etc.
- The fact that 100% of the code is hosted by a single, centralised entity (GitHub) also means that our software is exposed to any compromise affecting GitHub.
- While we cannot fend off all attacks, we can at least provide verifiable information about the provenance of the software and its whole chain of dependencies, so that users can independently verify that the software they are using is indeed the one they expect, built by the right people and organisations.
- GitHub provides many tools to safeguard the development process and recently introduced support for SLSA, a specification to describe and attest the provenance of a piece of software.
- Rather than relying on a centralised authority to store those certificates, we propose to leverage GitHub for producing SLSA attestations and/or other similar kinds of certificates, but to host them directly on-chain in order to build a verifiable and immutable graph of dependencies attesting the origin of any piece of software.
- This on-chain attestation will enable anyone connected to the Cardano network through a node to:
  - look up and verify various pieces of information about a piece of software, such as its version, the signatories of its release, its dependency graph, its canonical URL, and the SHA hashes of its released artifacts,
  - look up and verify the history of the software's releases and the changes that occurred to its team of contributors,
  - recursively verify the provenance of the dependencies this software depends on.
- Dogfood the development of Cardano by using Cardano
- Increase the level of security of the network by providing an audit trail of changes occurring to core components of the network, in particular safeguarding the system against unexpected and unwanted takeover of key source code
- Safely automate infrastructure upgrades: when managing software systems at scale, manual upgrading is a tedious and error-prone task. By publishing releases and their provenance path on-chain, it becomes safe to automatically upgrade some components
- Track the progress of tool and service upgrades before or during hard forks
- Secure the supply chain of the network, or at least pinpoint potential issues in it, by identifying critical third-party components
- Identify contributors, both organisations and individuals, on-chain, making it possible to provide financial support through the chain itself (e.g. possibly coupled with something like Drips)
- Increase the security of provisioning tools, e.g. cardano-up or yaci-devkit, by providing a way for users to validate installed software
- Allow on-chain endorsement of published software and versions, providing a decentralised and community-owned measure of reliability and "fitness for purpose"
- A Component (any piece of software that we want to track the provenance of) is identified on-chain by a dedicated token
- Each Release of this component is represented on-chain by a single UTxO which consists of the following parts:
  - Address: A shared script which implements on-chain controls (more on this later) as a Plutus smart contract
  - Datum: Metadata about the released component, notably the Bill of Material or BoM (more on this later)
  - Value: The component's unique token
- Releasing a new version of the component is materialised by a transaction:
  - The transaction is signed by a quorum (τ) of identified contributors
  - It consumes the previous UTxO and creates a new one with updated metadata, subject to rules provided by the provenance script
  - It optionally uses reference inputs to point at dependencies that have an on-chain representation; these dependencies are part of the Bill of Material and attestation attached to the output's datum
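The release step can be sketched as a state transition over a toy UTxO model. This is a minimal Python sketch rather than Plutus code: the `Utxo` and `ReleaseTx` types, the `POP_SCRIPT` address placeholder and the string token names are hypothetical, and only the structural shape is checked (previous UTxO consumed, a successor carrying the same token produced at the script address); signature and datum rules are left out.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder for the address of the shared PoP script.
POP_SCRIPT = "addr_pop_script"


@dataclass
class Utxo:
    address: str  # script address holding the release
    datum: dict   # release metadata (version, BoM root, ...)
    token: str    # unique token identifying the component


@dataclass
class ReleaseTx:
    inputs: list                                          # consumed UTxOs
    outputs: list                                         # produced UTxOs
    reference_inputs: list = field(default_factory=list)  # dependencies: read, not consumed
    signatories: set = field(default_factory=set)


def is_release_of(tx: ReleaseTx, previous: Utxo) -> bool:
    """Structural check only: `previous` is consumed and at least one output
    carrying the same token is produced at the PoP script address."""
    if previous not in tx.inputs:
        return False
    successors = [o for o in tx.outputs if o.token == previous.token]
    return bool(successors) and all(o.address == POP_SCRIPT for o in successors)


a_v1 = Utxo(POP_SCRIPT, {"version": "1.0.0"}, "A")
b_v2 = Utxo(POP_SCRIPT, {"version": "2.0.0"}, "B")
tx = ReleaseTx(
    inputs=[a_v1],
    outputs=[Utxo(POP_SCRIPT, {"version": "1.1.0"}, "A")],
    reference_inputs=[b_v2],  # B is referenced, not spent
)
print(is_release_of(tx, a_v1))  # True
```

Keeping dependencies in `reference_inputs` rather than `inputs` mirrors the point that a release reads its dependencies' UTxOs without consuming them.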
The following diagram illustrates a simple flow of PoP transactions:
- An initial version is created for a component A through a multisigned transaction
- When v2 is released, a new transaction is published which consumes the initial version, references B at version v2, and updates the data for version v2 of A. Note that the transaction is signed by a subset of the initial signatories
- All UTxOs representing current versions are held by the same script, denoted PoP
The Proof-of-Provenance script controls the transformation of a component's on-chain data.
This on-chain data comprises:
- A version following SemVer specification,
- A canonical URL ($\mu$) for the source code repository, which can use any protocol acceptable to git,
- A commit hash ($H$) pointing at the exact revision the release has been made at,
- A set of public key hashes ($\Pi = \{\pi_1, \pi_2, \dots\}$) for all valid maintainers of the component,
- A quorum ($\tau$) of signatories required for a valid release,
- A Merkle-tree root hash ($\Delta$) denoting the Bill of Material for this release, constructed from the list of dependency hashes (more on this later),
- A list of artifact hashes ($\alpha_1, \alpha_2, \dots$) contained in this release.
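The datum fields above map naturally onto a record type. A minimal Python sketch, assuming hex-string hashes and a plain MAJOR.MINOR.PATCH version (pre-release and build suffixes are dropped); the type and field names are hypothetical, and the quorum sanity check is an extra assumption rather than a rule stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SemVer:
    major: int
    minor: int
    patch: int

    @staticmethod
    def parse(text: str) -> "SemVer":
        # Keep only MAJOR.MINOR.PATCH: strip "+build" then "-prerelease" suffixes.
        core = text.split("+", 1)[0].split("-", 1)[0]
        major, minor, patch = (int(part) for part in core.split("."))
        return SemVer(major, minor, patch)


@dataclass(frozen=True)
class PopDatum:
    version: SemVer
    url: str                # mu: canonical repository URL
    commit: str             # H: commit hash of the released revision
    maintainers: frozenset  # Pi: public key hashes of valid maintainers
    quorum: int             # tau: number of signatures required
    bom_root: str           # Delta: Merkle root of the Bill of Material
    artifacts: tuple        # alpha_1, alpha_2, ...: artifact hashes

    def __post_init__(self):
        # Assumed sanity check: the quorum must be reachable by the maintainers.
        if not 1 <= self.quorum <= len(self.maintainers):
            raise ValueError("quorum must be between 1 and the number of maintainers")


datum = PopDatum(
    version=SemVer.parse("1.2.0"),
    url="https://github.com/example/component.git",  # hypothetical URL
    commit="0" * 40,
    maintainers=frozenset({"pkh_alice", "pkh_bob", "pkh_charlie"}),
    quorum=2,
    bom_root="00" * 32,
    artifacts=("11" * 32,),
)
print(datum.version)  # SemVer(major=1, minor=2, patch=0)
```

Making `SemVer` ordered lets version comparisons (e.g. detecting a major bump) be written directly on the parsed values.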
The PoP script enforces the following rules when it is part of a transaction $T$, where $\Sigma(T)$ denotes the set of public key hashes that signed $T$:
- The transaction must be signed by a quorum of signatories: $\Sigma(T) \subseteq \Pi \wedge |\Sigma(T)| \geq \tau$
- If the signatories, quorum or canonical URL are modified, the transaction must be signed by all signatories: $\Sigma(T) = \Pi$
- If other components are referenced, the redeemer of the PoP script must contain, for each such component, a proof of inclusion in the BoM Merkle tree, e.g. a valid path in the Merkle tree for one of the referenced artifacts:
  - Let $\{\iota_1, \iota_2, \dots, \iota_n\}$ be the list of reference inputs to other components, and $R = \{\rho_1, \rho_2, \dots, \rho_n\}$ the redeemer for the PoP script,
  - for each $i \in \{1, 2, \dots, n\}$, $\exists \alpha_j^i \in \alpha(\iota_i)$ such that $(\rho_i, \alpha_j^i) \in \Delta$, where $\alpha(\iota_i)$ denotes the artifact hashes recorded in the datum of $\iota_i$; i.e. $\rho_i$ is a valid Merkle path proving that $\alpha_j^i$ is included in $\Delta$.
- The transaction produces one or two PoP outputs, depending on the kind of version change:
  - In case of a minor or patch change, there must be a single output,
  - In case of a major change, there may be two outputs, one of which is unchanged.
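The signature and output rules can be prototyped off-chain before being written as a Plutus validator. A minimal Python sketch, assuming the datum is a dictionary with hypothetical keys `version` (a `(major, minor, patch)` tuple), `maintainers` ($\Pi$), `quorum` ($\tau$) and `url` ($\mu$), and that $\Pi$ and $\tau$ are read from the consumed datum; the Merkle inclusion rule is deliberately omitted.

```python
GOVERNANCE_FIELDS = ("maintainers", "quorum", "url")


def bump_kind(old: tuple, new: tuple) -> str:
    """Classify a (major, minor, patch) change."""
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def check_release(old: dict, outputs: list, signatories: set) -> bool:
    """Signature and output rules for a transaction spending the datum `old`."""
    pi, tau = old["maintainers"], old["quorum"]
    # Sigma(T) must be a subset of Pi and reach the quorum tau.
    if not (signatories <= pi and len(signatories) >= tau):
        return False
    # Changing maintainers, quorum or URL requires Sigma(T) = Pi.
    if any(out[f] != old[f] for out in outputs for f in GOVERNANCE_FIELDS):
        if signatories != pi:
            return False
    # One output, or two for a major change where one output is unchanged.
    if len(outputs) == 1:
        return True
    if len(outputs) == 2:
        unchanged = [o for o in outputs if o == old]
        updated = [o for o in outputs if o != old]
        return (len(unchanged) == 1 and len(updated) == 1
                and bump_kind(old["version"], updated[0]["version"]) == "major")
    return False


old = {"version": (1, 0, 0), "maintainers": frozenset({"alice", "bob", "charlie"}),
       "quorum": 2, "url": "https://github.com/example/component.git"}
minor = dict(old, version=(1, 1, 0))
print(check_release(old, [minor], {"alice", "bob"}))                             # True
print(check_release(old, [dict(minor, quorum=3)], {"alice", "bob"}))             # False
print(check_release(old, [old, dict(old, version=(2, 0, 0))], {"alice", "bob"}))  # True
```

The second call fails because changing the quorum needs every maintainer's signature, not just a quorum.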
Note

The rationale behind the rules for handling versions is the following:
- Let's say there's a component A at version 1.0 that depends on B at version 1.0, which means there's been a transaction producing A's UTxO referencing B@1.0's UTxO.
- If B is upgraded to version 1.1, the UTxO B@1.0 is no longer available for referencing, but of course A@1.0 is not affected.
- However, if A is upgraded to 1.1, then it must also upgrade its dependency on B to 1.1, because there's no more B@1.0 to reference.
- This is a way to "force" dependents to upgrade to newer minor versions, which might be a good idea in general, for example to ensure security patches are applied.
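The effect described in this note can be reproduced with a toy ledger in which only the latest release UTxO of each component exists, and reference inputs must point at unspent UTxOs. A minimal Python sketch: the `Ledger` class and version strings are hypothetical, and the two-output case for major releases is ignored.

```python
class Ledger:
    """Toy model: only the latest (unspent) release UTxO of each component exists."""

    def __init__(self):
        self.live = {}  # component -> version of its unspent release UTxO
        self.deps = {}  # component -> {dependency: version} recorded at release time

    def release(self, component, version, references=None):
        references = references or {}
        for dep, dep_version in references.items():
            # A reference input must point at an unspent UTxO.
            if self.live.get(dep) != dep_version:
                raise ValueError(f"{dep}@{dep_version} is not available as a reference input")
        # Consume the previous release UTxO (if any) and produce the new one.
        self.live[component] = version
        self.deps[component] = dict(references)


ledger = Ledger()
ledger.release("B", "1.0")
ledger.release("A", "1.0", {"B": "1.0"})
ledger.release("B", "1.1")    # B@1.0 is now spent
print(ledger.live["A"])       # 1.0: A's existing release is unaffected
try:
    ledger.release("A", "1.1", {"B": "1.0"})
except ValueError as err:
    print(err)                # B@1.0 is not available as a reference input
ledger.release("A", "1.1", {"B": "1.1"})
```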
The PoP script is only able to attest the validity of transactions that consume and produce on-chain data. Whether this data is itself correct requires off-chain validation of the content controlled by PoP scripts.
Using on-chain data, an observer should be able to verify the following:
- The set of signatories' public key hashes recorded in the repository at the URL and commit pointed at by the UTxO matches $\Pi$,
- The hashes of released artifacts match the list $\alpha_1, \alpha_2, \dots$ provided in the UTxO,
- The effective Bill of Material needed to produce the released artifacts forms a Merkle tree whose root matches $\Delta$.
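The first two checks reduce to hashing and set comparison once the release artifacts and the repository at ($\mu$, $H$) have been fetched. A minimal Python sketch, assuming artifact hashes are SHA-256 hex digests (the hash function is not fixed by this document); `verify_artifacts` and `verify_maintainers` are hypothetical helpers, and fetching is out of scope.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(declared: list, artifacts: list) -> bool:
    """`declared` is the alpha list from the UTxO; `artifacts` are the raw bytes
    of the downloaded release artifacts. Order does not matter."""
    return Counter(h.lower() for h in declared) == Counter(sha256_hex(a) for a in artifacts)


def verify_maintainers(declared: set, recorded: set) -> bool:
    """`declared` is Pi from the UTxO; `recorded` holds the key hashes found in
    the repository at the canonical URL and commit."""
    return {k.lower() for k in declared} == {k.lower() for k in recorded}


release_files = [b"artifact-1 bytes", b"artifact-2 bytes"]
alphas = [sha256_hex(f) for f in reversed(release_files)]  # as published on-chain
print(verify_artifacts(alphas, release_files))                      # True
print(verify_artifacts(alphas, [b"tampered", release_files[1]]))    # False
```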
How the signatories' keys are recorded in the repository is an "implementation detail" and left unspecified. A possible "standard" solution is to record them as comments in GitHub's CODEOWNERS file, which controls some aspects of the GitHub flow, for example:
```
# Code owners are automatically assigned to review PRs
#
# Later rules override earlier rules.
#
# These are the
# @alice: BA6DF0F948BEB071EB8A9E4B6D3F560F1518404492767CCD36C7B47BD6294A44
# @bob: FCF3B7093F9E8D41D9165E9C4381E238AC46A7AF4D39421A7482BCC6F7D71CAB
# @charlie: 798130FA96A82E10ED0392981D7C4DAA3572AC04416BCFB09F55856F05C7DCA4

# CICD
.github/ @alice
nix/ @bob

# Haskell components
command-line/ @charlie
core/ @charlie

# JavaScript interface
jsapi/ @alice
jsbits/ @alice
```
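Following the comment convention in this example, an observer can extract the key hashes from `CODEOWNERS` and compare them with $\Pi$. A minimal Python sketch, assuming the `# @handle: <hex>` comment format shown above; the file content is assumed to have been read at commit $H$ (for instance with `git show <H>:CODEOWNERS`, adjusting the path since GitHub also accepts the file under `.github/` or `docs/`), and `signatory_keys` is a hypothetical helper.

```python
import re

# Assumed convention: one "# @handle: <hex key hash>" comment line per signatory.
KEY_LINE = re.compile(r"^#\s*@([A-Za-z0-9-]+):\s*([0-9A-Fa-f]+)\s*$")


def signatory_keys(codeowners: str) -> dict:
    """Map each GitHub handle to its lower-cased key hash."""
    keys = {}
    for line in codeowners.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:
            keys[match.group(1)] = match.group(2).lower()
    return keys


SAMPLE = """\
# These are the
# @alice: BA6DF0F948BEB071EB8A9E4B6D3F560F1518404492767CCD36C7B47BD6294A44
# @bob: FCF3B7093F9E8D41D9165E9C4381E238AC46A7AF4D39421A7482BCC6F7D71CAB
.github/ @alice
nix/ @bob
"""
keys = signatory_keys(SAMPLE)
print(sorted(keys))  # ['alice', 'bob']
```

Ownership rules such as `nix/ @bob` are ignored; only the key comment lines contribute to the set that is compared with $\Pi$.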
There's unfortunately no single widely used Software Bill of Material specification:
- SPDX¹ is a standard supported by the Linux Foundation
- CycloneDX² is promoted by OWASP
- SWID³ is an ISO standard
- SLSA⁴ is not an SBOM standard but is supported out of the box by GitHub

For the purpose of Proof of Provenance and on-chain verification, the off-chain SBOM must provide the list of dependencies (possibly tracked on-chain) used to produce the component's artifacts, as binary artifacts from which a hash can be derived to construct a Merkle tree.
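Once the SBOM yields one binary artifact per dependency, the root $\Delta$ and the per-dependency inclusion proofs (the role played by the redeemer entries $\rho_i$) can be computed as below. A minimal Python sketch: SHA-256, sorting the leaves for a deterministic root, the `0x00`/`0x01` leaf/node prefixes (borrowed from RFC 6962) and carrying odd nodes up unchanged are all assumptions, not choices made by this document; an on-chain validator would perform the equivalent of `verify_inclusion` in Plutus.

```python
import hashlib


def leaf_hash(dep: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + dep).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(deps: list) -> list:
    """All tree levels, from the sorted leaves up to the root."""
    if not deps:
        raise ValueError("a Bill of Material needs at least one dependency")
    level = sorted(leaf_hash(d) for d in deps)
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # odd node is carried up unchanged
        level = nxt
        levels.append(level)
    return levels


def merkle_root(deps: list) -> bytes:
    return build_levels(deps)[-1][0]


def inclusion_proof(deps: list, dep: bytes) -> list:
    """Path of (sibling_hash, sibling_is_left) pairs from `dep`'s leaf to the root."""
    levels = build_levels(deps)
    idx = levels[0].index(leaf_hash(dep))
    proof = []
    for level in levels[:-1]:
        sibling = idx ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < idx))
        idx //= 2
    return proof


def verify_inclusion(root: bytes, dep: bytes, proof: list) -> bool:
    acc = leaf_hash(dep)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


deps = [b"dep-a artifact", b"dep-b artifact", b"dep-c artifact"]
root = merkle_root(deps)
proof = inclusion_proof(deps, b"dep-b artifact")
print(verify_inclusion(root, b"dep-b artifact", proof))    # True
print(verify_inclusion(root, b"not a dependency", proof))  # False
```

Sorting the leaves means every party that derives the same set of dependency artifacts from the SBOM obtains the same $\Delta$, regardless of the order in which the SBOM lists them.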