reference implementation
staffik committed Nov 5, 2024
1 parent c832bc4 commit 5eccab8
Showing 1 changed file with 98 additions and 21 deletions.
119 changes: 98 additions & 21 deletions neps/nep-0568.md
@@ -107,34 +107,27 @@ Splitting a shard's Flat State is performed in multiple steps:
snapshots and to reload Mem Tries.

### State Storage - State
// TODO Describe integration with cold storage once design is ready

Each shard’s Trie is stored in the `State` column of the database, with keys prefixed by `ShardUId`, followed by a node's hash.
This structure uniquely identifies each shard’s data. To avoid copying all entries under a new `ShardUId` during resharding,
we use a mapping strategy that allows child shards to access ancestor shard data without directly creating new entries.
a mapping strategy allows child shards to access ancestor shard data without directly creating new entries.

#### Mapping strategy and the database key structure
A naive approach to resharding would involve copying all `State` entries with a new `ShardUId` for a child shard, effectively duplicating the state.
This method, while straightforward, is not feasible because copying a large state would take too much time.
Resharding needs to appear complete between two blocks, so a direct copy would not allow the process to occur quickly enough.

The `DBCol::ShardUIdMapping` column facilitates this approach by linking each child shard’s `ShardUId`
to the `ShardUId` of the closest ancestor shard that holds the relevant data.
Initially, this column is empty, as existing shards map to themselves.
During a resharding event, a mapping entry is added to `ShardUIdMapping`, pointing each child shard’s `ShardUId` to the appropriate ancestor’s `ShardUId`.
This column remains compact and is cached by RocksDB, making lookups efficient on each access.
To address this, Resharding V3 employs an efficient mapping strategy, using the `DBCol::ShardUIdMapping` column
to link each child shard’s `ShardUId` to the closest ancestor’s `ShardUId` holding the relevant data.
This allows child shards to access and update state data under the ancestor shard’s prefix without duplicating entries.

This mapping logic is implemented within `TrieStoreAdapter` and `TrieStoreUpdateAdapter`, which are layers over the `Store` interface.
These adapters specifically handle interactions with the `State` column, applying the shard mapping logic during Trie-related database operations.
Although child shards are accessed with their own `ShardUId` at a high level,
these adapters check `ShardUIdMapping` to access data under the relevant ancestor’s `ShardUId` where necessary,
applying this behavior to both read and write operations.
Initially, `ShardUIdMapping` is empty, as existing shards map to themselves. During resharding, a mapping entry is added to `ShardUIdMapping`,
pointing each child shard’s `ShardUId` to the appropriate ancestor. Mappings persist as long as any descendant shard references the ancestor’s data.
Once a node stops tracking all children and descendants of a shard, the entry for that shard can be removed, allowing its data to be garbage collected.
For archival nodes, mappings are retained indefinitely to maintain access to the full historical state.

#### Mapping retention and cleanup

Mappings in `ShardUIdMapping` persist as long as any descendant shard still references the ancestor shard’s data.
When no descendants rely on a particular ancestor shard, its mapping entry can be removed, allowing the ancestor’s data to be garbage collected.
If a full state sync is performed for a child shard (downloading and storing all its data independently),
the mapping to the ancestor shard can also be removed, as the child’s data is now directly stored under its own `ShardUId`.

This approach allows efficient management of shard state during resharding events,
enabling seamless transitions without altering storage structures directly.
This mapping strategy enables efficient shard management during resharding events,
supporting smooth transitions without altering storage structures directly.


### Stateless Validation
@@ -163,6 +156,90 @@ enabling seamless transitions without altering storage structures directly.
The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
```

### State Storage - State mapping

To enable efficient shard state management during resharding, Resharding V3 uses the `DBCol::ShardUIdMapping` column.
This mapping allows child shards to reference ancestor shard data, avoiding the need for immediate duplication of state entries.
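
The idea can be illustrated with a minimal in-memory sketch. This is not the nearcore implementation: the `ShardUId` struct below is a simplified stand-in, and `ShardUIdMapping::resolve` and `on_split` are hypothetical names chosen for illustration. The sketch assumes that a child created by splitting an already mapped shard should point at the same prefix its parent resolves to, so that a lookup never has to follow a chain of entries.

```rust
use std::collections::HashMap;

/// Simplified stand-in for nearcore's `ShardUId` (assumption: only these two fields matter here).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ShardUId {
    pub version: u32,
    pub shard_id: u32,
}

/// In-memory stand-in for the mapping column: child shard -> ancestor whose prefix holds the data.
#[derive(Default)]
pub struct ShardUIdMapping {
    entries: HashMap<ShardUId, ShardUId>,
}

impl ShardUIdMapping {
    /// A shard without an entry maps to itself, which is why the column starts out empty.
    pub fn resolve(&self, shard_uid: ShardUId) -> ShardUId {
        self.entries.get(&shard_uid).copied().unwrap_or(shard_uid)
    }

    /// Record a split (hypothetical helper). Each child points at whatever prefix
    /// the parent already resolves to, so one lookup is always enough.
    pub fn on_split(&mut self, parent: ShardUId, children: &[ShardUId]) {
        let data_owner = self.resolve(parent);
        for child in children {
            self.entries.insert(*child, data_owner);
        }
    }
}
```

Under these assumptions, splitting shard 0 into shards 1 and 2, and later splitting shard 2 into shards 3 and 4, leaves shards 1 to 4 all resolving to shard 0's prefix, and no `State` entries are copied at either split.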

#### Mapping application in adapters

The core of the mapping logic is applied in `TrieStoreAdapter` and `TrieStoreUpdateAdapter`, which act as layers over the general `Store` interface.
Here’s a breakdown of the key functions involved:

- **Key resolution**:
The `get_key_from_shard_uid_and_hash` function is central to determining the correct `ShardUId` for state access.
At a high level, operations use the child shard's `ShardUId`, but within this function,
the `DBCol::ShardUIdMapping` column is checked to determine if an ancestor `ShardUId` should be used instead.

```rust
fn get_key_from_shard_uid_and_hash(
    store: &Store,
    shard_uid: ShardUId,
    hash: &CryptoHash,
) -> [u8; 40] {
    // Look up the ancestor `ShardUId`; fall back to `shard_uid` when no mapping exists.
    let mapped_shard_uid = store
        .get_ser::<ShardUId>(DBCol::StateShardUIdMapping, &shard_uid.to_bytes())
        .expect("get_key_from_shard_uid_and_hash() failed")
        .unwrap_or(shard_uid);
    // Key layout: 8 bytes of `ShardUId` followed by the 32-byte node hash.
    let mut key = [0; 40];
    key[0..8].copy_from_slice(&mapped_shard_uid.to_bytes());
    key[8..].copy_from_slice(hash.as_ref());
    key
}
```

This function first attempts to retrieve a mapped ancestor `ShardUId` from the mapping column (named `DBCol::StateShardUIdMapping` in the code above).
If no mapping exists, it falls back to the provided `ShardUId`.
The resolved `ShardUId` supplies the first 8 bytes of the key and the node hash (`hash`) supplies the remaining 32, forming the 40-byte key used in `State` column operations.
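
The key layout can be checked in isolation with a small standalone sketch. The `ShardUId` struct, its `to_bytes` encoding (version and shard id as 4-byte little-endian integers), and the `state_key` helper are assumptions made for this example rather than the nearcore definitions; only the 8 + 32 byte split is taken from the function above.

```rust
/// Simplified stand-in for `ShardUId` (assumption, not the nearcore type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardUId {
    pub version: u32,
    pub shard_id: u32,
}

impl ShardUId {
    /// Assumed encoding: version, then shard id, each as 4 little-endian bytes.
    pub fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[0..4].copy_from_slice(&self.version.to_le_bytes());
        out[4..8].copy_from_slice(&self.shard_id.to_le_bytes());
        out
    }
}

/// Hypothetical helper: builds the 40-byte `State` key from an already
/// resolved `ShardUId` and a 32-byte node hash.
pub fn state_key(resolved: ShardUId, node_hash: &[u8; 32]) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[0..8].copy_from_slice(&resolved.to_bytes());
    key[8..].copy_from_slice(node_hash);
    key
}
```

Because the prefix comes from the resolved `ShardUId`, two sibling shards that map to the same ancestor produce identical keys for the same node hash, which is what lets them share entries.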

- **State access operations**:
The `TrieStoreAdapter` and `TrieStoreUpdateAdapter` use `get_key_from_shard_uid_and_hash` to correctly resolve the key for both reads and writes.
Example methods include:

```rust
// In TrieStoreAdapter
pub fn get(&self, shard_uid: ShardUId, hash: &CryptoHash) -> Result<Arc<[u8]>, StorageError> {
    let key = get_key_from_shard_uid_and_hash(self.store, shard_uid, hash);
    self.store.get(DBCol::State, &key)
}

// In TrieStoreUpdateAdapter
pub fn increment_refcount_by(
    &mut self,
    shard_uid: ShardUId,
    hash: &CryptoHash,
    data: &[u8],
    increment: NonZero<u32>,
) {
    let key = get_key_from_shard_uid_and_hash(self.store, shard_uid, hash);
    self.store_update.increment_refcount_by(DBCol::State, key.as_ref(), data, increment);
}
```
The `get` function retrieves data using the resolved `ShardUId` and key, while `increment_refcount_by` manages reference counts,
ensuring correct tracking even when accessing data under an ancestor shard.
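
To make the shared reference counting concrete, here is a toy in-memory model of the two adapters. Everything in it is an assumption for illustration: `ToyTrieStore`, the `u64` stand-in for `ShardUId`, and the `HashMap`-backed columns are not part of nearcore, and real reference counting in the `State` column is handled by the database layer rather than by a tuple field.

```rust
use std::collections::HashMap;

type ShardUId = u64; // stand-in for the real `ShardUId`
type NodeHash = [u8; 32]; // stand-in for `CryptoHash`

/// Toy model: one map for the mapping column, one for the `State` column.
#[derive(Default)]
pub struct ToyTrieStore {
    mapping: HashMap<ShardUId, ShardUId>,
    state: HashMap<(ShardUId, NodeHash), (Vec<u8>, u32)>,
}

impl ToyTrieStore {
    /// Resolve the shard prefix first, then build the key, for reads and writes alike.
    fn key(&self, shard_uid: ShardUId, hash: &NodeHash) -> (ShardUId, NodeHash) {
        let resolved = self.mapping.get(&shard_uid).copied().unwrap_or(shard_uid);
        (resolved, *hash)
    }

    pub fn set_shard_uid_mapping(&mut self, child: ShardUId, parent: ShardUId) {
        self.mapping.insert(child, parent);
    }

    /// Store the value if it is new and bump its reference count by one.
    pub fn increment_refcount(&mut self, shard_uid: ShardUId, hash: &NodeHash, data: &[u8]) {
        let key = self.key(shard_uid, hash);
        let entry = self.state.entry(key).or_insert_with(|| (data.to_vec(), 0));
        entry.1 += 1;
    }

    pub fn get(&self, shard_uid: ShardUId, hash: &NodeHash) -> Option<&[u8]> {
        self.state.get(&self.key(shard_uid, hash)).map(|(data, _)| data.as_slice())
    }

    pub fn refcount(&self, shard_uid: ShardUId, hash: &NodeHash) -> u32 {
        self.state.get(&self.key(shard_uid, hash)).map_or(0, |(_, count)| *count)
    }
}
```

In this model, a node written by the parent before the split is readable through either child, and a write from either child increments the count on the single entry stored under the parent's prefix instead of creating a second copy.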

#### Mapping retention and cleanup

Mappings in `DBCol::ShardUIdMapping` persist as long as any descendant relies on an ancestor’s data.
To manage this, the `set_shard_uid_mapping` function in `TrieStoreUpdateAdapter` adds a new mapping during resharding:
```rust
fn set_shard_uid_mapping(&mut self, child_shard_uid: ShardUId, parent_shard_uid: ShardUId) {
    self.store_update.set(
        DBCol::StateShardUIdMapping,
        child_shard_uid.to_bytes().as_ref(),
        &borsh::to_vec(&parent_shard_uid).expect("Borsh serialize cannot fail"),
    )
}
```

When a node stops tracking all descendants of a shard, the associated mapping entry can be removed, allowing RocksDB to perform garbage collection.
For archival nodes, mappings are retained permanently to ensure access to the historical state of all shards.
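
The retention rule could be sketched as follows. The text above does not specify the cleanup procedure, so this is only one possible reading: `prune_mappings`, the `u64` stand-in for `ShardUId`, and the `is_archival` flag are hypothetical, and the sketch assumes that an ancestor prefix becomes collectable once no tracked shard resolves to it.

```rust
use std::collections::{BTreeMap, BTreeSet};

type ShardUId = u64; // stand-in for the real `ShardUId`

/// Hypothetical cleanup pass. Drops mapping entries for children that are no
/// longer tracked and returns the ancestor prefixes that nothing tracked
/// resolves to any more. Archival nodes keep every mapping.
pub fn prune_mappings(
    mapping: &mut BTreeMap<ShardUId, ShardUId>,
    tracked: &BTreeSet<ShardUId>,
    is_archival: bool,
) -> BTreeSet<ShardUId> {
    if is_archival {
        return BTreeSet::new();
    }
    // Ancestors referenced before pruning.
    let before: BTreeSet<ShardUId> = mapping.values().copied().collect();
    // Keep only entries whose child shard is still tracked.
    mapping.retain(|child, _| tracked.contains(child));
    // Ancestors still referenced by a surviving entry.
    let still_used: BTreeSet<ShardUId> = mapping.values().copied().collect();
    before
        .into_iter()
        .filter(|ancestor| !still_used.contains(ancestor) && !tracked.contains(ancestor))
        .collect()
}
```

With the mapping `{1 -> 0, 2 -> 0}`, a node that still tracks shard 1 keeps the entry for shard 1 and frees nothing; once it tracks neither child, both entries go away and prefix 0 is reported as collectable.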

This implementation ensures efficient and scalable shard state transitions,
allowing child shards to use ancestor data without creating redundant entries.



## Security Implications

```text
