Describe the bug
Patching state in the sandbox fails when it is run concurrently across a large number of sandbox nodes.
The test fails with the following error message. It was patching the registrar account from testnet into the sandbox node.
Error: Failed to query access key: handler error: [Access key for public key ed25519:5WMgq6gKZbAr7xBZmXJHjnj4C3UZkNJ4F5odisUBFcRh has never been observed on the node]
This seems to happen on underpowered machines, such as a CI pipeline. Running the same tests locally (on a MacBook M1) works fine.
This issue seems related to sharding, since the state-patching code appears to have been moved when sharding was added. The following snippet is a candidate suspect:
// XXX: This is a bit questionable -- sandbox state patching works
// only for a single shard. This so far has been enough.
let state_patch = state_patch.take();
To Reproduce
This is a bit hard to reproduce, as it needs a somewhat underpowered machine running many tests at once. It was first noticed in the PR tests on aurora-is-near/aurora-eth-connector. That repository has at least 36 tests, each spinning up its own separate sandbox node and calling patch state at least once. Most of the tests pass, but 1 or 2 sometimes fail. It feels like there is data contention somewhere in state patching.
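One way to narrow this down might be to retry the failing query a few times with a growing delay: if a later attempt succeeds, the patch was applied late rather than lost. The sketch below is only an illustration under that assumption. `retry_with_backoff` is a hypothetical helper that uses the standard library only, and the closure in `main` is a stand-in for the real access-key query, not an actual near-workspaces call.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: retry `op` up to `max_attempts` times, doubling the
/// delay after each failure. Returns the first success together with the
/// number of attempts used, or the last error if every attempt failed.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<(T, u32), E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok((value, attempt)),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Stand-in for the real access-key query (an assumption for this sketch):
    // it fails twice, then succeeds, mimicking a patch that lands late.
    let mut calls = 0;
    let result: Result<(&str, u32), String> =
        retry_with_backoff(5, Duration::from_millis(10), || {
            calls += 1;
            if calls < 3 {
                Err(String::from("access key has never been observed on the node"))
            } else {
                Ok("access key found")
            }
        });

    match result {
        Ok((value, attempts)) => println!("{value} after {attempts} attempt(s)"),
        Err(err) => println!("still failing: {err}"),
    }
}
```

If the query still fails after several attempts on the CI machine, that would point away from a simple timing delay and toward the patch not being applied at all.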
Expected behavior
All tests pass as normal.
Version (please complete the following information):
Source of the snippet quoted above: nearcore/chain/chain/src/chain.rs, lines 3738 to 3740 in 52087fb
Additional context
First reported in near/near-workspaces-rs#253