Patch state non-deterministically failing on underpowered machines #8328

ChaoticTempest · 2023-01-11T00:21:34Z

Describe the bug
State patching in the sandbox fails when run concurrently with a large number of sandbox nodes.
The following error is shown when a test fails. The failing test was patching the registrar account from testnet into the sandbox node.

Error: Failed to query access key: handler error: [Access key for public key ed25519:5WMgq6gKZbAr7xBZmXJHjnj4C3UZkNJ4F5odisUBFcRh has never been observed on the node]

This seems to happen on underpowered machines, such as those used in a CI pipeline. Running the same tests locally (on an M1 MacBook) works fine.

This issue seems related to sharding, since the state-patching code appears to have been moved when sharding support was added. The following snippet is a likely suspect:

nearcore/chain/chain/src/chain.rs, lines 3738 to 3740 in 52087fb:

    // XXX: This is a bit questionable -- sandbox state patching works
    // only for a single shard. This so far has been enough.
    let state_patch = state_patch.take();
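The XXX comment suggests a single pending patch that is consumed with `Option::take()`. As a minimal sketch of why that covers only one shard, assuming (hypothetically) that `take()` is reached once per shard while a block is processed, the model below shows that only the first shard ever receives the patch. `StatePatch` and `apply_block` are hypothetical names for illustration, not nearcore code.

```rust
// Hypothetical model, NOT nearcore's actual code: one pending state patch
// stored in an Option and consumed with `take()` as shards are processed.

#[derive(Debug)]
struct StatePatch {
    records: Vec<String>, // stand-in for the patched state records
}

/// Processes one block's shards in order and returns, per shard,
/// how many patch records were applied to it.
fn apply_block(num_shards: usize, pending: &mut Option<StatePatch>) -> Vec<usize> {
    let mut applied = Vec::with_capacity(num_shards);
    for _shard_id in 0..num_shards {
        // `take()` moves the patch out and leaves `None` behind,
        // so every shard after the first sees no patch at all.
        let state_patch = pending.take();
        applied.push(state_patch.map_or(0, |p| p.records.len()));
    }
    applied
}

fn main() {
    let mut pending = Some(StatePatch { records: vec!["registrar".to_string()] });
    let applied = apply_block(4, &mut pending);
    // Only shard 0 received the patched record.
    println!("{:?}", applied); // [1, 0, 0, 0]
    assert!(pending.is_none());
}
```

With a single shard, as the comment notes, this is harmless: the one shard always gets the patch.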

To Reproduce
This is hard to reproduce because it needs a somewhat underpowered machine running many tests at once. It was first noticed in the PR test runs of aurora-is-near/aurora-eth-connector. That repository has at least 36 tests, each of which spins up its own sandbox node and calls patch state at least once. Most of the tests pass, but one or two sometimes fail. It looks like there is data contention somewhere related to state patching.
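One possible reading of the error (an assumption, not confirmed in this issue) is a timing race: the patch is only staged, and becomes visible once the node produces its next block, so a query that arrives earlier on a slow machine reports that the key "has never been observed". The sketch below models this deterministically. `SandboxNode`, `patch_state`, `produce_block` and `view_access_key` are hypothetical names, not the nearcore or near-workspaces-rs API.

```rust
use std::collections::HashSet;

// Hypothetical, simplified sandbox node (NOT the real nearcore API):
// patches are staged and only applied when a block is produced.
#[derive(Default)]
struct SandboxNode {
    /// Committed (account id, public key) pairs.
    access_keys: HashSet<(String, String)>,
    /// Staged patches not yet applied by block production.
    pending: Vec<(String, String)>,
}

impl SandboxNode {
    fn patch_state(&mut self, account: &str, key: &str) {
        self.pending.push((account.to_string(), key.to_string()));
    }

    fn produce_block(&mut self) {
        // Move every staged patch into committed state.
        for entry in self.pending.drain(..) {
            self.access_keys.insert(entry);
        }
    }

    fn view_access_key(&self, account: &str, key: &str) -> Result<(), String> {
        if self.access_keys.contains(&(account.to_string(), key.to_string())) {
            Ok(())
        } else {
            Err(format!(
                "Access key for public key {key} has never been observed on the node"
            ))
        }
    }
}

fn main() {
    let mut node = SandboxNode::default();
    node.patch_state("registrar", "ed25519:EXAMPLE");
    // Querying before the next block models the slow-machine failure.
    assert!(node.view_access_key("registrar", "ed25519:EXAMPLE").is_err());
    node.produce_block();
    assert!(node.view_access_key("registrar", "ed25519:EXAMPLE").is_ok());
}
```

On a fast machine the block usually lands before the query, which would be consistent with the tests passing locally; this model does not show that nearcore actually behaves this way.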

Expected behavior
All tests pass as normal.

Version

nearcore: master as of Jan 4
sandbox

Additional context
First reported in near/near-workspaces-rs#253


frol · 2023-06-29T11:12:57Z

@ChaoticTempest It seems that fast-forwarding can get stuck: near/near-workspaces-rs#266 (comment). Do you think it could be related?

ChaoticTempest mentioned this issue on Jan 11, 2023 in near/near-workspaces-rs#257, "fix: port collision on large amount of tests" (Merged)

frol mentioned this issue on Jun 29, 2023 in near/near-workspaces-rs#266, "fast_forward sometimes randomly hangs forever" (Open)
