Sync Committee: Task Errors & Rescheduling #1

Merged
merged 2 commits into from
Feb 4, 2025

@zadykian zadykian commented Jan 26, 2025

Sync Committee: Task Errors & Rescheduling

This PR makes several changes to task error handling and propagation:

  • Instead of plain error text, TaskExecError is returned from the executor (prover / proof_provider) as a part of TaskResult. Thus, it is now possible to distinguish critical errors from non-critical ones and handle them differently.

  • TaskStorage.ProcessTaskResult(...) method is reworked to support the new error system. In case of a non-critical error, the task is reset with an increased RetryCount.

  • ProofProvider no longer propagates non-critical errors to the parent task (see TaskStateChangeHandler);

  • Prover's TaskHandler:

    • Now tracks kernel signals and return codes of the child process (proof-producer);
    • Maps the error returned from handleImpl(...) to the TaskExecErr type;
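
To illustrate the error flow described above, here is a minimal sketch of a typed task error and the retry decision. It is not the code from this PR: only the names `TaskExecError`, `TaskErrType`, `TaskErrRpc` and `RetryCount` come from the PR text. The other enum values, the `CanBeRetried` method, the `TaskEntry` fields, the status strings and the `maxRetries` limit are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// TaskErrType categorizes task errors. Only TaskErrRpc is named in the PR;
// the other values are hypothetical.
type TaskErrType int

const (
	TaskErrUnknown    TaskErrType = iota // hypothetical: treated as critical here
	TaskErrRpc                           // described in the PR as retryable
	TaskErrTerminated                    // hypothetical: child process killed by a signal
)

// TaskExecError replaces plain error text in the task result.
type TaskExecError struct {
	ErrType TaskErrType
	ErrText string
}

func (e *TaskExecError) Error() string {
	return fmt.Sprintf("task error (type %d): %s", e.ErrType, e.ErrText)
}

// CanBeRetried is a hypothetical helper separating non-critical errors
// from critical ones.
func (e *TaskExecError) CanBeRetried() bool {
	return e.ErrType == TaskErrRpc || e.ErrType == TaskErrTerminated
}

// TaskEntry is a simplified stand-in for a stored task.
type TaskEntry struct {
	RetryCount int
	Status     string
}

// maxRetries is an assumption; the PR text does not state a retry limit.
const maxRetries = 3

// processTaskResult resets the task on a non-critical error and marks it
// as failed on a critical one (or once the assumed retry limit is reached).
func processTaskResult(entry *TaskEntry, execErr *TaskExecError) {
	switch {
	case execErr == nil:
		entry.Status = "completed"
	case execErr.CanBeRetried() && entry.RetryCount < maxRetries:
		entry.RetryCount++
		entry.Status = "waiting" // reset for later re-execution
	default:
		entry.Status = "failed"
	}
}

func main() {
	entry := &TaskEntry{Status: "running"}
	processTaskResult(entry, &TaskExecError{ErrType: TaskErrRpc, ErrText: "rpc call failed"})
	fmt.Println(entry.Status, entry.RetryCount)

	processTaskResult(entry, &TaskExecError{ErrType: TaskErrUnknown, ErrText: "invalid input"})
	fmt.Println(entry.Status, entry.RetryCount)
}
```

Keeping the retry decision on the error type itself means the storage layer does not need to parse error text to decide whether a task should be rescheduled.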

Other changes:

  • TaskScheduler now validates the received result against the task entry before calling OnTaskTerminated;

  • Metrics in TaskStorage are now published only after a successful tx commit in the ProcessTaskResult and RescheduleHangingTasks methods;

  • Declared commonStorage type reused by TaskStorage, BlockStorage and TaskResultStorage;

  • client.Client interface is replaced with api.TracerRpcClient which has a narrowed method set (only those that are actually used by tracer);
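
The "metrics only after a successful commit" change can be sketched as follows. This is an illustration under stated assumptions, not the actual `TaskStorage` code: `fakeTx`, `metricsRecorder` and `rescheduleTasks` are hypothetical names, and the transaction is a stub rather than a real database transaction.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeTx is a hypothetical stand-in for a database transaction.
type fakeTx struct{ failCommit bool }

func (tx *fakeTx) Commit() error {
	if tx.failCommit {
		return errors.New("commit failed")
	}
	return nil
}

// metricsRecorder is a hypothetical metrics sink.
type metricsRecorder struct{ rescheduled int }

// rescheduleTasks collects metric updates while the transaction is open
// and publishes them only once the commit has succeeded.
func rescheduleTasks(tx *fakeTx, m *metricsRecorder, taskCount int) error {
	var pending []func()
	for i := 0; i < taskCount; i++ {
		// A real implementation would write the updated task entry here.
		pending = append(pending, func() { m.rescheduled++ })
	}
	if err := tx.Commit(); err != nil {
		// Nothing was persisted, so no metrics are published.
		return fmt.Errorf("commit: %w", err)
	}
	for _, publish := range pending {
		publish()
	}
	return nil
}

func main() {
	m := &metricsRecorder{}
	fmt.Println(rescheduleTasks(&fakeTx{}, m, 2), m.rescheduled)
	fmt.Println(rescheduleTasks(&fakeTx{failCommit: true}, m, 5), m.rescheduled)
}
```

Deferring the updates keeps published metrics consistent with the stored state: a failed or rolled-back transaction leaves the counters untouched.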

Task Retry Policy

@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch 2 times, most recently from 82ccb92 to 3f406e3 Compare January 29, 2025 10:54
@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch from 3f406e3 to f8b8779 Compare January 29, 2025 11:03
@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch from f8b8779 to ae911b8 Compare January 29, 2025 11:35
@zadykian zadykian changed the base branch from main to sync-committee/tasks-refactoring January 29, 2025 11:35
@zadykian zadykian force-pushed the sync-committee/tasks-refactoring branch from b59ac46 to cbdb324 Compare January 29, 2025 12:22
@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch from ae911b8 to 5c0cbde Compare January 29, 2025 12:23
@zadykian zadykian changed the title Sync Committee: Task Rescheduling Sync Committee: Task Errors & Rescheduling Jan 29, 2025
@zadykian zadykian requested a review from akokoshn January 29, 2025 12:44
Base automatically changed from feature/rpc-errors to main January 31, 2025 13:24
@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch from 285c6b3 to 29778d6 Compare January 31, 2025 16:29
@akokoshn akokoshn requested a review from oclaw February 3, 2025 07:36
khannanov-nil added a commit that referenced this pull request Feb 4, 2025
* Declared `TaskExecError` with `TaskErrType` enum, integrated it into `TaskResult`;
* `TaskStorage.ProcessTaskResult`: resetting task for later re-execution in case of non-critical error;
* `TaskStateChangeHandler (proof_provider)`: do not propagate non-critical error to the parent task;
* `TaskHandler (prover)`:
    - tracking kernel signals and return codes of the child process (`proof-producer`);
    - mapping error returned from `handleImpl(...)` to `TaskExecErr` type;

* `TaskScheduler`: validating result against task entry before `OnTaskTerminated` call;
* `TaskStorage`: publishing metrics only after successful tx commit in `ProcessTaskResult` and `RescheduleHangingTasks` methods;

[refactoring]
* Declared `commonStorage` type reused by `TaskStorage`, `BlockStorage` and `TaskResultStorage`;
* Declared `api.TracerRpcClient` interface with a narrowed method set;
* Handling `rpc.ErrRpcCallFailed` in prover's task handler;
* Added `TaskErrRpc` value (retryable error);
@zadykian zadykian force-pushed the sync-committee/tasks-rescheduling branch from 9d06be3 to 465d480 Compare February 4, 2025 12:16
@zadykian zadykian requested a review from oclaw February 4, 2025 13:18
khannanov-nil added a commit that referenced this pull request Feb 4, 2025
fixes #1

fixes #2

fixes #3

fixes #4

fixes #5

fixes #6

fixes #7

fixes #8
@zadykian zadykian added this pull request to the merge queue Feb 4, 2025
Merged via the queue into main with commit a787508 Feb 4, 2025
12 checks passed
@zadykian zadykian deleted the sync-committee/tasks-rescheduling branch February 4, 2025 14:42
4 participants