Sync Committee: Task Errors & Rescheduling #1
Merged
akokoshn
reviewed
Jan 31, 2025
nil/services/synccommittee/prover/internal/constants/proof_producer_codes.go
akokoshn
approved these changes
Feb 3, 2025
oclaw
reviewed
Feb 3, 2025
nil/services/synccommittee/internal/storage/task_storage_test.go
nil/services/synccommittee/proofprovider/task_state_change_handler.go
nil/services/synccommittee/prover/internal/constants/proof_producer_codes.go
makxenov
approved these changes
Feb 4, 2025
* Declared `TaskExecError` with `TaskErrType` enum, integrated it into `TaskResult`;
* `TaskStorage.ProcessTaskResult`: resetting task for later re-execution in case of non-critical error;
* `TaskStateChangeHandler (proof_provider)`: do not propagate non-critical error to the parent task;
* `TaskHandler (prover)`:
  - tracking kernel signals and return codes of the child process (`proof-producer`);
  - mapping error returned from `handleImpl(...)` to `TaskExecErr` type;
* `TaskScheduler`: validating result against task entry before `OnTaskTerminated` call;
* `TaskStorage`: publishing metrics only after successful tx commit in `ProcessTaskResult` and `RescheduleHangingTasks` methods;

[refactoring]
* Declared `commonStorage` type reused by `TaskStorage`, `BlockStorage` and `TaskResultStorage`;

* Declared `api.TracerRpcClient` interface with a narrowed method set;
* Handling `rpc.ErrRpcCallFailed` in prover's task handler;
* Added `TaskErrRpc` value (retryable error);
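
The commit message mentions tracking kernel signals and return codes of the `proof-producer` child process. The following is a minimal sketch of how that distinction can be made with Go's standard library on Unix-like systems; it is not the code from this PR. The names `exitKind`, `exitInfo`, `inspectExit` and `runAndInspect` are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// exitKind is a hypothetical classification of how a child process ended.
type exitKind int

const (
	exitedOK       exitKind = iota // process returned exit code 0
	exitedNonZero                  // process returned a non-zero exit code
	killedBySignal                 // process was terminated by a kernel signal
	failedToRun                    // process could not be started or waited on
)

// exitInfo is a hypothetical container for the observed outcome.
type exitInfo struct {
	Kind   exitKind
	Code   int            // meaningful only when Kind == exitedNonZero
	Signal syscall.Signal // meaningful only when Kind == killedBySignal
}

// inspectExit inspects the error returned by (*exec.Cmd).Run or Wait.
func inspectExit(err error) exitInfo {
	if err == nil {
		return exitInfo{Kind: exitedOK}
	}
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		// Not an exit status at all, e.g. the binary was not found.
		return exitInfo{Kind: failedToRun}
	}
	// On Unix-like systems Sys() returns a syscall.WaitStatus.
	if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
		return exitInfo{Kind: killedBySignal, Signal: ws.Signal()}
	}
	return exitInfo{Kind: exitedNonZero, Code: exitErr.ExitCode()}
}

// runAndInspect runs a command to completion and classifies its outcome.
func runAndInspect(name string, args ...string) exitInfo {
	return inspectExit(exec.Command(name, args...).Run())
}

func main() {
	// Assumes a POSIX shell is available as "sh".
	info := runAndInspect("sh", "-c", "exit 3")
	fmt.Printf("kind=%d code=%d\n", info.Kind, info.Code)
}
```

A task handler could then map `killedBySignal` and specific exit codes to different `TaskErrType` values; which outcomes count as retryable is a policy decision that this sketch does not attempt to reproduce.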
khannanov-nil
added a commit
that referenced
this pull request
Feb 4, 2025
oclaw
approved these changes
Feb 4, 2025
Sync Committee: Task Errors & Rescheduling
This PR makes several changes to task error handling and propagation:

* Instead of plain error text, `TaskExecError` is returned from the executor (prover / proof_provider) as a part of `TaskResult`. Thus, it is now possible to distinguish critical errors from non-critical ones and handle them differently.
* The `TaskStorage.ProcessTaskResult(...)` method is reworked to support the new error system. In case of a non-critical error, the task is reset with an increased `RetryCount`.
* ProofProvider no longer propagates non-critical errors to the parent task (see `TaskStateChangeHandler`);
* Prover's `TaskHandler`:
  - tracks kernel signals and return codes of the child process (`proof-producer`);
  - maps the error returned from `handleImpl(...)` to the `TaskExecErr` type;

Other changes:

* `TaskScheduler` now validates the received result against the task entry before the `OnTaskTerminated` call;
* Metrics in `TaskStorage` are published only after a successful tx commit in the `ProcessTaskResult` and `RescheduleHangingTasks` methods;
* Declared the `commonStorage` type reused by `TaskStorage`, `BlockStorage` and `TaskResultStorage`;
* The `client.Client` interface is replaced with `api.TracerRpcClient`, which has a narrowed method set (only the methods actually used by the tracer);
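
The split between critical and non-critical errors described above can be illustrated with a small, self-contained sketch. It is not the PR's implementation: the type names `TaskExecError`, `TaskErrType`, `TaskResult` and the value `TaskErrRpc` come from the PR text, while the fields, the `TaskErrUnknown` value, the status constants, the `CanBeRetried` method and the `processTaskResult` function are assumptions made for illustration.

```go
package main

import "fmt"

// TaskErrType mirrors the enum named in the PR. Only TaskErrRpc is
// taken from the PR text; TaskErrUnknown is a hypothetical value.
type TaskErrType int

const (
	TaskErrUnknown TaskErrType = iota // hypothetical: treated as critical
	TaskErrRpc                        // retryable, per the PR description
)

// TaskExecError replaces plain error text; the fields are assumptions.
type TaskExecError struct {
	ErrType TaskErrType
	ErrText string
}

func (e *TaskExecError) Error() string {
	return fmt.Sprintf("task execution error (type=%d): %s", e.ErrType, e.ErrText)
}

// CanBeRetried reports whether the error is non-critical (hypothetical helper).
func (e *TaskExecError) CanBeRetried() bool {
	return e.ErrType == TaskErrRpc
}

// TaskStatus and its values are hypothetical stand-ins for the real task states.
type TaskStatus int

const (
	WaitingForExecutor TaskStatus = iota
	Running
	Completed
	Failed
)

// TaskEntry is a simplified, in-memory stand-in for a stored task.
type TaskEntry struct {
	Status     TaskStatus
	RetryCount int
}

// TaskResult carries an optional classified error.
type TaskResult struct {
	Error *TaskExecError
}

// processTaskResult sketches the described behavior: a non-critical
// error resets the task and increases RetryCount; a critical one fails it.
func processTaskResult(entry *TaskEntry, result TaskResult) {
	switch {
	case result.Error == nil:
		entry.Status = Completed
	case result.Error.CanBeRetried():
		entry.Status = WaitingForExecutor
		entry.RetryCount++
	default:
		entry.Status = Failed
	}
}

func main() {
	entry := &TaskEntry{Status: Running}
	rpcErr := &TaskExecError{ErrType: TaskErrRpc, ErrText: "rpc call failed"}
	processTaskResult(entry, TaskResult{Error: rpcErr})
	fmt.Println(entry.Status == WaitingForExecutor, entry.RetryCount) // prints "true 1"
}
```

The real `TaskStorage.ProcessTaskResult(...)` works against persistent storage inside a transaction; the sketch only shows the branching on the error type.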