This repository has been archived by the owner on May 12, 2021. It is now read-only.

Aurora pause abnormally in variable batch update strategy #89

Open
chungjin opened this issue Oct 24, 2019 · 2 comments

@chungjin

When the variable batch size is set to [5,5] with auto-pause set to true, the update should pause only twice, once after each batch completes. In practice, Aurora pauses while a batch update is still in progress, and it pauses 4 times in total.

How to reproduce?
Use variable batch sizes [5,5] with auto-pause set to true and the SLA set to 70%.
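
The expected behavior described above (one pause after each completed batch) can be sketched as a small model. This is only an illustration of the arithmetic in the report, not Aurora's updater code; the function name `expected_pause_points` is hypothetical.

```python
from typing import List


def expected_pause_points(batch_sizes: List[int]) -> List[int]:
    """Return the cumulative instance counts at which an auto-paused
    update is expected to pause, assuming exactly one pause after each
    batch completes (the behavior the report expects)."""
    pauses = []
    updated = 0
    for size in batch_sizes:
        updated += size          # the whole batch finishes first
        pauses.append(updated)   # then the update pauses once
    return pauses


# Batch sizes from the report: two batches of 5 instances each.
points = expected_pause_points([5, 5])
print(points)       # pause after 5 and after 10 updated instances
print(len(points))  # 2 pauses expected, versus the 4 observed
```

Under this model, any pause that occurs at an instance count other than 5 or 10 would be one of the unexpected mid-batch pauses.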

@ridv ridv self-assigned this Oct 30, 2019
@ridv
Contributor

ridv commented Nov 7, 2019

Investigating this: there is a bug where shards that were already marked as failed are retried once a resume is sent.

So for example, let's say we have to replace shards [0,1,2], and shard 0 ends up being marked as failed, shard 1 is successful, and we pause before we finish evaluating shard 2.

Our update is now in the following state:
0 - failed
1 - success
2 - working

When we resume the update, shard 0 is retried, even though it should be skipped due to the fact that it failed.
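
The expectation in this comment, that a resume should only pick up shards still in progress, can be sketched as follows. This is a hypothetical model of the state listed above, not Aurora's actual updater logic; `ShardState` and `shards_to_resume` are names invented for illustration.

```python
from enum import Enum
from typing import Dict, List


class ShardState(Enum):
    # Hypothetical states mirroring the example in this comment.
    FAILED = "failed"
    SUCCESS = "success"
    WORKING = "working"


def shards_to_resume(states: Dict[int, ShardState]) -> List[int]:
    """Return the shards a resume would continue under the assumption
    that failed and successful shards are both treated as finished."""
    return [
        shard
        for shard, state in sorted(states.items())
        if state is ShardState.WORKING
    ]


# State of the update at the moment of the pause.
paused_update = {
    0: ShardState.FAILED,
    1: ShardState.SUCCESS,
    2: ShardState.WORKING,
}

print(shards_to_resume(paused_update))  # only shard 2 under this assumption
```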

Working on a fix for this.

@ridv
Contributor

ridv commented Jan 3, 2020

Correcting my previous statement: it's not the updater that's retrying shard 0, it's the scheduler itself, which is the correct behavior. Even though the shard update failed, the failure is only a signal to the updating mechanism.

The unexpected number of pauses was indeed a bug. Thanks for finding it, @chungjin. I've landed a fix for this at 3e31bc0.

Let's try to reproduce the issue. If we can confirm that it is fixed, we can go ahead and close this issue as fixed.
