Hi @wenweihu86 @loveheaven @guohao @wangwg1, I have found that in certain situations, when a follower node processes an AppendEntriesRequest message (abbreviated as AEReq) and generates an AppendEntriesResponse message (abbreviated as AERes), it sets an incorrect term value in the AERes message. The relevant logic is in the function `com.github.wenweihu86.raft.service.impl.RaftConsensusServiceImpl#appendEntries`. The sections below explain my findings in detail.
How to trigger this bug
As shown in Figure 1, triggering this bug requires a three-node cluster (n1, n2, n3), where n3 is not explicitly represented in the diagram. During the election phase, n1 first times out, causing its term to increase to 1. It then sends a vote request to n3. Upon receiving the request, n3 grants its vote to n1. Finally, after receiving the vote from n3, n1 becomes the leader. Notably, during this phase, n2 neither receives any vote requests from n1 nor experiences a timeout, so its term remains 0.
Subsequently, n1 receives a client request and appends the value 5 to its log (Action 1 in Figure 1). In Action 2, n1 sends an AEReq to n2; its contents are listed in the Action 2 row of the table on the right side of Figure 1. Finally, in Action 3, n2 processes the AEReq via HandleAEReq and generates an AERes, whose contents are shown in the Action 3 row of the table.
However, the term value in the AERes generated by n2 is incorrect (highlighted in red in Figure 1). While processing the AEReq, n2 updates its term to 1, yet it still puts the old term value 0 into the AERes. This contradicts the Raft paper's specification of the AppendEntries response (Figure 2), which states that the term field carries the receiver's currentTerm, "for leader to update itself".
Figure 1. Incorrect Term Value in AERes Message Generated by Follower Node.
Figure 2. Term Value Specification for AERes Messages in the Raft Paper.
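To make the stale term concrete, the following self-contained Java model replays Actions 2 and 3. It is not the raft-java implementation: `FollowerModel`, `AppendEntriesReq`, `AppendEntriesRes` and `handleBuggy` are hypothetical names, `stepDown` here models only its effect on the term, and log-matching checks are omitted. As a simplifying assumption, the response term is captured before the follower adopts the leader's term, which reproduces the behavior shown in Figure 1 (records require Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

public class StaleTermDemo {
    // Hypothetical, simplified message types; not the raft-java protobuf classes.
    record AppendEntriesReq(long term, List<Integer> entries) {}
    record AppendEntriesRes(long term, boolean success) {}

    // Hypothetical follower that keeps only the state needed to show the bug.
    static final class FollowerModel {
        long currentTerm;
        final List<Integer> log = new ArrayList<>();

        FollowerModel(long currentTerm) {
            this.currentTerm = currentTerm;
        }

        // Models only the term side effect of stepDown: adopt a higher term.
        void stepDown(long newTerm) {
            if (newTerm > currentTerm) {
                currentTerm = newTerm;
            }
        }

        // Buggy ordering: the response term is read before stepDown and never refreshed.
        AppendEntriesRes handleBuggy(AppendEntriesReq req) {
            long responseTerm = currentTerm;       // captured too early (0 for n2)
            if (req.term() < currentTerm) {
                return new AppendEntriesRes(responseTerm, false);
            }
            stepDown(req.term());                  // n2's currentTerm becomes 1
            log.addAll(req.entries());             // log-matching checks omitted
            return new AppendEntriesRes(responseTerm, true);
        }
    }

    public static void main(String[] args) {
        FollowerModel n2 = new FollowerModel(0);   // n2 missed the election, still at term 0
        // Action 2: leader n1 (term 1) replicates the entry holding value 5.
        AppendEntriesReq req = new AppendEntriesReq(1, List.of(5));
        AppendEntriesRes res = n2.handleBuggy(req); // Action 3
        System.out.println("n2.currentTerm=" + n2.currentTerm + ", AERes.term=" + res.term());
    }
}
```

Running `main` prints `n2.currentTerm=1, AERes.term=0`: the follower has moved to term 1 but still reports term 0 to the leader.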
Suggested fix
The root cause of this bug is that after the follower node updates its own term value, it fails to promptly update the term value in the AERes message, resulting in an outdated term being sent.
With the root cause identified, the fix is straightforward: add one line that updates the term value in the AERes message immediately after the node updates its own term via `raftNode.stepDown(request.getTerm());`. Specifically, adding `responseBuilder.setTerm(raftNode.getCurrentTerm());` ensures that the AERes message carries the correct term value.
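The sketch below shows where the added line sits, assuming a handler with the ordering the fix describes. `stepDown`, `getCurrentTerm`, `setTerm` and the `responseBuilder` variable come from the suggested fix; `FollowerModel`, `ResponseBuilder`, `AppendEntriesReq`, `AppendEntriesRes` and `setSuccess` are hypothetical simplifications (the real code uses generated protobuf classes and its own result codes), and log-matching checks are omitted. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedTermDemo {
    // Hypothetical, simplified message types; not the raft-java protobuf classes.
    record AppendEntriesReq(long term, List<Integer> entries) {}
    record AppendEntriesRes(long term, boolean success) {}

    // Hypothetical stand-in for a protobuf-style response builder.
    static final class ResponseBuilder {
        private long term;
        private boolean success;

        ResponseBuilder setTerm(long term) { this.term = term; return this; }
        ResponseBuilder setSuccess(boolean success) { this.success = success; return this; }
        AppendEntriesRes build() { return new AppendEntriesRes(term, success); }
    }

    static final class FollowerModel {
        private long currentTerm;
        final List<Integer> log = new ArrayList<>();

        FollowerModel(long currentTerm) { this.currentTerm = currentTerm; }

        long getCurrentTerm() { return currentTerm; }

        // Models only the term side effect of stepDown: adopt a higher term.
        void stepDown(long newTerm) {
            if (newTerm > currentTerm) {
                currentTerm = newTerm;
            }
        }

        AppendEntriesRes appendEntries(AppendEntriesReq request) {
            ResponseBuilder responseBuilder = new ResponseBuilder()
                    .setTerm(getCurrentTerm())      // initial value, may become stale
                    .setSuccess(false);
            if (request.term() < getCurrentTerm()) {
                return responseBuilder.build();     // stale leader: reply with our higher term
            }
            stepDown(request.term());
            // The fix: refresh the response term right after stepDown.
            responseBuilder.setTerm(getCurrentTerm());
            log.addAll(request.entries());          // log-matching checks omitted
            return responseBuilder.setSuccess(true).build();
        }
    }

    public static void main(String[] args) {
        FollowerModel n2 = new FollowerModel(0);
        AppendEntriesRes res = n2.appendEntries(new AppendEntriesReq(1, List.of(5)));
        System.out.println("n2.currentTerm=" + n2.getCurrentTerm() + ", AERes.term=" + res.term());
    }
}
```

Replaying Action 3 with this ordering prints `n2.currentTerm=1, AERes.term=1`. Refreshing the term after `stepDown`, rather than only moving the initial `setTerm` call, leaves the early-rejection path for a stale leader unchanged: that path still returns the follower's own, higher term.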
Thank you for taking the time to read this. I'm looking forward to your confirmation, and would be happy to help fix the issue if needed.