
The HandleAERequest method incorrectly sets the term value in the AEResponse message. #55

Open
liang636600 opened this issue Feb 11, 2025 · 0 comments


Hi @wenweihu86 @loveheaven @guohao @wangwg1, I have found that in certain situations, when a follower node processes an AppendEntriesRequest message (abbreviated as AEReq) and generates an AppendEntriesResponse message (abbreviated as AERes), it sets an incorrect term value in the AERes message. The relevant processing logic is in com.github.wenweihu86.raft.service.impl.RaftConsensusServiceImpl#appendEntries. The sections below explain my findings in detail.

How to trigger this bug

As shown in Figure 1, triggering this bug requires a three-node cluster (n1, n2, n3); n3 is not drawn explicitly in the diagram. During the election phase, n1 times out first and increments its term to 1. It then sends a vote request to n3, which grants its vote to n1. With its own vote plus n3's, n1 holds a majority (2 of 3) and becomes the leader. Note that during this phase n2 neither receives a vote request from n1 nor times out, so its term remains 0.
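To make the election step concrete, here is a minimal Java sketch that replays it under simplifying assumptions: SimNode, grantVote, ElectionScenario, and hasMajority are hypothetical names (not raft-java classes), log up-to-date checks are omitted, and only term and vote bookkeeping is modeled. It shows n1 winning with its own vote plus n3's while n2 stays at term 0.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the election phase; not raft-java code.
class SimNode {
    final String id;
    long currentTerm = 0;
    String votedFor = null;

    SimNode(String id) {
        this.id = id;
    }

    // Simplified RequestVote handling: adopt a higher term, then grant the
    // vote if this node has not yet voted in that term (log checks omitted).
    boolean grantVote(String candidateId, long candidateTerm) {
        if (candidateTerm < currentTerm) {
            return false;
        }
        if (candidateTerm > currentTerm) {
            currentTerm = candidateTerm;
            votedFor = null;
        }
        if (votedFor == null || votedFor.equals(candidateId)) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }
}

class ElectionScenario {
    // A candidate needs votes from a strict majority of the cluster.
    static boolean hasMajority(int votes, int clusterSize) {
        return votes > clusterSize / 2;
    }

    public static void main(String[] args) {
        SimNode n1 = new SimNode("n1"), n2 = new SimNode("n2"), n3 = new SimNode("n3");
        // n1 times out first: it increments its term and votes for itself.
        n1.currentTerm++;
        n1.votedFor = n1.id;
        Set<String> votes = new HashSet<>();
        votes.add(n1.id);
        // Only n3 receives the vote request; n2 never sees it.
        if (n3.grantVote(n1.id, n1.currentTerm)) {
            votes.add(n3.id);
        }
        System.out.println("n1 leader: " + hasMajority(votes.size(), 3)); // n1 leader: true
        System.out.println("n2 term: " + n2.currentTerm);                  // n2 term: 0
    }
}
```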

Next, n1 receives a client request and appends the value 5 to its log (Action 1 in Figure 1). In Action 2, n1 sends an AEReq to n2; the message contents are listed in the table on the right, in the row for Action 2. Finally, in Action 3, n2 processes the AEReq via HandleAEReq and generates an AERes, whose contents are shown in the row for Action 3.

However, the term value in the AERes message generated by n2 is incorrect (highlighted in red in Figure 1). While processing the AEReq, n2 updates its own term to 1, yet it still puts the old term value 0 in the resulting AERes. This contradicts the Raft paper's specification of the term field in AppendEntries responses (Figure 2), which requires the response to carry the receiver's currentTerm so that the leader can update itself.

Figure 1. Incorrect Term Value in AERes Message Generated by Follower Node.

Figure 2. Term Value Specification for AERes Messages in the Raft Paper.
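The rule in Figure 2, combined with Raft's rule that a server adopts any higher term it sees in an RPC, implies that the term in an AERes should equal the larger of the follower's term before the request and the request's term. A minimal Java sketch of that check follows; TermRule, expectedResponseTerm, and conforms are hypothetical names, not part of raft-java.

```java
// Hypothetical conformance check for the term field of an AppendEntries response.
class TermRule {
    // After processing, the follower's currentTerm is max(its old term, the
    // request term), and the response must report that value.
    static long expectedResponseTerm(long followerTermBefore, long requestTerm) {
        return Math.max(followerTermBefore, requestTerm);
    }

    static boolean conforms(long followerTermBefore, long requestTerm, long responseTerm) {
        return responseTerm == expectedResponseTerm(followerTermBefore, requestTerm);
    }

    public static void main(String[] args) {
        // Scenario from this issue: n2 is at term 0, the AEReq carries term 1,
        // and n2's AERes reports term 0.
        System.out.println(expectedResponseTerm(0, 1)); // 1
        System.out.println(conforms(0, 1, 0));          // false
    }
}
```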

Suggested fix

The root cause of this bug is that after the follower node updates its own term, it does not update the term value in the AERes message, so an outdated term is sent.

Once the root cause is identified, the fix is straightforward: add one line that updates the term in the AERes message immediately after the node updates its own term with raftNode.stepDown(request.getTerm());. Specifically, adding responseBuilder.setTerm(raftNode.getCurrentTerm()); right after that call ensures that the AERes message carries the correct term value.
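A simplified Java sketch of the ordering problem and the suggested fix. FollowerNode, ResponseBuilder, and AppendEntriesResponse are hypothetical stand-ins that only borrow names from this issue (responseBuilder, stepDown, getCurrentTerm); they are not the raft-java source, and log matching and entry appending are omitted. The sketch assumes, as described above, that the response term is filled in before stepDown runs.

```java
// Hypothetical stand-ins; not the raft-java source. Only term handling is modeled.
class AppendEntriesResponse {
    final long term;

    AppendEntriesResponse(long term) {
        this.term = term;
    }
}

class ResponseBuilder {
    private long term;

    ResponseBuilder setTerm(long term) {
        this.term = term;
        return this;
    }

    AppendEntriesResponse build() {
        return new AppendEntriesResponse(term);
    }
}

class FollowerNode {
    private long currentTerm;

    FollowerNode(long currentTerm) {
        this.currentTerm = currentTerm;
    }

    long getCurrentTerm() {
        return currentTerm;
    }

    // Simplified stand-in for raftNode.stepDown(newTerm): adopt a higher term
    // and stay a follower (timers and persistence omitted).
    void stepDown(long newTerm) {
        if (newTerm > currentTerm) {
            currentTerm = newTerm;
        }
    }

    // applyFix = false reproduces the reported ordering; true adds the suggested line.
    AppendEntriesResponse appendEntries(long requestTerm, boolean applyFix) {
        ResponseBuilder responseBuilder = new ResponseBuilder();
        // The term is recorded before the request's term has been examined.
        responseBuilder.setTerm(getCurrentTerm());
        if (requestTerm < getCurrentTerm()) {
            return responseBuilder.build(); // reject a leader with a stale term
        }
        stepDown(requestTerm); // the node's own term may change here
        if (applyFix) {
            // Suggested fix: refresh the response term after stepDown.
            responseBuilder.setTerm(getCurrentTerm());
        }
        // Log consistency checks and entry appending omitted.
        return responseBuilder.build();
    }

    public static void main(String[] args) {
        System.out.println(new FollowerNode(0).appendEntries(1, false).term); // 0
        System.out.println(new FollowerNode(0).appendEntries(1, true).term);  // 1
    }
}
```

With applyFix set to false, a follower at term 0 that receives an AEReq with term 1 reports term 0, matching the reported behavior; with applyFix set to true it reports term 1, as Figure 2 requires.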

Thank you for taking the time to read this. I'm looking forward to your confirmation, and would be happy to help fix the issue if needed.
