The incorrect initialization logic in the SegmentedLog class prevents the node from restarting. #56

Open
liang636600 opened this issue Feb 12, 2025 · 0 comments


Hi @wenweihu86 @loveheaven @guohao @wangwg1, I discovered that incorrect initialization logic in the SegmentedLog class prevents a node from restarting. I explain my findings in detail below.

How to trigger this bug

As shown in Figure 1, this is a 3-node cluster (n1, n2, and n3 are not shown in the diagram). First, n1 times out (Action 1) and sends a vote request to n2 (Action 2). n2 processes the request and votes for n1 (Action 3), and n1 receives n2's vote and becomes the leader (Action 4). Then n1 receives a ClientReq request from the client (Action 5) and writes the value 5 to its log. Finally, n1 restarts (Action 6). The restart fails every time, which led me to this bug.

Figure 1. An Example That Triggers the Bug.

Root cause

Next, I will explain why n1 is unable to restart. When n1 restarts, it runs the com.github.wenweihu86.raft.example.server.ServerMain#main method, which calls the com.github.wenweihu86.raft.RaftNode#RaftNode constructor to initialize the RaftNode class. This initialization method then calls com.github.wenweihu86.raft.storage.SegmentedLog#SegmentedLog to initialize the SegmentedLog class. The issue arises during the initialization of the SegmentedLog class. As shown in Figure 2, at this point, n1's data is stored in example1/data/log, which contains only a single open-1 file that holds the log entries (the log entry where the client wrote the value 5), but there is no metadata file. In contrast, the example2/data/log directory of n2, as shown in Figure 2, does contain a metadata file.

Figure 3 shows the SegmentedLog initialization code. For n1, no metadata file exists, so the loaded metadata is null, while startLogIndexSegmentMap.size() > 0 because the open-1 segment is present. This combination leads to the execution of throw new RuntimeException("No readable metadata file but found segments");. The RuntimeException is not handled anywhere, so it propagates out of main and n1 crashes. This is the root cause of n1's failure to restart.

Figure 2. Data on Disk During n1 Restart.

Figure 3. SegmentedLog Initialization Code.
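
For readers without the figure, the check described above can be modeled roughly as follows. This is a minimal, self-contained sketch and not the actual raft-java source: the class name, the method name, the use of a nullable Long to stand for "no readable metadata", and the default first index of 1 are all assumptions made for illustration. Only the exception message and the startLogIndexSegmentMap idea come from this report.

```java
import java.util.TreeMap;

// Simplified model of the startup check described in this issue.
// All names here are hypothetical stand-ins, not the real raft-java API.
class SegmentedLogInitSketch {

    // metaFirstLogIndex == null models "no readable metadata file".
    // segments models startLogIndexSegmentMap (start index -> segment file name).
    static long init(Long metaFirstLogIndex, TreeMap<Long, String> segments) {
        if (metaFirstLogIndex == null) {
            if (segments.size() > 0) {
                // The branch n1 hits on restart: segments exist, metadata does not.
                throw new RuntimeException("No readable metadata file but found segments");
            }
            return 1L; // assumed default for an empty log directory
        }
        return metaFirstLogIndex;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> n1Segments = new TreeMap<>();
        n1Segments.put(1L, "open-1"); // the only file in n1's log directory
        try {
            init(null, n1Segments);
        } catch (RuntimeException e) {
            System.out.println("restart failed: " + e.getMessage());
        }
    }
}
```

In this model, a node whose log directory is empty initializes normally, while a node in n1's state (one segment, no metadata) always takes the throwing branch.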

The following log is a portion of the logs from n1 during the restart process, which further confirms that it was the unhandled RuntimeException thrown by SegmentedLog that ultimately caused the failure of n1's restart.

2025-02-11 19:19:21.465 [main] WARN  c.g.w.raft.storage.SegmentedLog --- meta file not exist, name=./data/log/metadata
2025-02-11 19:19:21.466 [main] ERROR c.g.w.raft.storage.SegmentedLog --- No readable metadata file but found segments in ./data/log
Exception in thread "main" java.lang.RuntimeException: No readable metadata file but found segments
	at com.github.wenweihu86.raft.storage.SegmentedLog.<init>(SegmentedLog.java:49)
	at com.github.wenweihu86.raft.RaftNode.<init>(RaftNode.java:95)
	at com.github.wenweihu86.raft.example.server.ServerMain.main(ServerMain.java:60)

Suggested fix

After identifying the root cause of the bug, fixing the issue is quite simple. The solution is to comment out the code that throws the RuntimeException during the SegmentedLog initialization, specifically throw new RuntimeException("No readable metadata file but found segments");. This check is flawed, as it incorrectly prevents a normal node like n1 from restarting.
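
If the maintainers would rather not remove the check outright, one possible variant is to rebuild a default instead of throwing. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and taking the first log index from the lowest segment start index is my guess at a reasonable default, not confirmed raft-java behavior. It also does not recover any other state the metadata file normally carries.

```java
import java.util.TreeMap;

// Hypothetical fallback: derive a first log index when metadata is missing.
// Not the raft-java implementation; names and policy are assumptions.
class MetadataFallbackSketch {

    // metaFirstLogIndex == null models "no readable metadata file".
    // segments maps each segment's start index to its file name.
    static long firstLogIndexOrFallback(Long metaFirstLogIndex, TreeMap<Long, String> segments) {
        if (metaFirstLogIndex != null) {
            return metaFirstLogIndex; // metadata is present, use it as-is
        }
        if (segments.isEmpty()) {
            return 1L; // assumed default for a brand-new node
        }
        // No metadata but segments exist: start from the lowest segment index
        // instead of aborting the restart.
        return segments.firstKey();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> n1Segments = new TreeMap<>();
        n1Segments.put(1L, "open-1"); // n1's on-disk state from this report
        System.out.println("first log index: " + firstLogIndexOrFallback(null, n1Segments));
    }
}
```

For n1's on-disk state, this variant would let initialization continue with a first log index taken from the open-1 segment rather than crashing.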

Thank you for taking the time to read this. I'm looking forward to your confirmation, and would be happy to help fix the issue if needed.
