copyFromSegmentFileToDeepStore can create corrupted segments during storage layer/controller disruption #14869

Open
jasperjiaguo opened this issue Jan 22, 2025 · 1 comment

@jasperjiaguo
Contributor

jasperjiaguo commented Jan 22, 2025

copyFromSegmentFileToDeepStore currently depends heavily on the stability of the underlying storage layer implementation:

When the storage layer or controller itself becomes unstable and disrupts the segment copy process, the old segment is irreversibly overwritten in the deep store. Because a disk-to-disk copy is usually not atomic, such failures almost guarantee that corrupted or zero-sized files are left in the deep store. We have seen multiple occurrences of this in our deployment.

The proposed fix is to make the process safer and at least recoverable. This involves a few steps:

  • calculate CRC of the original file in temp folder
  • rename old segment in deep store (iff the old segment exists)
  • copy the segment over from temp folder to dst
  • verify the CRC of the same file in deep store
  • delete the renamed old segment (iff the old segment exists)

In the worst case, if corruption does happen, the old segment would still be in the deep store for recovery.
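
The five steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Pinot change: it uses java.nio.file on a local filesystem as a stand-in for the deep store, and the class name SafeSegmentCopy, the method names, and the ".bak" backup suffix are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper; a local filesystem stands in for the deep store.
class SafeSegmentCopy {

  // Stream the file through CRC32 so large segments are not loaded into memory.
  static long crcOf(Path file) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buffer = new byte[8192];
    try (InputStream in = Files.newInputStream(file)) {
      int read;
      while ((read = in.read(buffer)) != -1) {
        crc.update(buffer, 0, read);
      }
    }
    return crc.getValue();
  }

  static void copyWithBackup(Path src, Path dst) throws IOException {
    // 1. Calculate the CRC of the original file in the temp folder.
    long expectedCrc = crcOf(src);

    // 2. Rename the old segment, only if it exists (".bak" is an assumed suffix).
    Path backup = dst.resolveSibling(dst.getFileName() + ".bak");
    boolean hadOldSegment = Files.exists(dst);
    if (hadOldSegment) {
      Files.move(dst, backup, StandardCopyOption.REPLACE_EXISTING);
    }

    // 3. Copy the new segment to the destination. If this throws,
    //    the backup is left in place for recovery.
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);

    // 4. Verify the CRC of the copied file.
    long actualCrc = crcOf(dst);
    if (actualCrc != expectedCrc) {
      throw new IOException("CRC mismatch after copy: expected "
          + expectedCrc + " but got " + actualCrc + "; backup kept at " + backup);
    }

    // 5. Delete the renamed old segment, only if there was one.
    if (hadOldSegment) {
      Files.delete(backup);
    }
  }
}
```

Note that in this sketch the backup is only deleted after the CRC check passes, so any failure in step 3 or 4 leaves the old segment recoverable under the backup name. Whether a rename is cheap or atomic on a given deep store implementation is a separate question that this sketch does not address.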

@jasperjiaguo jasperjiaguo changed the title from "copyFromSegmentFileToDeepStore can create corrupted segments during" to "copyFromSegmentFileToDeepStore can create corrupted segments during storage layer/controller disruption" Jan 22, 2025
@jasperjiaguo jasperjiaguo self-assigned this Jan 22, 2025
@mcvsubbu
Contributor

For the backup to exist, don't we need to designate a specific area/folder to create the temp files in the first place?
