When the storage layer or controller becomes unstable and disrupts the segment copy process, the old segment in the deep store is irreversibly overwritten. Because a disk-to-disk copy is usually not atomic, such a failure is very likely to leave a corrupted or zero-sized file in the deep store. We have seen multiple occurrences of this in our deployment.
The proposed fix is to make the process safer and at least recoverable. It involves the following steps:
1. Calculate the CRC of the original file in the temp folder.
2. Rename the old segment in the deep store (iff the old segment exists).
3. Copy the segment from the temp folder to dst.
4. Verify the CRC of the copied file in the deep store.
5. Delete the renamed old segment (iff the old segment exists).
In the worst case, if corruption does happen, the old segment is still in the deep store and can be used for recovery.
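The proposed sequence could look roughly like the sketch below. It is only an illustration on a local filesystem using `java.nio.file`, not Pinot's actual deep store code: the class name `SafeSegmentCopy`, the method `copyWithBackup`, and the `.bak` suffix are all hypothetical, and a real fix would presumably go through whatever storage abstraction `copyFromSegmentFileToDeepStore` already uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

final class SafeSegmentCopy {
    // Hypothetical suffix for the renamed old segment; not a Pinot convention.
    static final String BACKUP_SUFFIX = ".bak";

    /** Streams the file through CRC32 and returns the checksum value. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /**
     * Copies src to dst while keeping the previous dst as a backup until the
     * new copy has been verified. If anything fails before the final delete,
     * the backup is left in place for recovery.
     */
    static void copyWithBackup(Path src, Path dst) throws IOException {
        // Step 1: checksum of the original file in the temp folder.
        long expected = crc32(src);

        // Step 2: rename the old segment, only if it exists.
        Path backup = dst.resolveSibling(dst.getFileName() + BACKUP_SUFFIX);
        boolean hadOld = Files.exists(dst);
        if (hadOld) {
            Files.move(dst, backup, StandardCopyOption.REPLACE_EXISTING);
        }

        // Step 3: copy the new segment into place.
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);

        // Step 4: verify the checksum of the copied file.
        long actual = crc32(dst);
        if (actual != expected) {
            throw new IOException("CRC mismatch after copy to " + dst
                + (hadOld ? "; old segment kept at " + backup : ""));
        }

        // Step 5: the new copy is verified, so the backup can be removed.
        if (hadOld) {
            Files.delete(backup);
        }
    }
}
```

Note that the rename in step 2 is only as reliable as the underlying store; on object stores a rename may itself be a copy followed by a delete, so the backup step may need its own verification there.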
jasperjiaguo changed the title from "copyFromSegmentFileToDeepStore can create corrupted segements during" to "copyFromSegmentFileToDeepStore can create corrupted segements during storage layer/controller disruption" on Jan 22, 2025
copyFromSegmentFileToDeepStore currently depends heavily on the stability of the underlying storage layer implementation.