There is a large index, over 800 GB. It contains billions of records, and most of them (over 95%) are duplicates generated by log resends.
Elasticsearch caps the search batch size at 10,000 hits, and this index is huge. I tried es-dedupe on just one minute's worth of records, and the search phase alone took an hour (I checked the ES server: about 4 GB of I/O per second).
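For context, the 10,000-hit cap applies to plain from/size pagination (`index.max_result_window`); the scroll API streams past it in batches, which es-dedupe likely already uses, so the real bottleneck is the I/O of reading billions of documents rather than the page size. A minimal scroll sketch with the official Python client, where the endpoint and index name are assumptions:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# helpers.scan wraps the scroll API, which streams results in batches
# and is not subject to the 10,000-hit index.max_result_window cap
# that limits plain from/size pagination.
count = 0
for hit in helpers.scan(es, index="logs-original",        # assumed index name
                        query={"query": {"match_all": {}}},
                        size=5000):                        # docs per scroll batch
    count += 1
print(f"scrolled over {count} documents")
```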
Maybe there is another way to deal with it.
Proposed approach: read the original index and, for each record, write it to a new index if it is unique; skip it if it is a duplicate. If the new index is still too large, cap its size and, once the cap is reached, roll over to another index named like xxx-001 (see the sketch below).
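A minimal sketch of that approach with the Python client: hash the document body into a deterministic `_id` and index with `op_type: create`, so the target index itself rejects duplicates instead of the client tracking them in memory, and roll over to a numbered index after a document-count threshold. The endpoint, index names, and threshold are all assumptions:

```python
import hashlib
import json
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")   # assumed endpoint
SOURCE = "logs-original"                      # assumed source index name
DOCS_PER_INDEX = 100_000_000                  # assumed rollover threshold

def actions():
    target_n, written = 0, 0
    for hit in helpers.scan(es, index=SOURCE,
                            query={"query": {"match_all": {}}}):
        src = hit["_source"]
        # Deterministic _id from the document body: identical resent log
        # lines hash to the same _id, so "create" rejects the duplicates.
        doc_id = hashlib.sha1(
            json.dumps(src, sort_keys=True).encode()
        ).hexdigest()
        yield {
            "_op_type": "create",             # fail (409) instead of overwrite
            "_index": f"logs-dedup-{target_n:03d}",  # xxx-000, xxx-001, ...
            "_id": doc_id,
            "_source": src,
        }
        written += 1
        if written >= DOCS_PER_INDEX:         # crude size cap: roll over
            target_n, written = target_n + 1, 0

# raise_on_error=False: version-conflict (duplicate) errors are expected.
ok, errors = helpers.bulk(es, actions(), raise_on_error=False, stats_only=True)
print(f"indexed {ok} unique docs, skipped {errors} duplicates")
```

One caveat: `_id` uniqueness is only enforced per index, so after a rollover a resent copy of an earlier record would slip into the later index. If exact deduplication matters more than index size, dedupe into a single target first and split it afterwards.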