not working with over 800Gb index #16

LQBing · 2023-08-31T09:46:54Z

There is a large index, over 800 GB in size.
It contains billions of records, and most of them (over 95%) are duplicates generated by log resends.

ES caps the search batch size at 10000, and this index is huge. I tried es-dedupe on just 1 minute of records, and the search phase alone took 1 hour (I checked the ES server: 4G of I/O per second).

Maybe there is another way to deal with it:
read the original index and, if a record is unique, write it to a new index; if it is a duplicate, skip it. If the new index is still too large, cap its size and, once the cap is exceeded, write to another new index named like xxx-001.
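To make the idea concrete, here is a minimal sketch of that copy-and-skip approach. It is not how es-dedupe works; it only illustrates the proposal. The names `fingerprint`, `dedupe_actions`, `fields`, and `max_docs_per_index` are hypothetical, the rollover cap is counted in documents as a stand-in for index size, and the set of seen hashes is assumed to fit in memory.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List


def fingerprint(doc: Dict, fields: List[str]) -> str:
    """Hash the fields that define a duplicate (hypothetical helper)."""
    payload = json.dumps([doc.get(f) for f in fields], default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def dedupe_actions(
    docs: Iterable[Dict],
    fields: List[str],
    base_index: str,
    max_docs_per_index: int,
) -> Iterator[Dict]:
    """Yield one bulk-style index action per unique document.

    Assumption: the number of unique records is small enough for the
    set of hashes to be held in memory.
    """
    seen = set()
    written = 0
    for doc in docs:
        fp = fingerprint(doc, fields)
        if fp in seen:
            continue  # duplicate record: skip it
        seen.add(fp)
        # Roll over to xxx-001, xxx-002, ... once the cap is reached.
        suffix = written // max_docs_per_index
        target = base_index if suffix == 0 else f"{base_index}-{suffix:03d}"
        written += 1
        # Using the hash as _id keeps re-runs idempotent within one index.
        yield {"_index": target, "_id": fp, "_source": doc}


if __name__ == "__main__":
    sample = [
        {"msg": "a", "ts": 1},
        {"msg": "a", "ts": 1},  # resent log line
        {"msg": "b", "ts": 2},
    ]
    for action in dedupe_actions(sample, ["msg", "ts"], "logs-dedup", 2):
        print(action["_index"], action["_source"])
```

With the official Python client, the input could plausibly come from `elasticsearch.helpers.scan` (which pages through the index with the scroll API and yields hits whose `_source` holds the record), and the generated actions could be passed to `elasticsearch.helpers.bulk`. Since the source index is read once, sequentially, this avoids running a separate search per group of duplicates, though whether it is fast enough on an 800 GB index is untested here.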
