Hello everyone,
I have a couple of ideas that may benefit ptrack, and I'd like to hear your comments before I start coding or open a PR. Or are these already in ptrack's development plans?
1. Compress the ptrack.map file.
ptrack.map.mmap is no longer used since ptrack switched from the mmap() system call to PostgreSQL shared memory, which already saves the time spent copying the ptrack.map file at PostgreSQL startup.
I have to set a larger ptrack.map_size when $PGDATA is large, e.g. 1024MB for a 1TB cluster or 40960MB for a 40TB cluster, in order to track the changed blocks and reduce hash collisions. With such a large map, a lot of I/O on the ptrack.map file is needed every time PostgreSQL restarts or performs a checkpoint, and it takes a long time. Furthermore, this may cause a switchover/failover of the PG cluster to hit timeouts.
How about compressing ptrack.map in shared memory before writing it to the physical file during a checkpoint, and decompressing it back into shared memory when the file is loaded at startup? A rough sketch follows.
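To make the idea concrete, here is a minimal sketch (not ptrack's actual code) of what the checkpoint write path and the startup read path could look like if the map were compressed. zlib, Z_BEST_SPEED, the file layout (a small original-size header followed by the compressed payload) and all helper names are my assumptions for illustration only:

```c
/*
 * Minimal sketch (not ptrack's actual code) of compressing the in-memory
 * map before writing it at checkpoint, and decompressing it at startup.
 * zlib, the file layout and all helper names are assumptions.
 */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* Checkpoint path: write the shared-memory map to disk in compressed form. */
static int
write_map_compressed(const char *path, const void *map, size_t map_size)
{
    uLongf  comp_len = compressBound(map_size);   /* worst-case compressed size */
    Bytef  *comp_buf = malloc(comp_len);
    FILE   *f;
    int     rc = -1;

    if (comp_buf == NULL)
        return -1;

    /*
     * Z_BEST_SPEED keeps checkpoint latency low; the ratio should still be
     * decent because an oversized map is mostly zero bytes.
     */
    if (compress2(comp_buf, &comp_len, map, map_size, Z_BEST_SPEED) == Z_OK)
    {
        f = fopen(path, "wb");
        if (f != NULL)
        {
            /* small header with the original size, then the compressed payload */
            if (fwrite(&map_size, sizeof(map_size), 1, f) == 1 &&
                fwrite(comp_buf, 1, comp_len, f) == comp_len)
                rc = 0;
            fclose(f);                 /* real code would also fsync() here */
        }
    }

    free(comp_buf);
    return rc;
}

/* Startup path: load the compressed file back into the shared-memory map. */
static int
read_map_compressed(const char *path, void *map, size_t map_size)
{
    FILE   *f;
    Bytef  *comp_buf = NULL;
    size_t  orig_size;
    long    comp_len;
    uLongf  out_len = map_size;
    int     rc = -1;

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* header holds the uncompressed size and must match the shmem map size */
    if (fread(&orig_size, sizeof(orig_size), 1, f) == 1 && orig_size == map_size)
    {
        /* the rest of the file is the compressed payload */
        fseek(f, 0, SEEK_END);
        comp_len = ftell(f) - (long) sizeof(orig_size);
        fseek(f, (long) sizeof(orig_size), SEEK_SET);

        comp_buf = malloc(comp_len);
        if (comp_buf != NULL &&
            fread(comp_buf, 1, comp_len, f) == (size_t) comp_len &&
            uncompress(map, &out_len, comp_buf, comp_len) == Z_OK &&
            out_len == map_size)
            rc = 0;
    }

    free(comp_buf);
    fclose(f);
    return rc;
}
```

LZ4 or zstd could be substituted for zlib if a better speed/ratio trade-off is wanted; since an oversized, mostly-empty map should compress very well, either choice ought to cut the checkpoint and restart I/O substantially.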
2. Use multiple threads in ptrack_get_pagemapset() to scan the files under $PGDATA concurrently.
When $PGDATA is large, a single process scanning the files sequentially looks slow. My idea is to start worker threads on the first call to ptrack_get_pagemapset(): the workers would scan and hash the data files and push the resulting tuples into a shared-memory queue, and each subsequent call to ptrack_get_pagemapset() would take one tuple from that queue under a proper mutex and condition variable. A sketch of the queue logic follows.
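To illustrate only the producer/consumer part of this idea, below is a self-contained pthread sketch; the struct names, fields and QUEUE_CAP are made up for the example. Note that PostgreSQL backends are generally not thread-safe, so a real implementation might need background workers (or threads that call nothing from the backend) rather than plain pthreads inside ptrack_get_pagemapset():

```c
/*
 * Illustrative producer/consumer sketch (not PostgreSQL code): worker
 * threads scan data files and push results into a bounded queue; the
 * set-returning function pops one result per call until the workers
 * are done and the queue is drained.
 */
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define QUEUE_CAP 64            /* bounded queue size, arbitrary for the sketch */

typedef struct ScanResult
{
    char    path[1024];         /* relation file that was scanned */
    long    changed_blocks;     /* whatever the scan/hash step produced */
} ScanResult;

typedef struct ResultQueue
{
    ScanResult      items[QUEUE_CAP];
    int             head, tail, count;
    int             producers_running;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} ResultQueue;

/* Call once before launching nworkers scan threads. */
static void
queue_init(ResultQueue *q, int nworkers)
{
    memset(q, 0, sizeof(*q));
    q->producers_running = nworkers;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Worker side: block while the queue is full, then append one result. */
static void
queue_push(ResultQueue *q, const ScanResult *r)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = *r;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: returns false once all producers finished and the queue is empty. */
static bool
queue_pop(ResultQueue *q, ScanResult *out)
{
    bool    got = false;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && q->producers_running > 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0)
    {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        got = true;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Each worker calls this when it has scanned its last file. */
static void
producer_done(ResultQueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->producers_running--;
    pthread_cond_broadcast(&q->not_empty);   /* wake consumers waiting on an empty queue */
    pthread_mutex_unlock(&q->lock);
}
```

In the proposal, the scan/hash workers would be the producers and each call of ptrack_get_pagemapset() would be the consumer, building its output tuple from the ScanResult it pops.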
Those are my thoughts; I'm looking forward to your comments.
Thanks,
vegebird