I have an idea.
Since the entries produced by the indexer form a streaming protobuf file, the key metric for write_entries performance should be disk read speed. The entry files are sequential protobuf records, so once you have read 20% of the entries, you have finished 20% of the job.
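If disk read speed really is the ceiling, it is worth measuring the raw sequential throughput of the entries file as a baseline. A rough sketch (the file path is a placeholder, and dropping the page cache is Linux-specific):

```shell
# Drop the page cache so we measure the disk, not RAM.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Sequential read of the entries file; dd reports MB/s at the end.
dd if=/path/to/graph.entries of=/dev/null bs=4M status=progress
```

Any write_entries run that finishes close to this number is effectively I/O-bound and cannot be improved much on the write side.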
At first, I modified write_entries, because the length of WriteRequest.Update is frequently just 1 or 2, which means write_entries doesn't buffer anything and writes tiny payloads into LevelDB on every call.
So I made write_entries write 2GB of data to LevelDB at a time, which improved performance remarkably.
However, I later found that the problem is not write_entries itself: there are simply too many duplicate entries. (I generated 600GB of entries from the Linux kernel, and dedup_stream reduced them to a 6GB file.)
Using the dedup_stream tool also maximizes HDD read speed, and it performs significantly better than my modified write_entries.
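For reference, this is the pipeline I mean: dedup before writing, so LevelDB only ever sees each entry once. The --graphstore flag and the leveldb: prefix follow the Kythe docs; the paths are placeholders:

```shell
# Deduplicate the entry stream, then write the much smaller
# result into the LevelDB-backed graphstore.
dedup_stream < /path/to/graph.entries \
  | write_entries --graphstore leveldb:/path/to/graphstore
```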
So I think the right approach is to eliminate the duplicates at the source, i.e. merge the kzips.
However, if I merge all the kzips into one, I cannot run multiple indexers to utilize my multi-core machine.
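One workaround I can think of, instead of merging, is to keep the kzips separate, index them in parallel, and deduplicate the concatenated streams once at the end (delimited protobuf streams can be safely concatenated). A sketch, assuming GNU parallel is installed and with `my_indexer` standing in for whichever Kythe indexer binary is used:

```shell
# Index each kzip on its own core; each indexer writes its own
# delimited entry stream. {/.} is the kzip basename without extension.
ls compilations/*.kzip \
  | parallel -j"$(nproc)" 'my_indexer {} > entries/{/.}.entries'

# Concatenate all per-kzip streams and dedup once before writing.
cat entries/*.entries \
  | dedup_stream \
  | write_entries --graphstore leveldb:/path/to/graphstore
```

This keeps the indexing step parallel, but of course it still pays the cost of producing the duplicates before throwing them away, which is why I am asking the question below.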
Is there any way to make the indexer aware that the same file appears in different kzips, so that it avoids producing duplicate entries in the first place?