Multithreading + Caching + perRecordInit


mario....@googlemail.com

Jul 6, 2021, 10:17:11 AM
to solrmarc-tech

Hello,


To optimize performance, we're currently experimenting with multithreading, using
-Dsolrmarc.indexer.threadcount=4

Before that, we had already implemented some caching mechanisms, which have worked well in single-threaded mode. We have multiple Solr fields that require similar lookup operations. The first operation stores its result in a class member variable. Until now, these member variables have been designed to hold cache values for the current record only, and all of them are reset in the perRecordInit callback.
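The single-threaded pattern described above can be sketched roughly as follows. This is a minimal illustration, not SolrMarc's actual mixin API: the class name, the no-argument `perRecordInit()` and the `expensiveLookup` stand-in are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified mixin; the real SolrMarc mixin API is not reproduced here.
class SingleThreadedLookupMixin {
    // Holds lookup results for the current record only.
    private final Map<String, String> currentRecordCache = new HashMap<>();
    private int lookupCount = 0; // counts expensive lookups, for illustration

    // Assumed to be called once before each record; drops the previous record's values.
    public void perRecordInit() {
        currentRecordCache.clear();
    }

    // Stand-in for an expensive lookup shared by several Solr fields (hypothetical).
    private String expensiveLookup(String key) {
        lookupCount++;
        return key.toUpperCase();
    }

    // Every field method calls this; only the first call per key does the real work.
    public String lookup(String key) {
        return currentRecordCache.computeIfAbsent(key, this::expensiveLookup);
    }

    public int getLookupCount() {
        return lookupCount;
    }
}
```

If several indexing threads share one such instance, one thread's `perRecordInit` clears values that another thread is still using, and the unsynchronized `HashMap` is not safe for concurrent access either.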

With multithreading, the threads appear to process multiple records at the same time, so this caching mechanism needs to be improved. To avoid out-of-memory problems, the caches need to be cleaned regularly. The question is: at which point does a mixin know that all fields of a record have been processed, so that it can throw away all old cache entries for a given control number?

For now, we have implemented an extended version of a ConcurrentHashMap which uses a Deque to implement a history, so it knows which cache entry is the oldest and can be thrown away once the max size of the cache has been reached (cache size is about 100, which should be more than enough for 4 concurrent threads).
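The bounded cache described above might look something like this. It is a sketch under assumptions, not the actual implementation: the class name `BoundedHistoryCache` and its methods are hypothetical, and writes are serialized with `synchronized` so that the map and the history deque cannot drift apart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical size-bounded cache with FIFO eviction (ConcurrentHashMap + Deque history).
class BoundedHistoryCache<K, V> {
    private final int maxSize;
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    // Insertion history; the head is the oldest key. Guarded by "this".
    private final Deque<K> history = new ArrayDeque<>();

    BoundedHistoryCache(int maxSize) {
        this.maxSize = maxSize;
    }

    // Lock-free read path: ConcurrentHashMap.get needs no extra synchronization.
    V get(K key) {
        return map.get(key);
    }

    // Writes are synchronized so the map and the history stay consistent.
    synchronized void put(K key, V value) {
        if (map.put(key, value) == null) {
            history.addLast(key); // only new keys enter the history
            while (history.size() > maxSize) {
                map.remove(history.removeFirst()); // evict the oldest entry
            }
        }
    }

    // Returns the cached value or computes it; the loader runs outside the lock,
    // so two threads racing on the same key may both compute it once.
    V getOrCompute(K key, Function<K, V> loader) {
        V cached = map.get(key);
        if (cached != null) {
            return cached;
        }
        V value = loader.apply(key);
        put(key, value);
        return value;
    }

    int size() {
        return map.size();
    }
}
```

Eviction here is by insertion order (FIFO), not by last access, which matches the "oldest entry" history described above; keyed by control number with a limit of about 100, it should comfortably cover records in flight on 4 threads.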

Of course, it would be more efficient to have a callback, similar to the existing "perRecordInit" method, that a mixin could implement so that it knows reliably when to throw away all cached entries for a given control number (something like "perRecordFinished").
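To illustrate the proposal, here is a rough sketch of how a mixin could use such a hook. `perRecordFinished` does not exist in SolrMarc; the `RecordLifecycle` interface, the `KeyedLookupMixin` class and the use of a plain control-number `String` in place of the record object are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical lifecycle hooks; perRecordFinished is the proposed callback,
// and the String parameter stands in for the record being indexed.
interface RecordLifecycle {
    void perRecordInit(String controlNumber);
    void perRecordFinished(String controlNumber);
}

class KeyedLookupMixin implements RecordLifecycle {
    // One inner map per record currently being indexed, keyed by control number.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    public void perRecordInit(String controlNumber) {
        cache.put(controlNumber, new ConcurrentHashMap<>());
    }

    // Computes a value at most once per record and key.
    String lookup(String controlNumber, String key, Function<String, String> loader) {
        return cache.get(controlNumber).computeIfAbsent(key, loader);
    }

    // With the proposed hook, a record's entries are dropped exactly when it is done.
    public void perRecordFinished(String controlNumber) {
        cache.remove(controlNumber);
    }

    int recordsInFlight() {
        return cache.size();
    }
}
```

With such a hook, the cache holds only the records actually in flight and needs no size limit or eviction history; it does assume that no two records with the same control number are indexed concurrently.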

Is there any chance that such a callback could be implemented in a future SolrMarc version? Or can you think of an alternative mechanism for implementing such a concurrent cache efficiently?

Best regards,
Mario