Increasing update performance of single fields


Achim Domma

Jan 29, 2015, 5:41:04 PM
to helio...@googlegroups.com
Hi,

We store tag-like data in multi-valued fields and need to update it frequently, which is a performance problem. At the moment we have about 5 million documents, and a single update might affect 50% of them, which is already causing trouble. In the future, we want to do this with more than 20 million documents. As far as I understand, Solr/HS rewrites the whole document when we update a single field. Is that true? If so, this is obviously inefficient for our use case. Is this a limitation of Solr/HS that we could realistically work around by writing some Java code? Or is it a limitation deep in the internals of Lucene that is hard to change? It is still hard for me to grasp what is happening at which abstraction level.
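To make the update pattern concrete, here is a minimal sketch of how such a bulk tag change could be expressed as Solr atomic updates. The field names `id` and `tags`, the helper names, and the batch size are assumptions for illustration only, not our actual schema or code. The sketch only builds the JSON payloads; it does not talk to a server. As far as I know, even an atomic update like this is applied by Solr by reading the stored fields and re-indexing the whole document, which is exactly the cost I am asking about.

```python
import json

# Atomic-update modifiers used in this sketch ("add" appends to a
# multi-valued field, "set" replaces its content).
ALLOWED_MODIFIERS = {"add", "set"}


def build_atomic_updates(doc_ids, field, values, modifier="add"):
    """Build one atomic-update document per id.

    `field` is a hypothetical multi-valued field such as "tags".
    """
    if modifier not in ALLOWED_MODIFIERS:
        raise ValueError(f"unsupported modifier: {modifier}")
    return [{"id": doc_id, field: {modifier: list(values)}} for doc_id in doc_ids]


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_payloads(doc_ids, field, values, modifier="add", batch_size=1000):
    """Serialize the updates into JSON bodies, one per batch."""
    updates = build_atomic_updates(doc_ids, field, values, modifier)
    return [json.dumps(batch) for batch in batches(updates, batch_size)]


if __name__ == "__main__":
    # Hypothetical ids; each payload would be posted to the update handler.
    for body in to_payloads(["doc1", "doc2", "doc3"], "tags", ["urgent"], batch_size=2):
        print(body)
```

Batching keeps each request small, but it does not change the per-document cost: every affected document is still rewritten.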

Any hints regarding such a frequent, high-volume update scenario would be much appreciated.

kind regards,
Achim

Shawn Heisey

Feb 10, 2015, 3:08:55 PM
to helio...@googlegroups.com
Someone will correct me if I'm wrong, but I am fairly sure that this is a low-level limitation in Lucene that would be difficult or impossible to remove.  If it's even possible, it would likely take large-scale fundamental changes in file formats and internal operations, and might result in a significant performance decrease.

Vadim Kirilchuk

Feb 10, 2015, 4:17:56 PM
to helio...@googlegroups.com
Hi,

I remember a video on Vimeo in which someone gave a talk about this issue and proposed a solution. I will try to find it and post it here. It may also be possible to use DocValues here.


Vadim Kirilchuk

Mar 1, 2015, 5:50:33 AM
to helio...@googlegroups.com
Hi,

I found it, or at least a similar video.
The current situation and possible solutions/workarounds are explained starting at 10:07.

The presentation is two years old, and DocValues have been introduced since then, so I recommend checking whether they could be another possible workaround for you.

Vadim Kirilchuk

Mar 1, 2015, 6:04:05 AM
to helio...@googlegroups.com

Achim Domma

Mar 25, 2015, 12:21:02 PM
to helio...@googlegroups.com
That looks very interesting. Will check it out. Thanks! 