concurrent reads and writes from multiple processes


Ken Egozi

Apr 22, 2015, 6:54:35 AM
to bl...@googlegroups.com
Hi. I wonder if the following setup is feasible with bleve (sorry for trying to apply my Lucene experience to bleve):
a. The index has a very low write/read ratio.
b. The index is stored on a network volume.
c. Multiple readers (i.e. different Go *processes* on different machines) load the index for querying only.
d. A single writer process (separate from all reader processes/machines) indexes documents at a very low frequency (say once an hour or less).
e. (More of a question) Would the readers need to be alerted of a change so they can refresh (reopen the index)?

I could work around 'c' by syncing a local copy to each machine/process, and could work around 'd' by syncing to a new index, then writing to the new one, then reloading the readers.


super thanks, and with huge appreciation for the work poured into this fine library,
Ken.

Marty Schoch

Apr 22, 2015, 3:49:29 PM
to bl...@googlegroups.com
I think the low write/read ratio is a common enough use case that we need to come up with something to help address it. Unfortunately there are a couple of challenges with the approach you outlined.

Since we rely on a variety of KV stores underneath, their properties sometimes differ considerably. I know that LevelDB is strictly single-process, and to my knowledge it doesn't offer an option to open read-only. I'm not sure whether boltdb can be opened by multiple processes; a quick scan of the README didn't show anything.

So, for the KV stores that have a hard requirement of single-process access, this is a bit of a dead end, since our index is tied too tightly to the KV store.

Now, let's say we were using Couchstore as the underlying KV store (it's actually a bad choice for lots of reasons, but I'm familiar with it, and I know it can be safely read from multiple processes while another process writes). In that case, because of its append-only design, readers would automatically see the changes, as they always start by seeking to the end of the file to find the last valid root. Obviously this is an implementation detail, but I mention it because I think the answer to question E is that it just depends on the underlying KV store.
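
To make the "seek to the end and find the last valid root" idea concrete, here is a toy sketch in Go. It is not Couchstore's actual file format: the 12-byte record layout (an 8-byte root value plus a CRC32), the in-memory byte slice standing in for the file, and the `appendRoot`/`lastValidRoot` names are all assumptions for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// recSize is the size of one root record in this toy format:
// an 8-byte root value followed by a 4-byte CRC32 of that value.
const recSize = 12

// appendRoot appends one fixed-size root record to the log.
func appendRoot(log []byte, root uint64) []byte {
	var rec [recSize]byte
	binary.BigEndian.PutUint64(rec[:8], root)
	binary.BigEndian.PutUint32(rec[8:], crc32.ChecksumIEEE(rec[:8]))
	return append(log, rec[:]...)
}

// lastValidRoot starts at the last full record boundary and walks
// backwards until it finds a record whose checksum verifies.
func lastValidRoot(log []byte) (uint64, bool) {
	for off := len(log)/recSize*recSize - recSize; off >= 0; off -= recSize {
		rec := log[off : off+recSize]
		if crc32.ChecksumIEEE(rec[:8]) == binary.BigEndian.Uint32(rec[8:]) {
			return binary.BigEndian.Uint64(rec[:8]), true
		}
	}
	return 0, false
}

func main() {
	log := appendRoot(nil, 100)
	log = appendRoot(log, 200)
	log = append(log, 0xde, 0xad) // simulate a torn, partially written record

	root, ok := lastValidRoot(log)
	fmt.Println(root, ok) // prints "200 true"
}
```

Because the writer only ever appends, a reader that reruns `lastValidRoot` picks up new roots without any coordination, and a partially written tail is simply skipped.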

I think your workarounds for C and D will work, but obviously with all the costs associated with copying things around and reloading (which blows away all the useful stuff you already have cached at that point).

I'd definitely like to improve what we can offer in this area.  If you can share more details about how you've made this work with Lucene that would be great.

Thanks,
marty




Ken Egozi

Apr 23, 2015, 5:40:31 PM
to bl...@googlegroups.com
With Lucene, when you open an index reader you get a snapshot of the index (the common way is to open a RAMDirectory from the FSDirectory, so all reads happen in memory from the snapshot). The index itself is append-only, so a writer can continue to work. Readers need to periodically "reopen" to get a refreshed version of the index. Occasionally you'd want to Optimize the index, which breaks the append-only nature; in that case the writer reads a snapshot, optimizes, and writes to a new location, then readers start working on that new location and the old one is discarded.

Zhou Sam

Sep 14, 2022, 12:01:26 PM
to bleve
Hi All,

I have the same concern mentioned in this thread, which looks to be about 7 years old. But my case is in-process.

I'm using the latest version of the bleve library, and I have multiple readers and one writer of the same index, not in different processes but in separate goroutines.
If the writer is updating the index (using the index.Index(id, data) method) while the other readers are reading from it (such as making search requests), do I need some kind of lock so that all readers wait until the writer finishes updating?

thanks
Sam

Abhi Dangeti

Sep 14, 2022, 12:12:43 PM
to bleve
Hi,


> I'm using the latest version of the bleve library, and I have multiple readers and one writer of the same index, not in different processes but in separate goroutines.
> If the writer is updating the index (using the index.Index(id, data) method) while the other readers are reading from it (such as making search requests), do I need some kind of lock so that all readers wait until the writer finishes updating?

I'll assume you're using the scorch engine if you're on the latest version of the bleve library. You can verify this by checking `DefaultIndexType` in `bleve.Config`.
With scorch, as long as you have a single writer, the index supports concurrent readers. Each reader acquires a reference-counted snapshot of the index, and the search request runs against that snapshot. If the writer hasn't yet committed the content it's working on, that "latest" content will not be visible in the reader's snapshot.