scorch status


Amnon BC

Dec 27, 2020, 9:25:14 AM
to bleve
Hi all,
Just watched https://www.youtube.com/watch?v=zjG2Y01i3Kk&ab_channel=GopherConUK

Currently we are using RocksDB as a backend, but scorch looks interesting.
What is the current status of scorch? And how do we use it?

Marty Schoch

Dec 27, 2020, 9:35:09 AM
to bl...@googlegroups.com
Scorch is the recommended index type for all users.  It is not yet the default, because defaults are essentially part of the API, and changing it would be a breaking change.

Instructions for using scorch can be found here: https://github.com/blevesearch/bleve/issues/1350  Search for NewUsing.  Also at this time the latest version of zap is v15.
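
Roughly, it looks like this (an untested sketch using the pre-2.0 import paths; the index path and mapping here are placeholders):

    package main

    import (
        "log"

        "github.com/blevesearch/bleve"
        "github.com/blevesearch/bleve/index/scorch"
    )

    func main() {
        mapping := bleve.NewIndexMapping()
        // NewUsing selects the index type explicitly; scorch.Name ("scorch")
        // replaces the upsidedown default.
        idx, err := bleve.NewUsing("example.bleve", mapping, scorch.Name, scorch.Name, nil)
        if err != nil {
            log.Fatal(err)
        }
        defer idx.Close()
    }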

We are in the process of preparing a proposal for the 2.x release, and scorch will become the default as part of it.  We will also be deprecating the upsidedown index and all the related k/v store plugins.  None of the core maintainers use upsidedown any longer, and it represents a huge chunk of code that no one is really familiar with at this point.  We will continue to support these indexes as best we can for a little longer, but the important message is that now is the time to switch.  If you have a use case that isn't well supported by scorch, we need to know about it now.  We already know of a few:

- In-Memory indexes continue to use upsidedown
- Online backup is not possible with scorch today

We plan to address both of these before dropping support of upsidedown.

marty


Amnon BC

Dec 28, 2020, 12:07:56 PM
to bleve
I took the scorch index for a spin. It generally works great. But a small proportion of my tests deadlock when opening the index.
The code previously used RocksDB, and reopened read indexes periodically to get the latest snapshot of the data.
What is the lifecycle of a scorch index? Is there still a benefit in opening an index with cfg["read_only"] = true when using a read-only connection?
Is having multiple read and write indexes to the same path OK? Or should my process only open an index once and keep it open for the lifetime of the process?
Thanks,
Amnon

Marty Schoch

Dec 28, 2020, 3:38:11 PM
to bl...@googlegroups.com
You should never have to reopen indexes to see data.  And we have never supported more than one reader/writer for a given path.  So I'm not really sure where to begin.  Opening an index read-only only ensures that it won't be modified; no writers can proceed while those readers are open.
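
In code, that read-only open looks something like this (untested sketch; the path is a placeholder):

    package main

    import (
        "log"

        "github.com/blevesearch/bleve"
    )

    func main() {
        // "read_only" is the runtime config key in question; no writer can
        // proceed against this path while the read-only reader holds it open.
        cfg := map[string]interface{}{"read_only": true}
        idx, err := bleve.OpenUsing("example.bleve", cfg)
        if err != nil {
            log.Fatal(err)
        }
        defer idx.Close()
    }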

marty

Amnon BC

Dec 28, 2020, 4:21:10 PM
to bl...@googlegroups.com
Great. Thanks for clarifying.
- Amnon


Amnon BC

Dec 30, 2020, 9:59:11 AM
to bleve
Hi Marty,
Unfortunately I have had to revert my code back to using upsidedown and RocksDB.
The reason is that we have a serverless architecture where each API call potentially runs in a new process.
Most of these calls are searches, but periodically we get calls which index new data.
Using RocksDB, we search off a snapshot, and our system periodically reopens the index to get a fresher version of the snapshot.

When I tried to migrate to scorch, I found that when several processes open an index, the first process succeeds and all other processes block until the first process terminates. So it is back to RocksDB for now.

But I imagine that this will cause us problems going forward, as you are deprecating the other KV stores.

Going forward: is there any way you can support the scenario of a single writer and multiple reader processes accessing a shared index?
And how do you envisage Bleve working in a serverless setup?

Thanks, and Happy New Year!
- Amnon

Marty Schoch

Dec 31, 2020, 9:28:38 AM
to bl...@googlegroups.com
First, I understand the use case, and I think it's a really important feature.  In fact, it was one of the major features I added to the Bluge project when I forked that off of Bleve earlier this year.

Second, and this is the main point: the technical part of the solution is relatively straightforward, and already coded in Bluge.  Basically, we can't use BoltDB to store the snapshots, because its limitations prevent this use case.  Instead we write every snapshot to its own file.  Recording the snapshots is the easy side; removing unneeded snapshots is a bit more work now.  Finally, the writing process (responsible for removing old snapshots) needs to be prevented from removing snapshots still being read by other read-only processes, and this means we need to use OS-provided file-locking features.  As I said, this is all coded and basically working, just not production-level tested in the Bluge project at this time.
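
To sketch the locking idea (this is not the actual Bluge code, just an illustration; Unix-only, and the names here are made up):

    package main

    import (
        "log"
        "os"
        "syscall"
    )

    // acquireSnapshotReadLock takes a shared lock on a snapshot file, so the
    // writer's cleanup pass cannot delete it while a reader holds the lock.
    func acquireSnapshotReadLock(path string) (*os.File, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        // LOCK_SH: many readers may hold this at once. Before removing a
        // snapshot, the writer tries LOCK_EX|LOCK_NB and skips the file if
        // that fails, because some reader is still using it.
        if err := syscall.Flock(int(f.Fd()), syscall.LOCK_SH); err != nil {
            f.Close()
            return nil, err
        }
        return f, nil
    }

    func main() {
        // Hypothetical snapshot file name, for illustration only.
        f, err := acquireSnapshotReadLock("snapshot-000001.zap")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
    }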

Porting these changes to Bleve has 3 main problems:

1.  Unlike Bluge, Bleve requires some level of backwards compatibility.  So at a minimum we have to keep support for the old snapshots with Bolt and introduce this new way on the side.  And this adds to the testing effort as well.
2.  The ONLY people paying me to work on Bleve right now do not require this feature, so they're unlikely to ever prioritize it.
3.  The ONLY people paying me to work on Bleve right now would see this as a high-risk change that actively gets in the way of their actual priorities.

This is unfortunate.  I have spent the last 18 months trying to diversify the financial support of Bleve, specifically to avoid these kinds of situations.  They are not healthy for the project as a whole, but this is the situation we're in.

So, then why remove upsidedown at all? It seems to still work for people, and even supports use-cases not covered by scorch!

To me, the problem is best illustrated with the RocksDB adapter.

1.  No one on the team had even tried to build this for at least a year, possibly longer.  In the course of preparing our upcoming 2.0 release, we needed to move the blevex (extensions) project to Go modules, to ensure there would be an easily tagged/usable version of it before introducing any 2.x changes.  (We don't want to drop support immediately; we want to deprecate first and remove later.)  At that point, I discovered that it didn't build.  If you take the latest version of our bleve adapter, the latest version of the RocksDB adapter we use (github.com/tecbot/gorocksdb), and the latest version of RocksDB, it does not build.  There is obviously some combination of older versions that works (you're using it happily), but we don't even know what that combination is.

2.  Fixing the compilation issues does not appear difficult, but it likely breaks support for the older versions that we now know some people are using successfully (you!).  Maybe it's not a big deal if RocksDB compatibility is good, or maybe it's a nightmare forcing some users to reindex everything; again, we have no idea, because we don't use this thing.

3.  Even after fixing the compilation issues, there is now a linking issue as well.  It turns out we don't just use the RocksDB adapter; we also have some of our own cgo code which optimizes a common path.  (There is a cost whenever you cross the c-go boundary, which can be amortized by batching differently, and back in the day we wrote the C code to do this, with a cgo wrapper.)  But a few of the functions we used no longer exist in RocksDB.  It might just be that they were renamed, not removed; it wasn't immediately clear to me.  I didn't spend much time on it because I have no intention of fixing this.  This type of cgo code, where we're sharing data via raw pointers across the boundary, is tricky, and the rules have become stricter with recent Go releases.  My point is that this code likely needs updating or a rewrite anyway, and again, no one here is qualified to do it, let alone motivated to care.
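
To illustrate the batching idea (not our actual code; the C function here is made up):

    package main

    /*
    #include <stdlib.h>

    // Hypothetical C function that consumes a whole batch of keys at once,
    // so the c-go boundary is crossed once per batch rather than once per key.
    static void process_batch(char **keys, int n) {
        (void)keys;
        (void)n;
    }
    */
    import "C"
    import "unsafe"

    // processBatch copies the keys into C memory and makes a single cgo call.
    func processBatch(keys []string) {
        if len(keys) == 0 {
            return
        }
        cKeys := make([]*C.char, len(keys))
        for i, k := range keys {
            cKeys[i] = C.CString(k)
        }
        C.process_batch(&cKeys[0], C.int(len(keys)))
        for _, p := range cKeys {
            C.free(unsafe.Pointer(p))
        }
    }

    func main() {
        processBatch([]string{"alpha", "beta", "gamma"})
    }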

And this, in a nutshell, is the problem with upsidedown.  It, together with the adapters, is a huge chunk of code that seems to work OK most of the time, but just under the surface you realize it's not properly maintained or supported.  In that sense, deprecating them is not really a change; it's just us being honest about the current state of affairs.

marty



Amnon BC

Dec 31, 2020, 10:05:31 AM
to bleve
Thanks Marty.  Having fought some RocksDB compilation issues myself over the last few weeks (possibly the same ones you encountered), I would agree that the RocksDB adapter is fragile and painful to maintain (we are stuck on rocksdb-5.18.3), and that ridding the world of cgo is a worthy cause.

My gut feeling is that I can carry on using bleve/rocksdb for now, and then migrate over to Bluge when it is production-ready.

Wishing you Happy New Year,
- Amnon

Marty Schoch

Jan 2, 2021, 2:45:38 PM
to bl...@googlegroups.com
Regarding current RocksDB support, can you comment on this proposed fix to the compilation issue?


It works locally for me with 5.17.2 (Ubuntu package).

marty
