Refactor of Elasticsearch Search Service PR?

Greg Logan

Mar 28, 2024, 1:46:30 PM

to d...@opencast.org

Hi all,

During the review of #5597 (Replacing Solr with Elasticsearch), Arne left this[1] review. I've addressed most of his comments, but the part about the index being created with a replica started a deeper dive. This is where the problem starts...

When building off of the original prototype of this change, I moved the indexing bits out of the search service, but did not investigate how *else* to create the index itself: it was working, and I didn't see a great deal of need to dig around for other solutions. I assumed, incorrectly, that the *rest* of our indexes are created the same way, with each index (event, series, themes) handling its own creation and modification. What's actually happening is that index creation and modification have been delegated to ElasticsearchIndex, and this is why Arne identified the replica difference. By default, Elasticsearch creates two replicas, but ElasticsearchIndex sets that to a single replica.

The obvious approach here would be to move the Search index over to use this existing infrastructure. The issue with that is that I'm out of time for this contract and, perhaps more importantly, we need to get this in for 16.0. In the short term I would like to unify the approaches, but I don't know if I can get that done in time to get it merged prior to branch cut. What's in the PR works (barring any other issues identified in the review), and I don't foresee a future refactor of this code requiring a reindex. If I plan on doing this refactor sometime in 16.x's timeframe, can we merge this in as is and address the refactor after?

G

1: https://github.com/opencast/opencast/pull/5597#pullrequestreview-1936275584

Lars Kiesow

Apr 2, 2024, 7:02:54 PM

to d...@opencast.org

Hey,
we also discussed this at today's technical meeting and I vaguely
remember this being a deliberate choice on my part back when I started
on the prototype. But since it has been over two years… it may be good
to verify my memory ;-)

Since then, there also have been a few changes to the other OpenSearch
parts which may or may not change stuff.

I deliberately chose to not use the index service for a few reasons.
- First, no one likes the index service ;-p
- Second, replicating/using the way the index service works for the
search service would have meant a lot more code and a structure which
is less flexible than what we have now.
- Third, and most important, the index service depends on the asset
manager, the scheduler, the series service, the capture admin
service, … None of these are available on the presentation node where
the search service would live. Porting that would cause a big
dependency mess.

In the end, just using the client library to connect to a search index
was much simpler, caused less of a dependency hell and it also meant
that the search service could better act as a self-contained micro
service which could just run on any distribution we want it to.

That being said, we could look into unifying the index creation
process. Maybe something like the opencast-db module. We have the three
elasticsearch-* modules which Katrin extracted from the index service,
but they still contain a lot of code and dependencies unnecessary for
the search service.

Still, maybe it's worth investigating sharing that code a bit more.

Best regards,
Lars

Katrin Ihler

Apr 8, 2024, 4:39:25 AM

to d...@opencast.org

Hi guys,

I was not part of this discussion, so I might be missing something
important, but Greg says, correctly I think, that the creation of the
indexes happens in ElasticsearchIndex, which is != the IndexService (and
I think this was already the case two years ago?). But either way, I
concur with Lars that involving the IndexService is 100% not a good
idea. Let's leave that monster out of it.

Using ElasticsearchIndex could be possible, but the way it's doing that
right now is not great either so... we should probably unify that
somehow at some point, but if it's not happening for 16, I would be fine
with that. But we should at least harmonize the replica setting.
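
As a rough illustration of what harmonizing the replica setting could
look like, here is a minimal sketch that builds the JSON body of a
create-index request (PUT /<index>) with an explicit replica count, so
both code paths could share one value instead of relying on defaults.
The class name, method name and constant values are hypothetical and
not taken from the Opencast code base; only the settings keys
number_of_shards and number_of_replicas are the standard ones.

```java
import java.util.Locale;

// Hypothetical helper, not part of Opencast: one place that decides
// the shard and replica counts for every index we create.
final class IndexSettingsSketch {

    // Assumed shared values for illustration; the thread does not
    // state which numbers should actually be used.
    static final int SHARDS = 1;
    static final int REPLICAS = 1;

    // Builds the request body for creating an index with explicit
    // shard and replica counts.
    static String createIndexBody(int shards, int replicas) {
        if (shards < 1 || replicas < 0) {
            throw new IllegalArgumentException(
                "shards must be >= 1 and replicas must be >= 0");
        }
        return String.format(Locale.ROOT,
            "{\"settings\":{\"index\":{"
                + "\"number_of_shards\":%d,"
                + "\"number_of_replicas\":%d}}}",
            shards, replicas);
    }

    public static void main(String[] args) {
        // Both the search index and the other indexes would send
        // the same body when creating their index.
        System.out.println(createIndexBody(SHARDS, REPLICAS));
    }
}
```

Whichever client ends up sending the request, passing the same body
from both places would keep the replica count from drifting again.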

Cheers,

Katrin

--
ELAN e.V.
Karlstr. 23
D-26123 Oldenburg

elan-ev.de