Add a new analyzer that supports both the Cyrillic and Latin alphabets

Mladen

Sep 8, 2024, 7:13:00 PM
to DSpace Technical Support
Hello everyone!

I didn't want to open a new GitHub issue, since this doesn't feel like an issue; it's more of a general question and a request for some guidance.

I'm new to the platform and I see that there is a lot going on in the code itself, so what would be the steps to add a new analyzer to the project (e.g. for Serbian or any other Slavic language that uses two writing systems)? I've done something similar in a demo project while learning Elasticsearch, but since this is a whole new platform rather than just Lucene, I'm kinda overwhelmed. :)

I get the sense that `schema.xml` will need changes, and maybe `solrconfig.xml` too? Can anyone explain a bit how it would work? I don't want to break anything. I'm fairly new to this whole field, so I'm currently looking for ways to learn by doing, rather than getting stuck in an endless loop of documentation and tutorials.

Any help would be much appreciated!

Cheers!
Mladen

DSpace Technical Support

Sep 9, 2024, 1:15:21 PM
to DSpace Technical Support
Hi Mladen,

We do not have a guide for this task, but there are some older mailing list threads that describe what the process might look like:


In general, DSpace uses Solr for all search/browse.  So you should be able to follow Apache Solr instructions to add a new analyzer.

Tim

Mladen

Sep 10, 2024, 6:18:12 AM
to DSpace Technical Support

Hi Tim,

Thanks a lot for the references. After I check and try everything, I'll write it up here so we can maybe add/update this info :)

Also, one question: if we add another analyzer, would DSpace then use that one? Or can we configure which one is used, or is DSpace smart enough to know which analyzer to use (based on the alphabet, maybe)? Do you have that info somewhere? Once I have all the info, I can maybe contribute that as well :)

Kind regards,
Mladen

DSpace Technical Support

Sep 13, 2024, 12:21:55 PM
to DSpace Technical Support
Hi Mladen,

As noted, I've never done this before and don't have a guide for how to get this to work. I *suspect*, however, that you'd need to modify the Solr "schema.xml" file(s) within DSpace to "hook up" the Analyzer into Solr. Those "schema.xml" files are standard Solr Schema files, so you'd want to look at Solr guides to figure out how to do that.

My best guess is you may want to update the "query" analyzer configuration in the "schema.xml" for the DSpace "search" core.   This "search" core is the one used when you perform searching/browsing within DSpace.  The "query" analyzer is what should be used whenever you run a query against that core.
https://github.com/DSpace/DSpace/blob/dspace-8_x/dspace/solr/search/conf/schema.xml#L86
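
As a rough, untested sketch of what such a schema change might look like, the fragment below defines a field type whose analyzer lowercases tokens and then applies Solr's stock SerbianNormalizationFilterFactory, which folds Serbian Cyrillic into Latin so that text in either script produces the same terms. The field type name "text_sr" is hypothetical and not part of the DSpace schema; the same filter line could instead be added to an existing analyzer chain.

```xml
<!-- Hypothetical field type; the name "text_sr" is an assumption for illustration. -->
<fieldType name="text_sr" class="solr.TextField" positionIncrementGap="100">
  <!-- No "type" attribute, so this analyzer is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- The Serbian filter expects lowercased input, so lowercase first. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- haircut="bald": convert Cyrillic to Latin, then strip Latin diacritics.
         haircut="regular" would convert Cyrillic to Latin but keep diacritics. -->
    <filter class="solr.SerbianNormalizationFilterFactory" haircut="bald"/>
  </analyzer>
</fieldType>
```

Solr does not pick an analyzer by detecting the alphabet; each field uses whatever analyzer its field type declares, so a field would need to reference this type (or have the filter added to its current type). A change to index-time analysis only affects content indexed afterwards, so a full reindex (in DSpace, "[dspace]/bin/dspace index-discovery -b") would likely be needed.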

Hopefully that's helpful.  Essentially, I suspect that you'd need a Solr guide here rather than a DSpace guide.  The changes should likely all be at the Solr level of things.

Tim

Mladen

Sep 16, 2024, 11:42:21 AM
to DSpace Technical Support

Hi Tim,

Thanks for all the info! :)
I've done a bit of research and will try to implement this as an exercise, and to learn more about DSpace and Solr.

I was wondering: do we need analyzers for languages other than English in the main version? If I succeed, could we document it somewhere? I'd also like to see whether dynamic analyzers can be implemented (I still don't know if that's possible).

Mladen

DSpace Technical Support

Sep 16, 2024, 1:17:49 PM
to DSpace Technical Support
Hi Mladen,

If you (or anyone else) build some instructions for configuring additional language analyzers, then we could add those instructions to the DSpace Documentation (maybe as a subpage under the "Discovery" docs).

To be clear, anyone can ask for privileges to edit the DSpace Documentation (as the docs are "owned" by the community). You just need to set up a free wiki account by emailing wiki...@lyrasis.org, and then let me know you'd like to contribute to the DSpace Documentation.

Thanks,
Tim