The Finnish alphabet is the same as the English alphabet except that it has three additional letters: å, ä and ö. In Finnish alphabetical order these are the last three letters of the alphabet, so the last six letters of the Finnish alphabet are "xyzåäö". DSpace normalizes å and ä to a, and ö to o, which leads to an undesired alphabetical sort order in DSpace. How could we implement proper Finnish alphabetical sorting in DSpace?
We are currently using DSpace 7.5. In Solr, Finnish characters do not seem to be sorted correctly, at least in our case. This is evident when we open any collection and then browse by title.
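To make the effect concrete, here is a rough Java sketch of what diacritic folding does to the order. It is not DSpace's actual code: the class and method names are hypothetical, the sample titles are made up, and it only assumes that the sort key is built by lower-casing the title and stripping combining marks.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of DSpace.
class DiacriticFoldingDemo {

    // Lower-case the title, decompose it (NFD) and drop combining marks,
    // so that å and ä become a, and ö becomes o.
    static String fold(String title) {
        String decomposed = Normalizer.normalize(
                title.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Sort titles by their folded key, as a folding sort field would.
    static List<String> sortByFoldedKey(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(DiacriticFoldingDemo::fold));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample titles, only for illustration.
        System.out.println(sortByFoldedKey(
                List.of("Öljy", "Zebra", "Äiti", "Auto", "Åbo")));
    }
}
```

With folding, "Åbo" and "Äiti" are sorted among the a-titles and "Öljy" among the o-titles, whereas Finnish order would place all three after "Zebra".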
To reproduce this issue on a vanilla installation of DSpace, we first create 7 test entries inside a collection with the following titles. When browsed by title in ascending order, the items are returned in the following order:
However, the correct order should be:
To try to mitigate this issue, we changed the schema.xml of the search core, explicitly creating a new field type that specifies the ICU collation library:
<fieldType name="titleSortType" class="solr.TextField" sortMissingLast="false" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Use ICU Collation for Finnish sorting -->
    <analyzer type="index" class="org.apache.lucene.collation.ICUCollationKeyAnalyzer">
      <param name="locale" value="fi" />
    </analyzer>
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>
We then assigned this field type to the *_sort dynamic field:
<dynamicField name="*_sort" type="titleSortType" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
Hi
The ordering of titles is done by the class
org.dspace.sort.OrderFormatTitleMarc21
which is configured in dspace.cfg:
plugin.named.org.dspace.sort.OrderFormatDelegate= \
org.dspace.sort.OrderFormatTitleMarc21=title
You would need to implement a new class that produces the ordering you expect and register it in the configuration file in place of this one.
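As a starting point, here is a minimal, standalone sketch of the key-building logic such a class could use. It is only an illustration under several assumptions: the class name FinnishSortKey is hypothetical, the trick of mapping å, ä and ö to the ASCII characters that come directly after 'z' is my own suggestion rather than anything DSpace provides, and it assumes the Solr sort field compares the resulting string character by character without folding it again. In DSpace the logic would have to live in a class implementing org.dspace.sort.OrderFormatDelegate (please check the exact method signature in your DSpace version), which is left out here so that the example compiles on its own.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of DSpace.
class FinnishSortKey {

    // Builds a key whose plain code-point order matches Finnish order for
    // a-z plus å, ä, ö: the three extra letters are replaced by '{', '|'
    // and '}', the ASCII characters that directly follow 'z'.
    static String makeSortString(String value) {
        if (value == null) {
            return null;
        }
        // Compose first (NFC) so that "a" + combining diaeresis becomes "ä".
        String lower = Normalizer.normalize(value.trim(), Normalizer.Form.NFC)
                .toLowerCase(Locale.ROOT);
        StringBuilder key = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case 'å': key.append('{'); break;
                case 'ä': key.append('|'); break;
                case 'ö': key.append('}'); break;
                default:  key.append(c);
            }
        }
        return key.toString();
    }

    // Sorts titles by the generated key, as a string-compared sort field would.
    static List<String> sortFinnish(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(FinnishSortKey::makeSortString));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample titles, only for illustration.
        System.out.println(sortFinnish(
                List.of("Öljy", "Zebra", "Äiti", "Auto", "Åbo")));
    }
}
```

This is deliberately simplified: it does not strip leading articles or handle other accented letters, and titles that themselves contain '{', '|' or '}' would collide with the mapped letters. Existing items would also need to be reindexed before a new sort key takes effect.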
Regards
Adán