How to implement Finnish alphabetical sorting


Barunes Padhy

Oct 17, 2023, 11:26:02 AM
to DSpace Technical Support
Hi,

The Finnish alphabet is the same as the English alphabet except that it has three additional letters: å, ä and ö. In Finnish alphabetical order these are the last three letters, so the last six letters of the Finnish alphabet are "xyzåäö". DSpace normalizes å and ä to a, and ö to o, which leads to an undesired sort order. How could we implement proper Finnish alphabetical sorting in DSpace? We are currently using DSpace 7.5. In Solr, Finnish characters do not appear to be sorted correctly, at least in our case. This is evident when we open any collection and browse by title.
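For reference, the expected letter order can be checked outside DSpace. The following is only a minimal sketch that uses the JDK's java.text.Collator with the "fi" locale as a stand-in; it does not touch DSpace or Solr, and the class name FinnishLetterOrder is made up for illustration. With Finnish collation rules, å, ä and ö should come after z rather than being folded into a and o.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FinnishLetterOrder {

    // Returns a copy of the input sorted with the JDK's Finnish collation rules.
    static List<String> sortFinnish(List<String> values) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("fi"));
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // Escapes keep the source ASCII-only:
        // U+00E5 = a with ring, U+00E4 = a with diaeresis, U+00F6 = o with diaeresis.
        List<String> letters =
                List.of("\u00f6", "z", "a", "\u00e4", "\u00e5", "y", "o", "x");
        System.out.println(sortFinnish(letters));
    }
}
```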

To reproduce the issue on a vanilla installation of DSpace, we created 7 test items in a collection with the titles below. When browsed by title in ascending order, the items are returned in the following order:

  • AABNER Forum Peer Review System
  • Aadam askeetti : Ihminen paratiisissa Johannes Damaskolaisen mukaan
  • AAI-profiler : fast proteome-wide exploratory analysis reveals taxonomic identity, misclassification and contamination
  • Äänen korkeuteen ja puhekieleen liittyvät taidot: teoriasta kuulovammaisten lasten kuntoutukseen
  • Ääniä opinto- ja uraohjauksen kansallisella kentällä – ohjauksen kehittämistä koskevien argumenttien analyysi
  • Aarne Michaël Tallgren, Estonia, and Tartu in 1920 : The image of a country in correspondence
  • The Aarhus Chamber Campaign on Highly Oxygenated Organic Molecules and Aerosols (ACCHA) : particle formation, organic acids, and dimer esters from alpha-pinene ozonolysis at different temperatures

However, the correct order should be:

  • AABNER Forum Peer Review System
  • AAI-profiler : fast proteome-wide exploratory analysis reveals taxonomic identity, misclassification and contamination
  • Aadam askeetti : Ihminen paratiisissa Johannes Damaskolaisen mukaan
  • Aarne Michaël Tallgren, Estonia, and Tartu in 1920 : The image of a country in correspondence
  • The Aarhus Chamber Campaign on Highly Oxygenated Organic Molecules and Aerosols (ACCHA) : particle formation, organic acids, and dimer esters from alpha-pinene ozonolysis at different temperatures
  • Äänen korkeuteen ja puhekieleen liittyvät taidot: teoriasta kuulovammaisten lasten kuntoutukseen
  • Ääniä opinto- ja uraohjauksen kansallisella kentällä – ohjauksen kehittämistä koskevien argumenttien analyysi

To try to mitigate this, we changed the schema.xml of the search core, explicitly creating a new field type that specifies the ICU collation libraries:

<fieldType name="titleSortType" class="solr.TextField" sortMissingLast="false" omitNorms="false">
      <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <!-- Use ICU Collation for Finnish sorting -->
          <analyzer type="index" class="org.apache.lucene.collation.ICUCollationKeyAnalyzer">
               <param name="locale" value="fi" />
          </analyzer>
          <filter class="solr.TrimFilterFactory" />
      </analyzer>
  </fieldType>

We then assigned this field type to the *_sort dynamic field:

<dynamicField name="*_sort" type="titleSortType" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
However, this did not help.

Thanks and Regards,
Barunes
 

Adán Román Ruiz

Oct 17, 2023, 5:38:40 PM
to dspac...@googlegroups.com

Hi

Title ordering is handled by the class org.dspace.sort.OrderFormatTitleMarc21, which is configured in dspace.cfg:

plugin.named.org.dspace.sort.OrderFormatDelegate= \
        org.dspace.sort.OrderFormatTitleMarc21=title

You would need to implement a new class that meets your requirements and reference it in the configuration file.
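As a rough illustration of what such a class could compute, the sketch below turns a title into a sort string whose plain text order follows Finnish collation. It is self-contained and uses the JDK's java.text.Collator rather than any DSpace class; the class name FinnishSortKey and the method makeSortString are hypothetical, and wiring this into an actual OrderFormatDelegate implementation is left out.

```java
import java.text.Collator;
import java.util.Locale;

public class FinnishSortKey {

    // Hypothetical helper: encodes the Finnish collation key of the value as
    // hex text, so that an ordinary string sort of the results follows
    // Finnish alphabetical order.
    static String makeSortString(String value) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("fi"));
        byte[] key = collator.getCollationKey(value.trim()).toByteArray();
        StringBuilder hex = new StringBuilder(key.length * 2);
        for (byte b : key) {
            // Two hex digits per byte keep the unsigned byte order intact.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // U+00C4 / U+00E4 = A / a with diaeresis.
        String aarne = makeSortString("Aarne");
        String zebra = makeSortString("Zebra");
        String aanen = makeSortString("\u00c4\u00e4nen");
        System.out.println(aarne.compareTo(zebra) < 0);
        System.out.println(zebra.compareTo(aanen) < 0);
    }
}
```

Storing a key like this instead of a diacritic-stripped string is one way to keep the letters distinct; whether it fits the existing browse index would need to be checked against the DSpace version in use.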

Regards

Adán


Barunes Padhy

Nov 1, 2023, 3:13:05 PM
to DSpace Technical Support
Hi,

Thank you so much for the reply.
To implement our change, we edited OrderFormatTitleMarc21.java and imported LocaleOrderingFilter with the Finnish locale set. We then changed the filters in OrderFormatTitleMarc21.java to LowerCaseAndTrim, LocaleOrderingFilter, MARC21InitialArticleWord and StripLeadingNonAlphaNum (in that order). This seems to have worked for titles. However, similar steps for OrderFormatAuthor.java do not work. There, we set the filters to only LowerCaseAndTrim and LocaleOrderingFilter (in that order). We also tried to define this explicitly in our dspace.cfg:

plugin.named.org.dspace.sort.OrderFormatDelegate= \
        org.dspace.sort.OrderFormatTitleMarc21=title, \
        org.dspace.sort.OrderFormatAuthor=author

However, this has not helped.
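One thing that may be worth checking is the position of the locale filter in the chain. The sketch below assumes that the locale step replaces the text with an opaque collation key; if that holds, any text-based step placed after it (such as article stripping) no longer sees readable text. All names here are stand-ins written for this example, not the DSpace filter classes, and the article handling is deliberately crude.

```java
import java.text.Collator;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class FilterOrderSketch {

    // Stand-in for lower-casing and trimming.
    static final UnaryOperator<String> LOWER_TRIM =
            s -> s.trim().toLowerCase(Locale.ROOT);

    // Crude stand-in for initial-article stripping (English "the" only).
    static final UnaryOperator<String> STRIP_ARTICLE =
            s -> s.startsWith("the ") ? s.substring(4) : s;

    // Stand-in for a locale filter: replaces the text with a hex collation key.
    static final UnaryOperator<String> FINNISH_KEY = s -> {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("fi"));
        StringBuilder hex = new StringBuilder();
        for (byte b : collator.getCollationKey(s).toByteArray()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    };

    // Key generation last: text-based steps still see readable text.
    static final List<UnaryOperator<String>> KEY_LAST =
            List.of(LOWER_TRIM, STRIP_ARTICLE, FINNISH_KEY);

    // Key generation second: article stripping only sees hex digits.
    static final List<UnaryOperator<String>> KEY_SECOND =
            List.of(LOWER_TRIM, FINNISH_KEY, STRIP_ARTICLE);

    // Applies each step of the chain in order.
    static String apply(List<UnaryOperator<String>> chain, String value) {
        String out = value;
        for (UnaryOperator<String> step : chain) {
            out = step.apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        String withArticle = "The Aarhus Chamber Campaign";
        String without = "Aarhus Chamber Campaign";
        System.out.println(apply(KEY_LAST, withArticle).equals(apply(KEY_LAST, without)));
        System.out.println(apply(KEY_SECOND, withArticle).equals(apply(KEY_SECOND, without)));
    }
}
```

In this sketch only the chain that generates the key last treats the two titles as the same sort value. Whether the real filters behave this way would have to be confirmed in the DSpace source.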

Thanks and Regards,
Barunes Padhy

Matti Yrjölä

May 20, 2024, 6:51:15 AM
to DSpace Technical Support
Hi,

Did you find any solution for this problem?
We have the same issue with our DSpace 7.6/7.6.1 installations.

Best regards,
Matti

Barunes Padhy

May 21, 2024, 8:30:49 AM
to DSpace Technical Support
Hi,

We have not actively worked on it yet, but we would like to know if you find a fix for it.

Regards,
Barunes Padhy