Facet filter matching fails for diacritics (â, special chars) in sidebar

Muhep Atasoy

2:50 AM

to DSpace Technical Support

Hi all,

I'm running into a problem with sidebar facet filtering (browse-by-value / filter list) in a DSpace 8.x (Solr-backed) installation. The UI displays facet values correctly, diacritics included, but the facet input (the small search box in the sidebar) behaves like a strict exact match: when I type the ASCII form of a value, without its diacritics, facet values that differ from my input only by diacritics are not returned. Example:

Stored/display value: Hamzazâde Esad
If the user types hamzazade in the facet search box, it does not return the expected facet value or matching results.
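
To make the expected behavior concrete, here is a minimal Python sketch of the kind of folding meant here. It is only an approximation built on the standard library, not Solr's ICU folding, and the helper name `fold` is hypothetical. Characters without a Unicode decomposition, such as the Turkish dotless ı, are left unchanged by this approach.

```python
import unicodedata

def fold(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

stored = "Hamzazâde Esad"   # value as displayed in the facet list
typed = "hamzazade"         # what the user types in the facet search box

# Lowercasing alone is not enough: "â" and "a" are still different characters.
raw_match = stored.lower().startswith(typed)

# Folding both sides the same way gives the match the user expects.
folded_match = fold(stored).startswith(fold(typed))
```

With these inputs, `raw_match` is `False` and `folded_match` is `True`, which is the difference between what currently happens and what I would like to happen.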

What I found and tried

The dynamic field for sidebar facets is *_filter:

Current keywordFilter fieldType (originally):

This keeps the stored/display value exact (good), but facet search behaves like exact-match because the index token is the full original string.

I attempted to add Turkish/ICU folding to the analyzer. When I added folding to the index analyzer, the displayed facet strings came back lowercased and stripped of diacritics (bad for presentation). So I tried to split the behavior between the index and query analyzers:

This left the displayed values untouched, and query-side normalization does run, but the problematic letter â (and some other special characters) still does not reliably match when the user types ASCII a. I tested the analyzer output on the Solr Admin Analysis page: the query token for kâmil becomes kamil, but the facet search still does not return the expected values.
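
One possible explanation worth checking, stated here as an assumption rather than something confirmed against DSpace or Solr source: facet prefix or substring filtering compares the typed text with the indexed terms as they are, so folding applied only on the query side cannot match a term that still contains â. The Python sketch below simulates that situation; `fold` and `facet_prefix` are hypothetical helpers, not Solr or DSpace APIs.

```python
import unicodedata

def fold(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

# Index side: a keyword tokenizer without folding keeps each value as one unchanged term.
indexed_terms = ["Kâmil", "Hamzazâde Esad"]

def facet_prefix(terms, prefix):
    """Simulate a prefix filter that compares against the raw indexed terms."""
    return [t for t in terms if t.startswith(prefix)]

# Query side: the typed text is folded, but the indexed term still holds "Kâ...".
typed = fold("kâmil")
hits_raw = facet_prefix(indexed_terms, typed)

# A match only appears when the indexed side is folded the same way.
hits_folded = [t for t in indexed_terms if fold(t).startswith(typed)]
```

Here `hits_raw` is empty while `hits_folded` contains `"Kâmil"`, which mirrors the symptom described above: the Analysis page shows the folded query token, yet no facet value comes back.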

Important constraints:

DSpace code (the UI / server) expects author_filter style fields, and currently both search and display use the same *_filter field for facets; I cannot easily change DSpace to use a separate *_search field.
I cannot use WhitespaceTokenizer at query time, because the index uses KeywordTokenizer (index tokens are whole values) and the mismatch produces no hits.

Questions / requests for help

Does the DSpace sidebar facet search use the same keywordFilter fieldType defined above, or does DSpace apply additional query-time processing before facet matching? (Which fieldType or query param does the facet small-search box use?)
Am I missing a Solr parameter that controls how facet search (the small value search) is executed so I can inject folding/normalization? (e.g. use of facet.contains, facet.prefix, facet.method, facet.contains.ignoreCase or special params?)
Has anyone solved diacritics matching in the sidebar facets without losing the displayed original strings? Best-practice patterns: use copyField, multi-field approach, mapping char filter, or client-side solution?
If multi-field (search+display) is the recommended approach but DSpace insists on *_filter, is there a recommended DSpace config or XSL/template hook to let facet UI show stored display value while facet search works against a different indexed token?
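
For reference, the kind of Solr request the second question has in mind can be sketched as follows. The parameters `facet.field`, `facet.contains`, `facet.contains.ignoreCase` and `facet.limit` are standard Solr faceting parameters; the base URL, the core name and the helper `facet_contains_url` are assumptions for illustration, and whether the DSpace sidebar actually sends `facet.contains` or `facet.prefix` is exactly what is being asked. As far as I know, `facet.contains.ignoreCase` only relaxes case, not diacritics.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint and core name; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/search/select"

def facet_contains_url(field: str, typed: str, limit: int = 10) -> str:
    """Build a facet-only request that filters facet values by substring."""
    params = {
        "q": "*:*",
        "rows": 0,                            # no documents, facet counts only
        "facet": "true",
        "facet.field": field,
        "facet.contains": typed,              # substring filter on facet values
        "facet.contains.ignoreCase": "true",  # case-insensitive substring match
        "facet.limit": limit,
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = facet_contains_url("author_filter", "hamzaz")
```

Pasting the resulting URL into a browser or `curl` against the Solr core is one way to test facet filtering without going through the DSpace UI.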

What I can provide if helpful:

sample schema.xml snippets
Solr analysis outputs (index vs query) for sample values like kâmil, kâmil\n|||\nKâmil etc.
steps I used to test in Solr Admin (analysis page) and example queries that fail.
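
The sample value kâmil\n|||\nKâmil above looks like a matching key and a display value joined by a separator inside a single field. Assuming that reading is right, one pattern that keeps the single *_filter field would be to fold the key part while leaving the display part intact. The Python sketch below only illustrates the idea; `fold`, `make_term` and `display_of` are hypothetical names, and it uses a folded key where the sample above shows a merely lowercased one.

```python
import unicodedata

SEPARATOR = "\n|||\n"  # separator as it appears in the sample value above

def fold(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def make_term(display: str) -> str:
    """Build one indexed term: folded key, separator, original display value."""
    return fold(display) + SEPARATOR + display

def display_of(term: str) -> str:
    """Recover the original display value from a combined term."""
    return term.split(SEPARATOR, 1)[1]

terms = [make_term(v) for v in ["Kâmil", "Hamzazâde Esad"]]

# Prefix matching runs against the folded key; the UI shows the display part.
typed = fold("hamzazade")
matches = [display_of(t) for t in terms if t.startswith(typed)]
```

With these inputs `matches` holds the original string "Hamzazâde Esad", so matching uses the folded key while the displayed value keeps its diacritics and capitalization.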

Thanks in advance. Any pointers, config snippets, or DSpace-specific guidance would be appreciated.