special characters in solr config


Jonathan Rochkind

Mar 23, 2010, 8:14:09 PM
to solrma...@googlegroups.com
So let's say I want to include a non-ASCII character in my SolrMarc
config. Say, with my subject facet, I want to actually use an em-dash
instead of two hyphens.

If I try to do this like so:

subject_facet = custom, getAllSubfields(600[a-z]:610[a-z]:611[a-z]:630[a-z]:650[a-z]:651[a-z]:655[a-z]:690[a-z], " — ")


(That's a UTF-8 em-dash up there.) But my em-dash ends up corrupted. I
wonder if that's because SolrMarc interprets everything in the index
properties as MARC-8, given "marc.default_encoding = MARC8" or something?
Or maybe it's just mangling it some other way?

Is there any reasonable way to do this?

Jonathan

Erik Hatcher

Mar 23, 2010, 9:35:13 PM
to solrma...@googlegroups.com
.properties files must be ISO-8859-1 encoded, so you'll need to use
Unicode escape sequences (\uHHHH syntax) for any other characters.

http://en.wikipedia.org/wiki/.properties
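A minimal Java sketch of the failure described above, assuming SolrMarc reads its index properties through `java.util.Properties.load(InputStream)`, which is documented to decode the stream as ISO-8859-1. The class and method names are made up for illustration. A literal em-dash saved as UTF-8 comes back as three Latin-1 characters, while the `\u2014` escape decodes to the real character:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Simulates a .properties file that was saved as UTF-8, then loads it the
    // standard way: Properties.load(InputStream) decodes bytes as ISO-8859-1.
    static Properties loadUtf8SavedFile(String fileText) {
        byte[] bytesOnDisk = fileText.getBytes(StandardCharsets.UTF_8);
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytesOnDisk));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "raw" holds a literal em-dash (UTF-8 bytes E2 80 94);
        // "escaped" holds the six ASCII characters of the escape sequence.
        String fileText = "raw = \u2014\n" + "escaped = \\u2014\n";
        Properties props = loadUtf8SavedFile(fileText);

        String raw = props.getProperty("raw");
        String escaped = props.getProperty("escaped");

        // The literal em-dash comes back as three Latin-1 characters (mojibake).
        System.out.println("raw length: " + raw.length()); // 3
        // The escape sequence is decoded by Properties into U+2014.
        System.out.println("escaped is em-dash: " + escaped.equals("\u2014")); // true
    }
}
```

If the loading code used `Properties.load(Reader)` with a UTF-8 reader instead, the literal character would survive; whether SolrMarc does that is not something this thread establishes.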


> --
> You received this message because you are subscribed to the Google
> Groups "solrmarc-tech" group.
> To post to this group, send email to solrma...@googlegroups.com.
> To unsubscribe from this group, send email to solrmarc-tec...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/solrmarc-tech?hl=en.

Jonathan Rochkind

Mar 24, 2010, 10:23:08 AM
to solrma...@googlegroups.com
Aha, makes sense! Thanks, Erik. I keep forgetting that not everything is
UTF-8. I'll try that \u escape syntax; if it works, that will do fine.
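Writing the escapes by hand is fine for one character; for longer strings a small helper can generate them. A hypothetical sketch (older JDKs, up to Java 8, also shipped a `native2ascii` command-line tool for this job). It only rewrites non-ASCII characters and is not a full .properties escaper: backslashes and other ASCII characters pass through untouched.

```java
public class PropertiesEscaper {

    // Rewrites every character outside printable ASCII as a backslash-u escape
    // with four uppercase hex digits, so the result is safe in an ISO-8859-1
    // .properties file. Characters outside the BMP are stored by Java as two
    // UTF-16 units and therefore become two escapes (a surrogate pair), which
    // Properties decodes back into the original character.
    // ASCII characters, including backslash, are copied unchanged.
    static String escapeNonAscii(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical property line; the em-dash is written as a Java escape here.
        String line = "subject_separator = " + escapeNonAscii("\u2014");
        System.out.println(line);
    }
}
```

Running it prints `subject_separator = \u2014`, which `Properties` decodes back to the em-dash.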


Jonathan Rochkind

Mar 24, 2010, 11:34:41 AM
to solrma...@googlegroups.com
Ooh, and it does indeed work. Joining subfields with "\u2014" gets me an
em-dash in my facet value, hooray.

I still think I'm going to need a custom function for this eventually
anyway, because joining EVERY subfield with the separator character
doesn't actually produce proper LCSH output; only certain subfields are
supposed to be prefixed with the separator character.
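A custom function along these lines might put the separator only before subdivision subfields. This is a plain-Java sketch, not SolrMarc's custom-method API: the `Subfield` class is a hypothetical stand-in for whatever MARC record type the real code would receive, and treating $v, $x, $y and $z as the subdivisions is an assumption based on standard 6XX usage.

```java
import java.util.List;
import java.util.Set;

public class SubjectHeadingJoiner {

    // Minimal stand-in for a MARC subfield (hypothetical; not a SolrMarc or marc4j type).
    static final class Subfield {
        final char code;
        final String data;

        Subfield(char code, String data) {
            this.code = code;
            this.data = data;
        }
    }

    // Assumed LCSH subdivision codes in 6XX fields:
    // v = form, x = general, y = chronological, z = geographic.
    private static final Set<Character> SUBDIVISION_CODES = Set.of('v', 'x', 'y', 'z');

    // Joins subfields with a plain space by default, but puts the separator
    // in front of subdivision subfields only. Empty subfields are skipped.
    static String join(List<Subfield> subfields, String separator) {
        StringBuilder sb = new StringBuilder();
        for (Subfield sf : subfields) {
            String text = sf.data.trim();
            if (text.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(SUBDIVISION_CODES.contains(sf.code) ? separator : " ");
            }
            sb.append(text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 650 field: $a Cooking $z France $x History $y 20th century
        List<Subfield> field650 = List.of(
                new Subfield('a', "Cooking"),
                new Subfield('z', "France"),
                new Subfield('x', "History"),
                new Subfield('y', "20th century"));
        // Prints the four parts separated by em-dashes.
        System.out.println(join(field650, " \u2014 "));
    }
}
```

With this approach a 600 name heading keeps its $d dates attached to the name with a plain space, and only the subdivisions that follow get the separator.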

But one issue at a time! The issue of getting a Unicode em-dash in
there: solved!

Jonathan

