If I try to do this like so:
subject_facet = custom,
getAllSubfields(600[a-z]:610[a-z]:611[a-z]:630[a-z]:650[a-z]:651[a-z]:655[a-z]:690[a-z], " — ")
(That's a UTF-8 em-dash up there) Then my em-dash ends up corrupted. I
wonder if that's because SolrMarc is interpreting anything in the index
properties as marc8, if "marc.default_encoding = MARC8" or something? Or
maybe it's just messing it up otherwise?
Is there any reasonable way to do this?
Jonathan
http://en.wikipedia.org/wiki/.properties
On Mar 23, 2010, at 8:14 PM, Jonathan Rochkind wrote:
> So let's say I want to include a non-ASCII char in my solrmarc
> config. Say, with my subject facet, I want to actually use an
> em-dash, instead of two hyphens.
>
> If I try to do this like so:
>
> subject_facet = custom, getAllSubfields(600[a-z]:610[a-z]:611[a-z]:
> 630[a-z]:650[a-z]:651[a-z]:655[a-z]:690[a-z], " — ")
>
>
> (That's a UTF-8 em-dash up there) Then my em-dash ends up corrupted.
> I wonder if that's because SolrMarc is interpreting anything in the
> index properties as marc8, if "marc.default_encoding = MARC8" or
> something? Or maybe it's just messing it up otherwise?
>
> Is there any reasonable way to do this?
>
> Jonathan
>
Erik Hatcher wrote:
> .properties files must be ISO-8859-1 encoded. So you'll need to use
> unicode escape sequence for any other characters. \uHHHH syntax.
>
> http://en.wikipedia.org/wiki/.properties
>
>
I still think I'm going to need a custom function for this eventually
anyway, because just joining EVERY subfield with the separator character
doesn't actually produce proper LCSH output; only certain subfields are
supposed to be prefixed with the separator char.
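
A rough sketch of what such a join might look like, in plain Java with no
SolrMarc or marc4j dependencies. The class name, the {code, value} pair
representation, and the choice of v, x, y and z as the subfields that take
the separator are assumptions for illustration, not SolrMarc's API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SolrMarc: joins subfield values so that
// only subdivision subfields are prefixed with the separator.
final class SubjectJoiner {
    // Assumption: v, x, y and z are the subdivision subfields that take a dash.
    private static final String SUBDIVISION_CODES = "vxyz";

    /** Each entry is a {code, value} pair, for example {"a", "Cats"}. */
    static String join(List<String[]> subfields, String separator) {
        StringBuilder out = new StringBuilder();
        for (String[] sf : subfields) {
            String code = sf[0];
            String value = sf[1].trim();
            if (value.isEmpty()) {
                continue; // skip empty subfields entirely
            }
            if (out.length() > 0) {
                boolean subdivision = code.length() == 1
                        && SUBDIVISION_CODES.indexOf(code.charAt(0)) >= 0;
                // Subdivisions get the separator, everything else a plain space.
                out.append(subdivision ? separator : " ");
            }
            out.append(value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> field650 = Arrays.asList(
                new String[] {"a", "Cats"},
                new String[] {"x", "Behavior"},
                new String[] {"z", "United States"});
        // The separator here is an em-dash with a space on either side.
        System.out.println(join(field650, " \u2014 "));
    }
}
```

A real version would pull the subfields from the MARC record rather than
from a hand-built list, and the set of codes would probably need tuning
per field.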
But one issue at a time! The issue of getting a Unicode em-dash in
there, solved!
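
For anyone finding this in the archive, here is a small standalone check of
the escape approach, assuming the index properties are read with the standard
java.util.Properties loader (I have not confirmed exactly how SolrMarc loads
them). The class name and the shortened field list are made up for the
example; U+2014 is the em-dash:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class: shows a Unicode escape surviving a properties load.
final class EscapedSeparatorDemo {
    // Properties.load(InputStream) treats the bytes as ISO-8859-1 and decodes
    // Unicode escape sequences found in keys and values.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    text.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as literal text, which is
        // what the line would look like inside the properties file itself.
        String line = "subject_facet = custom, "
                + "getAllSubfields(600[a-z]:650[a-z], \" \\u2014 \")\n";
        String value = load(line).getProperty("subject_facet");
        // True when the escape was decoded to a real em-dash.
        System.out.println(value.contains("\u2014"));
    }
}
```

In the properties file itself, that means writing the separator argument
as " \u2014 " instead of pasting the character in directly.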
Jonathan