UnicodeNormalizeFilter.jar

Edwin Shin

Apr 27, 2011, 7:01:11 AM
to blacklight-...@googlegroups.com
Is the use of UnicodeNormalizeFilter.jar (and normalizer.jar) documented anywhere? I was playing with a vanilla BL 2.8.0 install and, departing from my normal practice of using blacklight-jetty or hydra-jetty, opted for a vanilla Solr 1.4.1. It took me a good half-hour to realize what was going on.

If it's documented anywhere that those two jars need to be made available to Solr for the demo app, I didn't see it.

Chris Beer

Apr 27, 2011, 7:54:55 AM
to blacklight-...@googlegroups.com
Hi Eddie,

I don't know anything about the UnicodeNormalizeFilter, but have you considered using Solr 3.1 and the ICUFoldingFilterFactory (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory)? I believe it performs a similar task.

Chris

Edwin Shin

Apr 27, 2011, 9:33:04 AM
to blacklight-...@googlegroups.com
I haven't played with Solr 3.1 yet. No reports of any issues running Blacklight against it, I take it?

Jonathan Rochkind

Apr 27, 2011, 9:39:55 AM
to blacklight-...@googlegroups.com
No reports of anyone running Blacklight with Solr 3.1 at all, yet! If you had time to investigate it and give us a report, that'd be awesome of you.

There shouldn't be any major problems, and I don't expect it will require any code changes in Blacklight (knock on wood). You'll just have to do a bit of work to set it up yourself using the solrconfig.xml/schema.xml BL expects (use the 1.4.1 version as a model), or to run the BL automated tests against it to confirm it's all good, since BL won't do it for you (yet).

Chris Beer

Apr 27, 2011, 9:57:22 AM
to blacklight-...@googlegroups.com
I sent an email about using Solr 4 (which is substantially similar to 3.1,
as far as I know) to the list last month and have pushed an experimental
blacklight-jetty branch running 4.0 at
https://github.com/projectblacklight/blacklight-jetty/tree/solr-4. After
making the solr configuration changes, all of the tests pass unmodified,
except the spellchecking tests (presumably because of some changes in the
solr component(?)).

If you need SolrMarc support, there is a little wrinkle to build a version
of SolrMarc with the right SolrJ dependency. If you need help figuring that
piece out, I can try to remember the steps.


------ Forwarded Message
From: Chris Beer <chris...@wgbh.org>
Date: Mon, 28 Mar 2011 14:28:46 -0400
To: <blacklight-...@googlegroups.com>
Conversation: [Blacklight-development] Re: upgrading to solr 4.0
Subject: Re: [Blacklight-development] Re: upgrading to solr 4.0

Apologies for resurrecting a dead thread, but I've started a solr 4
blacklight-jetty branch (mainly to run the tests against).

Branch: https://github.com/projectblacklight/blacklight-jetty/tree/solr-4
Diff:
https://github.com/projectblacklight/blacklight-jetty/compare/master...solr-4

The majority of the Blacklight tests passed using stock Blacklight, however
there were some spellchecking failures (which might actually be bad tests;
the intended behavior isn't entirely clear to me).

To run the Blacklight tests, I had to self-compile SolrMarc using the latest
SolrJ library. I can't remember off-hand if the embedded solr feature
worked, or if I had to run against the http endpoint.

I tried to leave the out-of-the-box solr config alone as much as possible
and just add in the appropriate Blacklight configuration. I wasn't entirely
successful in this attempt, but should do better next time (and, note to
self, also commit the stock Solr configs for ease-of-diffing later).

https://github.com/projectblacklight/blacklight-jetty/blob/solr-4/solr/development-core/conf/solrconfig.xml
https://github.com/projectblacklight/blacklight-jetty/blob/solr-4/solr/development-core/conf/schema.xml

The two major differences are:

1. Using the solr multicore configuration with a development core and a test
core. I certainly like the multicore configuration better, and if there are
no objections would like to proceed with it.
2. Using the new (in Solr 3.1) ICU tokenizers and filters to replace the
schema.UnicodeNormalizationFilterFactory, schema.CJKFilterFactory, etc. See
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
and
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
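
For illustration, item 2 might look roughly like the following in schema.xml.
This is only a sketch, not a copy of what is in the branch: the field type
name text_icu is made up for the example, and it assumes the ICU analysis
jars that ship in Solr's analysis-extras contrib are on Solr's classpath.

```xml
<!-- Hypothetical field type; the name "text_icu" is an assumption
     chosen for this example, not taken from the Blacklight schema. -->
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- ICU-based tokenizer (new in Solr 3.1) -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- ICU-based Unicode normalization and folding of case/diacritics -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Any field that previously used the SolrMarc-provided filters would then
point at a type like this one instead.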

As the tests pass, I assume nothing terrible happens when I do that. I'd
love for someone who actually knows what they are doing with CJK languages
or unicode normalization to take a look sometime.


On a related note, hopefully as part of this work, I'll be able to cobble
together some HOWTO documentation about going from a stock Solr config to
what Blacklight expects and add it to the wiki.


Chris

Naomi Dushay

Apr 27, 2011, 5:04:29 PM
to blacklight-...@googlegroups.com
Eddie,

Those are jars from Bob Haschart that predate the ICUFilter stuff
provided in more recent Solr versions. They are generally associated
with SolrMarc, which has its own list: solrma...@googlegroups.com

You're right that if our demo app requires them in Solr, it should
be documented.
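
For anyone else who hits this with a stock Solr, here is a rough sketch of
one way the two jars could be made available. It assumes they have been
copied into a lib directory next to the core's conf directory (that
location is an assumption, not something the demo app specifies) and uses
the <lib> directive that solrconfig.xml accepts as of Solr 1.4.

```xml
<!-- solrconfig.xml: load extra jars such as UnicodeNormalizeFilter.jar
     and normalizer.jar from a directory. The "./lib" path is an
     assumption; point it at wherever the jars actually live. -->
<config>
  <lib dir="./lib" />
  <!-- the rest of the existing configuration stays unchanged -->
</config>
```

As far as I know, dropping the jars into a lib directory directly under
the Solr home also works without any config change.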

- Naomi