In 2008 I acquired an older version of
UnicodeNormalizationFilterFactory.jar directly from Robert Haschart
(source code dated around 2008-06-30), and I have continued to use it
with Solr 1.4.1. Now that I am moving to Solr 3.1, I have tried both
that older version of UnicodeNormalizationFilterFactory.jar and a
newer one I acquired from here, along with normalizer.jar:
https://github.com/projectblacklight/blacklight-jetty/tree/master/sol...
...I can start the Solr admin app, but when I try to do any query I see
this error:
java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
        at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:48)
        at org.apache.solr.analysis.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:338)
        at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
        at org.apache.lucene.analysis.KeywordMarkerFilter.incrementToken(KeywordMarkerFilter.java:73)
        at org.apache.lucene.analysis.snowball.SnowballFilter.incrementToken(SnowballFilter.java:76)
        ...
At the same time, I see there has been quite a bit of discussion of
UnicodeNormalizationFilterFactory versus ICUTokenizerFactory and
ICUFoldingFilterFactory.
I don't know enough to prefer UnicodeNormalizationFilterFactory over
ICUTokenizerFactory, but I would generally like to keep up with
the Blacklight community. If someone has run
UnicodeNormalizationFilterFactory with Solr 3.1, that would probably
be the easiest for me.
I am indexing data from a MARC .mrc export from Voyager as well as
data from other cataloging systems (which is for me the #1
reason I love Blacklight -- it was easy to do). So I'll need both
SolrMarc and plain-old XML paths to index data.
It would require a fairly involved rewrite of the UnicodeNormalizationFilter to get it to work with the newer version of Lucene in Solr.
I strongly recommend going to the ICU stuff - you'll get top notch support from the Lucene community should it not live up to your needs.
How about someone take some of your non-English text examples, and run them through Solr's analysis.jsp view using the UnicodeNormalizationFilter and then also run it through a Solr 3.x ICU configured analyzer and see what the diffs, if any, are?
Michael - why go to 3.1 when 3.3 is now the latest? Just jump there. Use the ICU stuff. Then see if any users complain :)
> Thanks in advance for any help!
> -- > You received this message because you are subscribed to the Google Groups "Blacklight Development" group. > To post to this group, send email to blacklight-development@googlegroups.com. > To unsubscribe from this group, send email to blacklight-development+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/blacklight-development?hl=en.
I'd echo Erik's comments -- go with ICU. One of the hang-ups I ran into in preparing a blacklight-jetty running Solr 3.x was trying to determine whether there are significant differences in the normalized output between UnicodeNormalizationFilterFactory and the ICU filters. If you find anything, I'd like to know about it.
Chris
Thank you very much for your prompt responses. Sounds quite clear:
ICU here we come.
Re 3.1 versus 3.3, we just haven't kept up since May, but we will.
I've been pretty quiet on the listserv but we have rolled out an
internal Blacklight implementation at USHMM and are working on a plan
to roll out a version for the web. I'll keep the list updated when we
get close to rolling it out.
Even though I don't believe that rewriting the UnicodeNormalizationFilter code would be a major effort, since it is mostly a boilerplate token filter factory that calls functions from the ICU libraries to do the actual work, I still think it is probably time to retire the UnicodeNormalizationFilter code in favor of the solr.ICUFoldingFilterFactory code that is in Solr 3.1. The UnicodeNormalizationFilter was only written because the previously existing filter for processing accented characters, ISOLatin1AccentFilterFactory, was abysmally bad.
Now that a supported filter that uses the ICU libraries is available, the pro tem filter, UnicodeNormalizationFilter, should be retired and replaced.
I believe that removing the two jar files (normalizer.jar and UnicodeNormalizeFilter.jar) from the lib directory and replacing the line(s) in schema.xml
<filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false" remove_diacritics="true" remove_modifiers="true" fold="true"/>
with
<filter class="solr.ICUFoldingFilterFactory" />
should achieve largely the same results. (I think you'll need apache-solr-analysis-extras.3.x.jar, lucene-icu-3.x.jar, and icu4j-4_6.jar in the solr lib directory.)
-Bob Haschart
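For context, here is a minimal sketch of where that filter line might sit inside a field type in schema.xml. The field type name text_icu_folded and the choice of a whitespace tokenizer are illustrative assumptions and are not taken from the blacklight-jetty schema; only the solr.ICUFoldingFilterFactory line and the commented-out line it replaces come from the message above.

```xml
<!-- schema.xml fragment (goes inside <types>).
     "text_icu_folded" is a hypothetical name; the tokenizer is an assumption. -->
<fieldType name="text_icu_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Previously:
         <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j"
                 composed="false" remove_diacritics="true"
                 remove_modifiers="true" fold="true"/> -->
    <!-- ICU folding also applies case folding, so a separate lowercase
         filter is probably not needed after it. -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Any other filters already in the chain (word delimiter, stemming, and so on) would stay where they are; changing the analysis chain generally means reindexing so that indexed and queried terms are folded the same way.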
I pretty much followed this schema (adding in a few mods I'd made
previously):
https://github.com/projectblacklight/blacklight-jetty/blob/master/sol...
and I picked up the three jars mentioned by Bob from here
(icu4j-4_6.jar,
apache-solr-analysis-extras-4.0-2011-03-26_08-06-09.jar, and
lucene-analyzers-icu-4.0-2011-03-26_08-06-09.jar):
https://github.com/projectblacklight/blacklight-jetty/tree/solr-4/sol...
It pretty much works. I've done a bit of preliminary testing (for
example, searching for Lodz and for Łódź should return the same
results), which at first glance seems to indicate the two methods
return the same results.
It would seem I'm mixing Solr 3.1 with some 4.0 jars, and I might try
to get other versions, but so far so good.
There's also a Solr 3.3 branch of blacklight-jetty at https://github.com/projectblacklight/blacklight-jetty/tree/solr-3.3 which is probably what you want to use as a reference copy. I believe the outstanding issues with blacklight-jetty using Solr 3.3 were outlined on this list earlier this month.
Thanks Bob, it's great to hear they are largely compatible with each other.
Just as a reminder, Tom raised some issues with using CJK and the ICUTokenizer on this list earlier [1] that we should probably keep in mind for documenting our future use of the ICU packages.
As for the effort involved - Lucene's analysis APIs have changed a fair bit since 2.x, which is why I made that comment. It's trickier stuff under the covers than ever before, to achieve reusable token streams and leverage "attributes" and so on. Certainly not a major undertaking, but hopefully an unnecessary one since the new ICU filters should do the trick.
Bob - thanks for your efforts with this normalization stuff over the years. Your contributions/feedback to the Lucene project factored into these improvements being made part of Lucene itself.
We still have some work to do to tie all this stuff together nicely out of the box with Solr, though. More on that in my next reply.
Erik
>>> On Aug 29, 2011, at 15:06 , Michael Levy wrote:
>>>> Hi all,
>>>> We're moving servers and want to move to Solr 3.1. I am having an issue using Blacklight and Solr 3.1. There is an existing thread on the topic:
>>>> And I note Chris Beer's work using the ICU approach :
Don't mix and match Lucene/Solr 3.x with 4.x. Very different stuff under the covers and results could be bad.
Solr (3.3 for example here) ships with apache-solr-analysis-extras-3.3.0.jar in the dist/ directory of the binary distro. This JAR file contains the Solr "factories" to wire Solr to the underlying Lucene libraries.
As Bob mentioned, you'll also need a couple of additional JAR files. These can be found in a binary distribution of Lucene (again, using 3.3 as an example), under contrib/icu. There's lucene-icu-3.3.0.jar (the actual analyzers that the above factories instantiate) and lib/icu4j-4_8.jar.
I strongly recommend keeping versions in sync. Solr and Lucene are versioned identically now, so just stick with the same 3.x release (again, 3.3 is recommended at this point) for both sides of things.
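As a rough sketch of the jar setup described above, the three JARs could also be referenced explicitly from solrconfig.xml with <lib> directives rather than copied into the lib directory. The /opt/solr-extras path is a hypothetical location, and the version numbers simply follow the 3.3 example above; they should match whatever Solr release is actually deployed.

```xml
<!-- solrconfig.xml fragment (goes inside the <config> element).
     /opt/solr-extras is a hypothetical directory; adjust to your layout. -->

<!-- Solr-side factories, from dist/ in the Solr binary distribution -->
<lib path="/opt/solr-extras/apache-solr-analysis-extras-3.3.0.jar" />

<!-- Lucene ICU analyzers and the ICU4J library,
     from contrib/icu in the Lucene binary distribution -->
<lib path="/opt/solr-extras/lucene-icu-3.3.0.jar" />
<lib path="/opt/solr-extras/icu4j-4_8.jar" />
```

Listing each jar by name makes a stray 4.0 snapshot jar easy to spot, which is the mix-up cautioned against above.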