separate jar for custom indexer


WillemVermeer

Jan 8, 2010, 7:33:13 AM
to solrmarc-tech
Hi everybody,

I'm using the latest version of solrmarc by building it from the
trunk. I have established that I will need my own custom
indexer (to define a custom id for each record) and I'm a bit confused
about how to set this up.
I read the thread called "Making Configuration of SolrMarc easier",
which mentions that it will be possible to create a separate jar
with custom-built code in it. What is the status of this, and how do I
configure it? I tried adding my compiled Indexer.class to
Custom_SolrMarc.jar but this results in errors.

Thanks for advice,
Willem
The European Library
Den Haag, The Netherlands.

Robert Haschart

Jan 8, 2010, 1:20:55 PM
to solrma...@googlegroups.com
Willem,

I have been hard at work making SolrMarc 2.1 which (except for finishing
the documentation) is ready to go. It currently resides in the SolrMarc
SVN repo at branches/solrmarc-2.1 but will soon be moved to replace the
trunk. One of the main improvements of version 2.1 over version 2.0
(which is the current trunk) is that it is easier to create custom java
indexing routines, and it is much easier to use them.

The initial step of:
ant init
will create a directory named local_build. Within this directory there
is a src directory, in which you can place the java files for your
custom indexing routines.

Subsequently, if you run:
ant dist
it will compile your source into a separate jar file, modify the
config.properties file to reference this jar and your class, and place
the resulting files in the dist directory.

The config.properties file required for SolrMarc 2.1 is largely the same
as what was used in SolrMarc 2.0 (and earlier), and the
index.properties file is unchanged.

If you'd like to be an early adopter, I can help you get the new branch
up and running as quickly as possible, partly because doing so will then
make it easier to make the documentation as complete as possible.

-Bob Haschart

Willem Vermeer

Jan 8, 2010, 1:38:03 PM
to solrma...@googlegroups.com

Sounds great Bob,
I'll give it a go on Monday and let you know about the results. I'm currently in the prototype phase of our project anyway.
Best,
Willem

Willem Vermeer

Jan 11, 2010, 3:58:27 AM
to solrma...@googlegroups.com
Hi Bob,
So this morning I checked out the 2.1 branch. Below are the notes I
took during the migration. I'm afraid I didn't make it all the way to
the end: the custom indexer jar gets built but I can't seem to execute
it. I guess I'm missing something so I've left it for the moment
hoping you can help me out!
Thanks,
Willem

============================================
Migration notes SOLRMARC 2.0 to 2.1

These are my notes of the steps I had to take to upgrade a working 2.0
version of solrmarc to the latest version 2.1.

1. check out the 2.1 branch from svn
>> cd <my dev dir>
>> svn co http://solrmarc.googlecode.com/svn/branches/solrmarc-2.1/ solrmarc21

2. run ant init
>> cd solrmarc21
>> ant init
This creates a subdirectory called local_build which contains all the
local configuration files as well as any custom java code. Previously
ant init would create a subdirectory using your custom prefix for its
subdirectory name. In 2.1 the subdirectory is always called
local_build and the config files in it are prefixed with the custom
prefix.
During this initialization procedure I selected [none] for the example
configuration, as we will use the base version of solrmarc only; entered
our site-specific prefix; left the heap size at its default; set the
encoding to BESTGUESS (as our source may contain either UNIMARC or MARC8);
selected a custom solr configuration; and entered the URL of our SOLR
server, the full path to the solr home directory containing the conf
subdirectory, and finally the full path to the solr war file location.

3. add custom indexer
>> cd local_build
>> mkdir -p src/org/solrmarc/index
>> cp <dev location>/org/solrmarc/index/MyIndexer.java src/org/solrmarc/index

4. modify indexer properties
I basically reused the indexer property file from the old installation
>> cp <solrmarc 2.0>/<my site prefix>/custom_index.properties local_build/<my site prefix>_index.properties

4. run ant dist
>> cd ..
>> ant dist
Running this target requires no manual intervention. It creates a
(top-level) dist directory containing the general SolrMarc.jar and the
site specific MyIndexer.jar as well as the site specific indexing
properties

5. installation is ready
Solrmarc has detected that the local_build/src directory contained a
custom indexer and has set the properties solrmarc.custom.jar.path and
solr.indexer in <site-prefix>_config.properties accordingly.

6. start indexing
Previously the indexing scripts lived in the dist directory but now
they have been moved to local_build/script_templates. There is also an
index_scripts directory which contains a small README_SCRIPTS
suggesting to put the custom index scripts into that directory.
When trying to run the script_templates/indexfile script I first
needed to add execute permission to the script file:
>> chmod +x script_templates/indexfile
Then I could run indexfile from out of the local_build directory:
>> cd <solrmarc21 home>/local_build
>> script_templates/indexfile my_marc_file.mrc
except that it immediately crashes with the error message:
Exception in thread "main" java.lang.NoClassDefFoundError: @MEM_ARGS@
So I first copied the template to the index_scripts directory:
>> cp script_templates/indexfile index_scripts
and then replaced the unresolved @MEM_ARGS@ placeholder with a hardcoded
value, -Xmx256m, just to get things going. However, it then complains
that it cannot find SolrMarc.jar. So I tried to run indexfile from
the dist directory:
>> cd <solrmarc21>/dist
>> cp <solrmarc21>/local_build/index_scripts/indexfile index_scripts
>> index_scripts/indexfile my_marc_file.mrc
The SolrMarc indexer can now be found but my custom indexer can't.

Robert Haschart

Jan 11, 2010, 12:17:45 PM
to solrma...@googlegroups.com
Everything looked like it was going well until step 6. Based on some comments from other SolrMarc users, I decided that the scripts ought to be placed in a subdirectory (named bin) beneath the dist directory. The ant build process (for target dist) should copy the scripts from the local_build/script_templates directory to the dist/bin directory and, in the process, substitute a value for the @MEM_ARGS@ string that gave you trouble, perform a chmod +x, and ensure that the line endings of the script are correct for the Linux platform (LF instead of CR-LF).

It seems that you have manually fixed at least two of these, but it might well be worth simply trying to run the version of the script from the dist/bin directory.

Additionally I will try a similar installation here, to see whether I can see what the problem might be.

-Bob Haschart
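
A minimal sketch of what that template step amounts to, written in Java for illustration only. The ant build does this with its own tasks, which the thread does not show; the class name ScriptTemplateFilter, the method render, and the sample script line are hypothetical.

```java
public class ScriptTemplateFilter {

    /**
     * Replaces the @MEM_ARGS@ placeholder with the given JVM memory
     * arguments and converts Windows (CR-LF) line endings to Unix (LF).
     */
    static String render(String template, String memArgs) {
        return template
                .replace("@MEM_ARGS@", memArgs)
                .replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        // Hypothetical template content; the real indexfile script may differ.
        String template = "#!/bin/sh\r\njava @MEM_ARGS@ -jar SolrMarc.jar \"$@\"\r\n";
        System.out.print(render(template, "-Xmx256m"));
    }
}
```

A script that still contains the literal @MEM_ARGS@ has not gone through this step, which is consistent with the NoClassDefFoundError reported when running the copy in script_templates directly.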

Willem Vermeer

Jan 11, 2010, 2:55:01 PM
to solrma...@googlegroups.com
Thanks for the reply but running the script in dist/bin still gives me
the same error result:
./indexfile ~/kb/arrow/testdata/tel117_utf8.mrc
INFO [main] (MarcImporter.java:637) - Starting SolrMarc indexing.
INFO [main] (Utils.java:188) - Opening file: /Users/willem/dev/solrmarc21/dist/arrow_config.properties
INFO [main] (MarcHandler.java:286) - Attempting to open data file: /Users/willem/kb/arrow/testdata/tel117_utf8.mrc
ERROR [main] (MarcHandler.java:429) - Cannot load class: org.solrmarc.index.ArrowIndexer
ERROR [main] (MarcHandler.java:438) - Cannot find custom indexer class named: org.solrmarc.index.ArrowIndexer
ERROR [main] (MarcHandler.java:439) - Jar file containing that class MUST be referenced via the property: solrmarc.custom.jar.path
ERROR [main] (MarcHandler.java:440) - Please define this property in your config.properties file
ERROR [main] (MarcImporter.java:647) - Error configuring Indexer from properties file. Exiting...
Error configuring Indexer from properties file. Exiting...

Relevant section of my arrow_config.properties:
solrmarc.solr.war.path=/Users/willem/dev/apache-tomcat-6.0.18/webapps/apache-solr-1.4-dev.war

# solrmarc.custom.jar.path - Jar containing custom java code to use in indexing.
# If solr.indexer below is defined (other than the default of
org.solrmarc.index.SolrIndexer)
# you MUST define this value to be the Jar containing the class listed there.
solrmarc.custom.jar.path=ArrowIndexer

# Path to your solr instance
solr.path = /Users/willem/kb/solrarrow

# - solr.indexer - full name of java class with custom indexing functions. This
# class must extend SolrIndexer; Defaults to SolrIndexer.
solr.indexer = org.solrmarc.index.ArrowIndexer

# - solr.indexer.properties -indicates how to populate Solr index fields from
# marc data. This is the core configuration file for solrmarc.
solr.indexer.properties = arrow_index.properties

In /dist I have the file ArrowIndexer.jar with the following contents:
<[Mon Jan 11 20:50:52][willem@/Users/willem/dev/solrmarc21/dist]>jar tf ArrowIndexer.jar
META-INF/
META-INF/MANIFEST.MF
org/
org/solrmarc/
org/solrmarc/index/
org/solrmarc/index/ArrowIndexer.class

My indexer class:
public class ArrowIndexer extends SolrIndexer {
    public ArrowIndexer(String indexingPropsFile, String propertyDirs[]) {
        super(indexingPropsFile, propertyDirs);
        System.out.println("[ARROW]ArrowIndexer instantiated");
    }

    public String getTelid() {
        System.out.println("[ARROW]getTelid invoked");
        return "id" + System.currentTimeMillis();
    }
}

I don't see any reference to the custom jar in the indexfile script. I
guess I'm still missing something. If you want I can try to run the
whole again from scratch.

Thanks again,
Willem

Robert Haschart

Jan 11, 2010, 3:03:24 PM
to solrma...@googlegroups.com
Ah-ha,

In the line:

solrmarc.custom.jar.path=ArrowIndexer

it needs .jar at the end, like so:

solrmarc.custom.jar.path=ArrowIndexer.jar


This is likely a problem in the new build script, at the point where it
fills in the necessary value as the config file is being created.
But simply changing that value should get things working for you.

-Bob Haschart
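
A quick way to catch this kind of mistake before starting an indexing run is to load the config file and inspect the two properties together. The following is a minimal sketch, not part of SolrMarc: the class CustomJarPathCheck and its methods are hypothetical, and only the property names and the default indexer class name come from the config file quoted above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CustomJarPathCheck {

    private static final String DEFAULT_INDEXER = "org.solrmarc.index.SolrIndexer";

    /** Returns a description of the problem, or null if nothing looks wrong. */
    static String check(Properties props) {
        String indexer = props.getProperty("solr.indexer", DEFAULT_INDEXER).trim();
        if (indexer.equals(DEFAULT_INDEXER)) {
            return null; // default indexer: no custom jar is needed
        }
        String jarPath = props.getProperty("solrmarc.custom.jar.path", "").trim();
        if (jarPath.isEmpty()) {
            return "solrmarc.custom.jar.path is not set";
        }
        if (!jarPath.endsWith(".jar")) {
            return "solrmarc.custom.jar.path does not name a .jar file: " + jarPath;
        }
        return null;
    }

    /** Convenience wrapper that parses properties-file text. */
    static String checkText(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return check(props);
    }

    public static void main(String[] args) {
        String generated = "solrmarc.custom.jar.path=ArrowIndexer\n"
                + "solr.indexer = org.solrmarc.index.ArrowIndexer\n";
        System.out.println(checkText(generated));
    }
}
```

The check only looks at the file name suffix; it does not verify that the jar exists or that it contains the class named in solr.indexer.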

Willem Vermeer

Jan 11, 2010, 3:18:55 PM
to solrma...@googlegroups.com
Yep! That was it, great!
I can also confirm that the problems with the chmod +x and @MEM_ARGS@
went away by using the script in dist/bin.

Now moving right along to the next problem :-)

In my arrow_index.properties I have:
id = custom, getTelid

and my custom indexer looks like:


public class ArrowIndexer extends SolrIndexer {
    public ArrowIndexer(String indexingPropsFile, String propertyDirs[]) {
        super(indexingPropsFile, propertyDirs);
        System.out.println("[ARROW]ArrowIndexer instantiated");
    }

    public String getTelid(Record record) {
        System.out.println("[ARROW]getTelid invoked");
        return "id" + System.currentTimeMillis();
    }
}

but when running it I get:

<[Mon Jan 11 21:16:47][willem@/Users/willem/dev/solrmarc21/dist/bin]>./indexfile ~/kb/arrow/testdata/tel117_utf8.mrc
INFO [main] (MarcImporter.java:637) - Starting SolrMarc indexing.
INFO [main] (Utils.java:188) - Opening file: /Users/willem/dev/solrmarc21/dist/arrow_config.properties
INFO [main] (MarcHandler.java:286) - Attempting to open data file: /Users/willem/kb/arrow/testdata/tel117_utf8.mrc
ERROR [main] (SolrIndexer.java:347) - Unable to find custom indexing function getTelid
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
at org.solrmarc.marc.MarcHandler.loadIndexer(MarcHandler.java:449)
at org.solrmarc.marc.MarcHandler.init(MarcHandler.java:103)
at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:643)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at com.simontuffs.onejar.Boot.run(Boot.java:334)
at com.simontuffs.onejar.Boot.main(Boot.java:170)
Caused by: java.lang.IllegalArgumentException: Unable to find custom indexing function getTelid
at org.solrmarc.index.SolrIndexer.verifyCustomMethodExists(SolrIndexer.java:349)
at org.solrmarc.index.SolrIndexer.verifyCustomMethodsAndTransMaps(SolrIndexer.java:282)
at org.solrmarc.index.SolrIndexer.fillMapFromProperties(SolrIndexer.java:265)
at org.solrmarc.index.SolrIndexer.<init>(SolrIndexer.java:101)
at org.solrmarc.index.ArrowIndexer.<init>(ArrowIndexer.java:11)
... 13 more
ERROR [main] (MarcHandler.java:471) - Unable to load Custom indexer: org.solrmarc.index.ArrowIndexer
ERROR [main] (MarcImporter.java:647) - Error configuring Indexer from properties file. Exiting...
Error configuring Indexer from properties file. Exiting...

Is it allowed to define a custom function for the 'id' field? Should
the name of the method match exactly how it's referred to in
_index.properties?

Thanks again,
Willem

Willem Vermeer

Jan 11, 2010, 4:03:27 PM
to solrma...@googlegroups.com
Success! I was missing an import of org.marc4j.marc.Record. After
adding this import it's now working correctly. Thanks for your help
Bob.
Willem
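
The stack trace above shows SolrIndexer.verifyCustomMethodExists rejecting getTelid while the indexer is being constructed. The thread does not show how that check is implemented. Assuming it looks the method up reflectively by name with a single Record parameter, the following sketch illustrates why both the method name and the exact parameter type matter. The nested Record class is a stand-in for org.marc4j.marc.Record so the example compiles on its own, and all class names here are hypothetical.

```java
import java.lang.reflect.Method;

public class CustomMethodLookup {

    /** Stand-in for org.marc4j.marc.Record (assumption, for self-containment). */
    static class Record {}

    /** Like the first version posted in the thread: no Record parameter. */
    public static class NoArgIndexer {
        public String getTelid() {
            return "id" + System.currentTimeMillis();
        }
    }

    /** Like the working version: takes the Record being indexed. */
    public static class RecordArgIndexer {
        public String getTelid(Record record) {
            return "id" + System.currentTimeMillis();
        }
    }

    /** True if the class has a public method with this name taking one Record. */
    static boolean hasCustomMethod(Class<?> indexerClass, String name) {
        try {
            Method m = indexerClass.getMethod(name, Record.class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCustomMethod(NoArgIndexer.class, "getTelid"));     // false
        System.out.println(hasCustomMethod(RecordArgIndexer.class, "getTelid")); // true
    }
}
```

Under that assumption, a method whose parameter is some other class that happens to be named Record would also fail the lookup, since reflection compares parameter types rather than simple names.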

Robert Haschart

Jan 11, 2010, 4:14:59 PM
to solrma...@googlegroups.com
Congratulations,

If you have any more questions as you move forward, please feel free to
contact the solrmarc group. Your project's plans to handle both USMarc
and Unimarc records are very interesting.

-Bob Haschart

Willem Vermeer

Jan 12, 2010, 2:35:36 AM
to solrma...@googlegroups.com
We'll keep you posted on our progress!

Just to wrap this thread up, I have updated my migration notes; they
now describe what to do to use the 2.1 branch with a custom indexer
function. Feel free to use them as you like.

=========================================================
Migration notes SOLRMARC 2.0 to 2.1 with a custom built indexer

Running ant dist will copy the index scripts from
local_build/script_templates to dist/bin. It is suggested to run the
index scripts from that directory, dist/bin. Before we can start the
indexing we need to fix one minor bug.
At the time of writing there was one bug in the generation of the
properties files: in <site-prefix>_config.properties you need to add
.jar to the name of the custom jar file, i.e. replace:
solrmarc.custom.jar.path=MyIndexer
by:
solrmarc.custom.jar.path=MyIndexer.jar
This will probably be fixed in the final 2.1 version.

Now we can start the indexing:
>> cd <devlocation>/dist/bin
>> ./indexfile my_marc.mrc

One note about the custom indexer: the example provided on the
ConfiguringSolrMarc page is missing an import which is needed to
successfully compile the indexer. It should read:

// java.util imports are also needed for Set and LinkedHashSet
import java.util.LinkedHashSet;
import java.util.Set;

import org.solrmarc.index.SolrIndexer;
import org.marc4j.marc.Record;

public class BlacklightIndexer extends SolrIndexer
{
    public BlacklightIndexer(String propertiesMapFile, String propertyPaths[])
    {
        super(propertiesMapFile, propertyPaths);
    }

    public Set<String> getRecordingAndScore(Record record)
    {
        Set<String> result = new LinkedHashSet<String>();
        // content omitted, see ConfiguringSolrMarc page
        return result;
    }
}

During the compilation phase of ant dist, the import is resolved
because the build adds marc4j.jar to the classpath. As a developer you
need not worry about that.
