In the interests of reducing my ongoing work for Stanford's SearchWorks
index, I have, with Bob Haschart's blessing, forked the SolrMarc code
and made my fork available via the (new) SolrMarc space on github:
http://github.com/solrmarc/stanford-solr-marc
Specifics of how my fork diverges are below.
This is an experiment: I believe my personal efforts will be reduced by
using this pared-down derivative of SolrMarc. I am NOT
committing to supporting all the use cases that Bob supports with
SolrMarc. Bob is doing a great job of juggling VuFind needs, Blacklight needs, UVa needs, less
savvy consumers' needs, and maintaining backward compatibility with
earlier versions of Solr. I cannot make those kinds of commitments on
Stanford's dollar or on my own time.
One goal of the fork is to simplify the code and the build scripts for
development purposes. This creates a slightly higher expectation of
users: they will be presumed to have the expertise to do what they need
downstream (e.g. edit the build.properties file, set up analogous
directories for their local site code and/or their local versions of
Solr, substitute their own java customizations, set up their own
version for bean shell, etc.).
If anyone likes what I've done or any part of it, feel free to grab
it, fork it, mimic it or whatever. I am happy to add committers if they write
test code for any changes they want to push up.
I have created hudson builds for the core code and the site specific code in stanford-solr-marc on the
projectblacklight hudson server.
These builds will kick off after each commit to the stanford-solr-marc
github repository, and they create javadoc and test coverage reports
(see the hudson pages below for links to these).
http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20CORE%20code/
http://hudson.projectblacklight.org/hudson/job/stanford-solr-marc%20SITE%20code/
I can add emails to the hudson build notifications, and can probably
figure out how to have github send emails upon commits, if folks desire.
It would be awesome if the fork converges with SolrMarc's future
development to the point of re-combining the code base. Meanwhile, as
Bob and I have discussed, this fork may help Bob with some of his
refactoring plans, and I can forge ahead with Stanford specific needs
more easily.
Significant Differences between my fork and the SolrMarc on GoogleCode:
- git
- reorg of the directory structures for clarity and to reduce nesting.
- complete rewrite of the ant builds.
  - a single build.xml file
  - a single build.properties file -- it should be straightforward to change build.properties as desired.
  - the build process does not result in a single jar, but instead creates a
    dist directory with all the files and folder structure as needed to
    execute the code.
  - the wonderful scripts written by Bob are not "localized" by the build process
- strives to use "vanilla" versions of Solr and Marc4j, with version clearly indicated
- the utility class has been refactored into smaller pieces
- the only exemplar site code is Stanford SearchWorks
- functionality not used by Stanford is often stripped out, such as
  - bean shell scripting capability (it could be added back in easily, if desired)
  - notion of running under windows (could be added back in)
  - unused code placeholders, such as z39.50
- embedded solrj update options are not exercised - this code will be stripped out soon
- core tests have been largely rewritten to adhere to junit common
practices: ant calls a junit class which executes the java code and
asserts the correct results.
- current intent is to move away from using java reflection to
simultaneously support multiple versions of Solr -- I will create a
tag/branch for a Solr version if a Solr upgrade isn't backwards
compatible, and I make no promise to keep that branch up to date.
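To illustrate the last bullet: here is a minimal sketch, not code from SolrMarc or from this fork, contrasting a reflection-based call that probes for whichever method exists at runtime with a direct call compiled against one API version. OldStyleServer, NewStyleServer and their method names are hypothetical stand-ins invented for the example; they are not Solr classes.

```java
import java.lang.reflect.Method;

public class ReflectionVersionSketch {

    /** Hypothetical stand-in for an older library API (not a Solr class). */
    public static class OldStyleServer {
        public String addDoc(String doc) { return "old:" + doc; }
    }

    /** Hypothetical stand-in for a newer, incompatible API (not a Solr class). */
    public static class NewStyleServer {
        public String add(String doc) { return "new:" + doc; }
    }

    /**
     * Reflection approach: one code base supports both versions by looking
     * up the method by name at runtime. Mistakes only surface when it runs.
     */
    public static String addViaReflection(Object server, String doc) throws Exception {
        Method m;
        try {
            m = server.getClass().getMethod("add", String.class);
        } catch (NoSuchMethodException e) {
            // Fall back to the older method name.
            m = server.getClass().getMethod("addDoc", String.class);
        }
        return (String) m.invoke(server, doc);
    }

    /**
     * Direct approach: the code targets exactly one API version, so the
     * compiler checks the call. Another version would live on its own branch.
     */
    public static String addDirect(NewStyleServer server, String doc) {
        return server.add(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addViaReflection(new OldStyleServer(), "rec1"));
        System.out.println(addViaReflection(new NewStyleServer(), "rec1"));
        System.out.println(addDirect(new NewStyleServer(), "rec1"));
    }
}
```

The direct version is shorter and fails at compile time rather than at index time, which is the trade being made in exchange for maintaining a separate tag/branch per incompatible Solr version.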
I have not written or rewritten the type of documentation available on the
googlecode SolrMarc wiki - much of that documentation is directly applicable (settings for xxx_config.properties, settings for xxx_index.properties …).
Note that the SITE code for Stanford SearchWorks will lag behind our
actual production code, as the copy of record is *not* the github
repository.
Keeping the copy of record elsewhere:
a. avoids commit messages for every commit for local work
b. allows our copy-of-record to be behind the Stanford firewall.
I will update the github repository to the current Stanford production code from time to time.
Let me repeat: I'm not promising to keep this project backwards
compatible with older versions of Solr or of xxx_index.properties files,
as those progress. The main audience for this codebase is me. Others
are welcome to the code, and will probably be welcomed as committers …
but consumers of this codebase will be presumed to have enough expertise
to do what they need downstream (e.g. substitute their own java
customizations, or set up their own version for bean shell or for a
different version of Solr).
There is plenty more to do. Just a few examples:
- More tests of core code
- More refactoring of core code
- Documentation
- Naomi