It may or may not require special configuration of Solr, depending on
how you implement it. It will definitely require a tool that can index
TEI files into Solr -- or one of the existing Solr XML indexer tools (I
think one may ship with Solr) configured for TEI and for how you'd like
to implement it. Blacklight itself doesn't ship with any special tool
for indexing TEI; as you found out, there is no "solr:tei:index" rake task.
There are other people who have been doing TEI in BL; perhaps some of
them will see this and give you some ideas. You could also try searching
the listserv archives. I'm not sure whether this has come up before (at
first I thought it had, but I think I was confusing it with something
else).
Jonathan
When I was working at UVA we indexed TEI into Blacklight, so that might be where you saw a reference to it. There isn't anything pre-written, because TEI tends to be pretty free-form. One of the Solr fields we populated for each TEI document was a URL, so that the Blacklight entry could give the user a link to our actual TEI presentation application (XTF).
These, for example, are all TEI documents in Blacklight: http://search.lib.virginia.edu/?f%5Bdigital_collection_facet%5D%5B%5D=UVa+Text+Collection&sort=date_received_facet+desc
I can't find a code example of indexing TEI documents, but you might want to take a look at some examples of indexing other arbitrary XML. Take a look at the code for the Northwest Digital Archives, for example:
http://github.com/bess/northwest-digital-archives/
especially in lib/nwda-lib
In summary, it's quite possible, but you have to write it yourself. Other people have done it, though, and would be happy to answer questions, I'm sure.
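To make "write it yourself" a bit more concrete, here is a minimal sketch of the mapping step only: pull a few values out of a TEI file and build a flat hash that could be sent to Solr as one document. It is not the UVA code. It assumes TEI P5 (with the TEI namespace), and the function name and the Solr field names (title_t, author_t, text, url_s) are made up for illustration; yours depend on your schema.xml. Posting the hash to Solr is left to whatever Solr client you already use.

```ruby
require 'rexml/document'

# Assumption: the files are TEI P5 and use the standard TEI namespace.
TEI_NS = { 'tei' => 'http://www.tei-c.org/ns/1.0' }.freeze

# Concatenate all text below a node, in document order.
def collect_text(node)
  case node
  when REXML::Text then node.value
  when REXML::Element then node.children.map { |child| collect_text(child) }.join
  else ''
  end
end

# Map one TEI document to a flat hash of Solr fields.
# The field names here are hypothetical; match them to your own schema.
def tei_to_solr_doc(xml, id, presentation_url)
  doc = REXML::Document.new(xml)
  first_text = lambda do |xpath|
    node = REXML::XPath.first(doc, xpath, TEI_NS)
    node ? node.text.to_s.strip : nil
  end
  body = REXML::XPath.first(doc, '//tei:text', TEI_NS)
  {
    'id'       => id,
    'title_t'  => first_text.call('//tei:titleStmt/tei:title'),
    'author_t' => first_text.call('//tei:titleStmt/tei:author'),
    # Full text for searching, with runs of whitespace collapsed.
    'text'     => body ? collect_text(body).gsub(/\s+/, ' ').strip : '',
    # Link out to the application that actually displays the TEI.
    'url_s'    => presentation_url
  }
end
```

Because TEI is so free-form, the XPath expressions are the part you would most likely need to change for your own documents.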
Bess
> --
> You received this message because you are subscribed to the Google Groups "Blacklight Development" group.
> To post to this group, send email to blacklight-...@googlegroups.com.
> To unsubscribe from this group, send email to blacklight-develo...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/blacklight-development?hl=en.
>
class SitemapController < ApplicationController
  def index
    all_query = { 'id:' => '*', 'rows' => '50000' }
    all_docs = Blacklight.solr.find(all_query)
    document_list = all_docs.docs.collect { |doc| SolrDocument.new(doc) }
    (@response, @document_list) = [all_docs, document_list]
    respond_to do |format|
      format.xml { render :layout => false }
    end
  end
end
Bill Parod
Some people thought that you might run into performance problems paging
through ALL the rows in the index with an ordinary Solr find -- Solr isn't
really very good at this kind of 'deep paging'; it's been kind of
optimized against it. But you haven't run into problems? How many docs
do you have in your Solr? How long does it take to generate? It was
suggested that if there IS a problem with this, then instead of using an
ordinary Solr query, you should use the Solr 'terms' component on the
'id' field.
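A rough sketch of what walking the 'id' field with the terms component might look like. The terms.* parameters are the real TermsComponent ones, but everything else is an assumption: the function names are made up, it presumes 'id' is indexed as a single untokenized string, and the HTTP call is passed in as a callable so that nothing here depends on a particular Solr client or handler path.

```ruby
require 'uri'

# Build the query string for one page of a terms-component request on 'id'.
# After the first page, start just past the last id already seen.
def terms_page_params(last_id = nil, page_size = 1000)
  params = {
    'terms.fl'    => 'id',
    'terms.limit' => page_size.to_s,
    'terms.sort'  => 'index', # alphabetical order, so paging is stable
    'wt'          => 'json'
  }
  unless last_id.nil?
    params['terms.lower'] = last_id
    params['terms.lower.incl'] = 'false'
  end
  URI.encode_www_form(params)
end

# Yield every id, one page at a time. fetch_page is any callable that takes
# a query string and returns an array of ids (hypothetical: you supply the
# HTTP request and the parsing of the terms response).
def each_solr_id(fetch_page, page_size = 1000)
  last_id = nil
  loop do
    ids = fetch_page.call(terms_page_params(last_id, page_size))
    break if ids.empty?
    ids.each { |id| yield id }
    last_id = ids.last
  end
end
```

Each request only asks for the terms after the last one seen, so Solr never has to skip over thousands of rows the way it does with a large start offset.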
You are hard-coding "http://hostname/catalog/<%= document[:id] %>" as
your URL. That's wrong first of all because it should be your actual
hostname instead of the hard-coded string 'hostname'. And secondly it's
wrong because you should ideally be using Rails routing methods to
generate this URL, so it'll be right even if the local app changes the
routing. One way to do that is with: <%= catalog_url(
document[:id] ) %>
I think having the sitemap/_document_list partial is overkill in this
case; it'll be simpler and easier to understand with just ONE template.
However, I wonder about using the Rails controller system for this at
all. What you've done will generate the sitemap dynamically every
time a certain URL is hit, but this is a very expensive operation. It
makes more sense to write it to disk as a static file every once in a
while (say nightly). So I wouldn't use a Rails controller at all; I'd
just write some Ruby code that gets triggered by a rake task to generate
the static file(s). (Another thing is that you need to check to make
sure you're not over the maximum number of URLs or bytes for a
sitemap.xml file, and if you are, split it into multiple ones with a
master one referencing them all, as per the sitemap spec.) You _could_
use ERB templates in a non-Rails-controller ordinary Ruby file like
this, if you needed to, but this is simple enough that you might not even
need an ERB template; just generating strings might be enough.
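For what it's worth, using ERB from plain Ruby, with no Rails controller involved, could look roughly like this. It's only a sketch: render_sitemap and its url_prefix argument are names I made up, and it assumes you already have the list of ids in hand.

```ruby
require 'erb'

# One template for the whole file; no partials needed.
SITEMAP_TEMPLATE = ERB.new(<<~XML)
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <% ids.each do |id| %>
    <url><loc><%= ERB::Util.html_escape(url_prefix + ERB::Util.url_encode(id)) %></loc></url>
  <% end %>
  </urlset>
XML

# Render one sitemap as a string. The template sees this method's local
# variables (ids, url_prefix) through the binding.
def render_sitemap(ids, url_prefix)
  SITEMAP_TEMPLATE.result(binding)
end
```

The result can then be written out with something like File.write('public/sitemap.xml', render_sitemap(ids, 'http://example.org/catalog/')), where both the path and the prefix are placeholders.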
Outside of a Rails controller, my earlier advice to use Rails routing
might be harder to follow. If necessary, the URL could be built by hand
instead of using Rails routing, but then the "prefix" (http://something/catalog/)
should be a parameter, rather than hard-coded, so it's easy to
change, and so the code can be shared.
Hmm, I guess that was a whole bunch of advice/critique, in the end, I
think you probably ought to be doing things somewhat differently. Hope
it makes sense and is helpful.
Jonathan
> bill-...@northwestern.edu
> 847 491 5368
http://www.treyconnell.com/rails-restful-helpers-rake-tasks
But where you're going to run into trouble there is with the need to
split a sitemap into multiple parts if it's too large. I think the Rails
controller/view system is going to make that needlessly complex, if it's
possible at all, and you'll be better off just writing Ruby code in a
rake task, not using a Rails controller.
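To give an idea of how little code the splitting takes in plain Ruby, here is a sketch. The 50,000-URL cap per file comes from the sitemaps.org protocol; the function names, file names and the two prefix parameters are invented for the example, and it does not check the protocol's separate limit on file size in bytes.

```ruby
require 'cgi'

MAX_URLS_PER_SITEMAP = 50_000 # per-file limit in the sitemaps.org protocol

# XML for one sitemap file listing the given URLs.
def urlset_xml(urls)
  entries = urls.map { |u| "  <url><loc>#{CGI.escapeHTML(u)}</loc></url>" }
  ['<?xml version="1.0" encoding="UTF-8"?>',
   '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
   *entries,
   '</urlset>'].join("\n") + "\n"
end

# XML for the master index that points at each sitemap file.
def sitemapindex_xml(sitemap_urls)
  entries = sitemap_urls.map { |u| "  <sitemap><loc>#{CGI.escapeHTML(u)}</loc></sitemap>" }
  ['<?xml version="1.0" encoding="UTF-8"?>',
   '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
   *entries,
   '</sitemapindex>'].join("\n") + "\n"
end

# Returns { filename => xml }: one file per chunk of ids plus an index file.
# catalog_prefix and sitemap_prefix are parameters, not hard-coded.
def build_sitemap_files(ids, catalog_prefix, sitemap_prefix, max_urls = MAX_URLS_PER_SITEMAP)
  files = {}
  ids.each_slice(max_urls).with_index(1) do |chunk, n|
    files["sitemap-#{n}.xml"] = urlset_xml(chunk.map { |id| catalog_prefix + id })
  end
  files['sitemap-index.xml'] = sitemapindex_xml(files.keys.map { |name| sitemap_prefix + name })
  files
end
```

A rake task would then only need to collect the ids and write each entry of the returned hash to disk, for example with File.write(File.join(dir, name), xml).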
Bill Parod
> Bill Parod
>
> Library Technology Division - Enterprise Systems
> Northwestern University Library
The FAQ page at http://projectblacklight.org/?page_id=3 states:
"Currently, Blacklight can index, search, and provide faceted browsing
for MaRC records and several kinds of XML documents, including TEI,
EAD, and GDMS."
This is perhaps ambiguous but appears to suggest that Blacklight
directly handles TEI, among other formats.
--
Michael Slone
Yes, I think that's misleading.
But I still don't have access to edit the WP.
I think there's actually a bunch of misleading stuff in the FAQ.
Bess, how's the project to transfer this to GitHub Pages going? In the
meantime, can someone give me access to edit the WP? I'm going to edit
the heck out of it though, so it might be better to have the revision
control we'll have when it's in git, but it seems like that could be a
while.
I don't think we do ourselves or anyone else any favors by
over-promising what BL currently easily provides out of the box, and I
think the WP pages do that in several places.
Jonathan