Question on passing a search term to ResourceSpace

David Shaw

Jun 11, 2010, 12:56:55 PM
to ResourceSpace
We are currently evaluating ResourceSpace for use as an internal DAM.
We are a museum and have a collections management system with an OPAC.

What we would like to achieve is for our users to search for objects
via our OPAC and then give them the option to click on a hyperlink
that will then search ResourceSpace for all images that match that
object/s. Thankfully all our accession numbers are unique but they
have very diverse structure, for example:

1
PD.123-1901
MS. 214 f(1)
C.95 & A-1997
EC.28 & A & B-1938
499*
MS 547-2000
P.669-1985 (3)

We record the accession number in the IPTC Title field.

We have worked out how to do this for a single accession number with
the following search (e.g. for PD.7-1982):

.../resourcespace/pages/search.php?search=title%3APD%2C+title%3A7%2C+title%3A1982
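
The URL above is simply the URL-encoded form of `title:PD, title:7, title:1982`. A minimal Python sketch of building it follows. It assumes the accession number is split on periods and hyphens, which fits the PD.7-1982 example but is only a guess for the other formats listed; the function name and base path are placeholders, not part of ResourceSpace.

```python
import re
from urllib.parse import quote_plus

def title_search_url(accession, base="/resourcespace/pages/search.php"):
    """Build a title-restricted search URL for one accession number.

    Assumption: splitting on "." and "-" mirrors the PD.7-1982 example;
    other accession formats may need different handling.
    """
    parts = [p for p in re.split(r"[.\-]", accession) if p]
    query = ", ".join(f"title:{p}" for p in parts)
    # quote_plus encodes ":" as %3A, "," as %2C and spaces as "+"
    return f"{base}?search={quote_plus(query)}"

print(title_search_url("PD.7-1982"))
# /resourcespace/pages/search.php?search=title%3APD%2C+title%3A7%2C+title%3A1982
```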

However, we would like to know if it is possible to do this with
multiple terms and with exact matching on the field, so that we could
do a Google-like search for the accession numbers above:

"1" "PD.123-1901" "MS. 214 f(1)" ... and so on

This would bring back multiple images from ResourceSpace where the
accession number in the IPTC Title field *exactly* matched several
search terms.

If this is possible then I can see ResourceSpace meeting our needs
very nicely indeed.

TIA - David

Tom Gleason

Jun 11, 2010, 1:05:44 PM
to resour...@googlegroups.com
Multiple search terms only narrow results. I wonder if you could write
a script that does a search on each term individually, then returns a
special search for the resources found, such as search=!list1:2:3:4,
where 1, 2, 3, and 4 are the resource numbers found for the search
terms.
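
A rough sketch of that idea in Python, for illustration only. The `search_one` callable is hypothetical (how each individual search reaches ResourceSpace is left open), and the lookup table is made-up data so the example runs on its own.

```python
def list_search(resource_ids):
    """Return a special search string such as "!list1:2:3:4"."""
    # Drop duplicates while preserving the order the IDs were found in.
    unique_ids = list(dict.fromkeys(int(r) for r in resource_ids))
    return "!list" + ":".join(str(r) for r in unique_ids)

def collect_ids(terms, search_one):
    """Run one search per term and pool the resource IDs.

    `search_one` is a hypothetical callable (term -> iterable of IDs).
    """
    found = []
    for term in terms:
        found.extend(search_one(term))
    return found

# Stand-in lookup table used only to make the example runnable.
fake_index = {"PD.123-1901": [1, 2], "499*": [3], "MS 547-2000": [4, 2]}
ids = collect_ids(fake_index, lambda term: fake_index[term])
print(list_search(ids))  # !list1:2:3:4
```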

> --
> You received this message because you are subscribed to the Google Groups "ResourceSpace" group.
> To post to this group, send email to resour...@googlegroups.com.
> To unsubscribe from this group, send email to resourcespac...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/resourcespace?hl=en.

--
Tom Gleason, PHP Developer
DBA Impressive Design

Exploring ResourceSpace at:
http://resourcespace.blogspot.com

David Dwiggins

Jun 11, 2010, 1:14:25 PM
to resour...@googlegroups.com
Hi, David,

I'd love to hear more about what you're planning -- we're a
library/archive/museum, and have done something similar with
ResourceSpace. In our case, though, because of the variability in
accession numbering/identifiers, as well as the fact that some images
can contain more than one object, we're doing the reverse. We actually
store the ResourceSpace resource ID in our cataloging records, and
this allows us to link the cataloging records back to specific
ResourceSpace resources. I've created a simple SOAP gateway that
allows users of the cataloging system (Minisis) to pull in images from
ResourceSpace via resource number.

It would probably also be possible to customize ResourceSpace to do
what you're looking for, although it would take a bit of thought on
the best way to approach it.

In any case, I'd be happy to talk further about how we've done this if
you're interested.

-David Dwiggins
Systems Librarian/Archivist, Historic New England
ddwiggins at historicnewengland dot org
617-994-5948

David Shaw

Jun 11, 2010, 3:25:18 PM
to ResourceSpace
Thanks for the quick response, Tom. We were kind of hoping not to have
to go direct to the ResourceSpace MySQL database and pre-query it.
We'll have a go at doing what you suggest next week though. Is there any
limit to the number of resource numbers we can use in a !list search
string?

Thanks, David

Tom Gleason

Jun 11, 2010, 3:32:54 PM
to resour...@googlegroups.com
I'm not sure...I added the list search recently just because I thought
it might be useful, though I haven't actually used it yet.

I wouldn't necessarily recommend going direct to MySQL.

This would be a prime example of the use of a new API and a custom API
method, i.e., send search terms to a custom API method that uses
ResourceSpace's do_search method, and retrieve a customized search
based on all those results. However, the API doesn't exist yet!

David is probably the person for you to talk to since he's more
familiar with your kind of problem.

David Shaw

Jun 11, 2010, 4:22:44 PM
to ResourceSpace
David,

It's good to hear of the way you're using it to integrate into your
cataloging system. The vast majority of our images contain only one
object, and in fact we generally have many images per object for most
things that are not flat art. As I expect you already appreciate it's
rather difficult making the standard IPTC fields in an image work in a
Museum context, without moving to custom XMP schemas which we want to
avoid. So we are looking at ways to make use of our OPAC which has
many ways to search for museum objects and also benefits from a
hierarchical thesaurus. The common link is the accession number which
we'd like to use to pass to other systems for image retrieval and
distribution.

There is a need internally for a DAM which allows various users within
the museum access to the high-res originals and lower res derivatives,
at the moment our Image Library supplies these so it would save them
time if certain staff could research and retrieve images for
themselves. We need differing levels of user access to these, which
ResourceSpace does well. The other benefits that ResourceSpace brings
will all be useful too. However searching for these objects is best
done using the collections management data and all of that is
available via the OPAC at a level of granularity that we can control
which is far beyond that in the usual Caption, Description, Keywords
found in DAMs.

We did think about putting data in our collections system to link to
ResourceSpace but decided against it as there's the worry of a
corruption or some event that affects the resource ID in ResourceSpace
making all the stored references in the collections management system
useless. Recovering from that would not be pleasant. We also keep
our eyes on the horizon for DAMs that may better suit our needs, and
want an easy migration route should we decide to change. The idea
being that the original images will all have sufficient metadata
(accession number) so that we can easily re-ingest them into another
DAM and link them to object records in the collections management
system.

We have been sitting on our thumbs for a while now waiting for
something to come along that would meet our needs. It's surprising
how few DAMs you can actually get to work in this manner, I guess we
museum users are somewhat special cases when it comes to DAM use
though.

David

Tom Gleason

Jun 11, 2010, 4:40:32 PM
to resour...@googlegroups.com
Dwiggins,

Interested to know if any other DAMs show potential for you and why.

Tom

David Dwiggins

Jun 11, 2010, 6:24:07 PM
to resour...@googlegroups.com
The concern about keeping the metadata linked to the images is valid.
That's why we had Dan write the code for the XML sidecar files when we
implemented ResourceSpace. Now, in theory, even if the MySQL
database went away entirely, you could write a program that would walk
through the filestore and reconstruct all of the data and images by
reading the XML files and associated image files in the same folder.
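
As a hedged illustration of such a recovery walk, here is a minimal Python sketch. It assumes any `*.xml` file sitting next to the image files is a sidecar; the real file name and XML layout are not given in this thread, so the sketch only pairs each parsed sidecar with the other files in its folder.

```python
import os
import xml.etree.ElementTree as ET

def read_sidecars(filestore_root):
    """Yield (folder, parsed XML root, other files in that folder).

    Assumption: every *.xml file is a sidecar; nothing here depends on
    the actual element names ResourceSpace writes.
    """
    for folder, _subdirs, files in os.walk(filestore_root):
        xml_files = sorted(f for f in files if f.lower().endswith(".xml"))
        others = sorted(f for f in files if not f.lower().endswith(".xml"))
        for name in xml_files:
            try:
                root = ET.parse(os.path.join(folder, name)).getroot()
            except ET.ParseError:
                continue  # skip damaged sidecars rather than abort the walk
            yield folder, root, others
```

A rebuild script would then read whatever fields the sidecar holds from `root` and re-associate them with the files listed in `others`.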

And if the MySQL database is still there, it's even easier -- just a
SQL query. Plus if the accession number was embedded in the IPTC of
the original file, it should still be there inside the ResourceSpace
filestore.

As part of our project, we also created a new unique numbering system
for everything in our collection. So in addition to the traditional
accession number (with all its quirks), we also have a "globally
unique serial number" (GUSN) that is generated by the collections
system. We can also assign these to items that wouldn't have
traditionally gotten an accession number, such as a reference book or
a single item in a large archival collection. You can see this with
records like

http://bit.ly/b9dWMs (museum object)
http://bit.ly/90p7G2 (archival object)
http://bit.ly/dyxKw3 (library book)

All three have different identifiers (accession number for the museum
object, reference code/collection code for the archival object, and
call number for the book.) But all also have a GUSN. And the ones that
have images each also have at least one Resource ID associated with
them, which is what's allowing the online catalog to pull an image.

We're still pondering how much data we want to pull back from our
cataloging system into ResourceSpace to flesh out the records there.
But generally for museum objects there is at least an accession number
and a title already.

In any case, I think it would be possible to make ResourceSpace do
what you want, but it would require some programming.

It sounds like maybe it would be a two step process:

1. Make it so that the accession numbers are always indexed as a unit.
We had problems with our accession numbers initially because
ResourceSpace wanted to pull them apart into separate words at the
periods. I believe I got around this by tweaking the indexing code so
that a period without a space after it would not be considered a word
break. But for your case, it sounds like what's really required is a
special flag on the field that tells the system not to attempt to
tokenize it at all, and instead to simply index the complete value. I
don't think off the top of my head there is a way to do this built in.
But I don't think it would be all that hard to implement. Ideally it
seems like the accession number would be pulled into a separate field
so that you could control this behavior independently of anything else
that might appear in the title field.

2. Once that was in place, the search code would also have to
understand that it should search this field as a unit rather than
breaking up the words.
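
The two steps can be sketched roughly as follows, in Python purely for illustration. The per-field flag (`WHOLE_VALUE_FIELDS`), the field names and the sample records are all hypothetical; this is not ResourceSpace's indexing code, just the shape of the idea that indexer and search parser must apply the same rule to a field.

```python
import re

# Hypothetical per-field setting: fields listed here are indexed whole.
WHOLE_VALUE_FIELDS = {"accession"}

def normalise(value):
    """Case-fold and collapse runs of whitespace."""
    return " ".join(value.lower().split())

def keywords_for(field, value):
    """Return the index keys for one field value."""
    if field in WHOLE_VALUE_FIELDS:
        return [normalise(value)]             # step 1: index as a unit
    return re.findall(r"\w+", value.lower())  # ordinary word tokenizing

def build_index(records):
    """Map (field, keyword) -> set of resource IDs."""
    index = {}
    for ref, fields in records.items():
        for field, value in fields.items():
            for key in keywords_for(field, value):
                index.setdefault((field, key), set()).add(ref)
    return index

def search(index, field, term):
    """Step 2: pass the search term through the same rule as the indexer."""
    hits = [index.get((field, k), set()) for k in keywords_for(field, term)]
    return set.intersection(*hits) if hits else set()

# Made-up sample records.
records = {
    1: {"accession": "EC.28 & A & B-1938", "title": "Silver bowl"},
    2: {"accession": "EC.28", "title": "Silver spoon"},
}
index = build_index(records)
print(search(index, "accession", "EC.28 & A & B-1938"))  # {1}
print(search(index, "accession", "EC.28"))               # {2}
print(search(index, "title", "silver"))                  # {1, 2}
```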

I think there might be some value for this for other uses as well --
such as LC-formatted names.

Let me know your thoughts...

-David Dwiggins

Tom Gleason

Jun 11, 2010, 6:34:52 PM
to resour...@googlegroups.com
Not that this necessarily helps completely in your case, but keep in
mind this config:

# Configures separators to use when splitting keywords

# You must reindex after altering this if you have existing data in
# the system (via pages/tools/reindex.php)

$config_separators=array("/","-",".","; ","(",")","\"","\\");

If you have A-1997, RS by default would split that into two keywords
since "-" is a config_separator.
If you remove "-" from the config_separators array, A-1997 would be
stored as a unit.
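
To see the effect of that setting, here is a small Python imitation of separator-based splitting. It only mimics the idea behind $config_separators and is not the actual indexing code; the function name is invented for the example.

```python
def split_keywords(value, separators):
    """Split a field value into keywords on whitespace and the given separators."""
    for sep in separators:
        value = value.replace(sep, " ")
    return value.split()

# Same list as the config line above.
default_seps = ["/", "-", ".", "; ", "(", ")", "\"", "\\"]
no_hyphen = [s for s in default_seps if s != "-"]

print(split_keywords("A-1997", default_seps))       # ['A', '1997']
print(split_keywords("A-1997", no_hyphen))          # ['A-1997']
print(split_keywords("P.669-1985 (3)", no_hyphen))  # ['P', '669-1985', '3']
```

The last line shows why removing a single separator does not fully solve the accession number case: periods and parentheses still break the value apart.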

David Dwiggins

Jun 11, 2010, 6:49:03 PM
to resour...@googlegroups.com
Yeah -- I spent a lot of time on this when we were first trying to get
our accession numbers to search correctly. I think I also built in a
function that would ignore periods as separators if they weren't
followed by a space. (Or something like that -- I'd have to look at
the config file.)

But it sounds like in this case what is really needed is the ability
to tell RS not to tokenize a field at all, and instead index it as a
unit. Doesn't seem like this would be too difficult to do.

It might also require a change on the search end, though, so that the
search parser doesn't try to break the accession number up into
individual words. So if searching a specific field that is configured
for complete indexing, it would not try to break up the search term.

Hmm.

-David Dwiggins

Tom Gleason

Jun 11, 2010, 6:58:07 PM
to resour...@googlegroups.com
Good idea, I think!

David Shaw

Jun 12, 2010, 5:17:45 AM
to ResourceSpace
Tom, we've looked at Canto Cumulus - would probably be able to be
configured to do what we want but the UI was terrible and the cost was
far too high. Also Portfolio v8.5 - not bad but failed on the
permissions side of things, yet to look at v9 but I don't think
they've changed the security, and again quite expensive.

We also looked at Fedora repository but the poor documentation for
that meant that one of us would end up spending a considerable amount
of time getting to know it before we could even decide if it could do
all the things we wanted.

We kind of decided to go for an interim solution of using the OPAC to
find records whose associated images could then be found on the file
system - we looked at using Windows Search for this - it works well
with permissions, hiding assets that users don't have access rights
to. However it obviously has no IMS/DAM functions.

I decided to revisit ResourceSpace and was very pleased to see how
much it had improved since I last looked at it a year or two ago. I
began to think we could perhaps use it as a far better version of our
Windows Search idea. So last week I installed it and started playing.
It's a bit of a struggle as a new user trying to figure out the way it
works, as the wiki doesn't seem to have been kept up to date with new
releases. But trawling this group was a great help. It's not overly
complex, and once a few gotchas have been solved I feel it would be
quite manageable within our team.

As we look more into the way a DAM needs to work with our objects it's
clear that it must be driven from our collections management system -
trying to crosswalk and sync data between the two for say keywords and
description is nigh on impossible as how our Image Library wants to
describe an image is very different to how our curators want to
document an object. It's an editorial process that really isn't
possible to automate for us satisfactorily. Now the decision has to
be made whether the Image Library description and keywords should be
incorporated into the collections management system so they can always
be associated with an object. With multiple images per object, I can
see this working better than trying to keep such data in a DAM. Also,
if we consider importing condition reports/photographs and
conservation reports into the DAM, again the link is via accession
number and the data used for discovery should reside in the
collections management system.

David

David Shaw

Jun 12, 2010, 6:18:17 AM
to ResourceSpace
David, the idea of XML sidecar files is an excellent one. As for
creating a unique serial number our collections management system
does this in a way as each record has a unique internal reference
created, we have used that in the past for things, but it can break
(as has happened) if a record is deleted and re-created or an
accession number is changed. This is why we have to rely on the
accession number as being God. We'd have to be able to guarantee that
a specific GUSN always matches a specific accession number, and I can't
see a way of doing that without creating a lot of work for ourselves.

My feeling is that the search box in ResourceSpace should work like
a Google search box; most people are accustomed to this and it has
become a standard way of searching that they expect to work, like it
or not. So "+" and "-" search options should be possible, as well as a
phrase search enclosed in double quotes that is not parsed in any way.
Search terms should also be ANDed by default, with OR allowed in the
search term. This would let you do searches such as:

landscape OR scenery -river
"EC.28 & A & B-1938"
Rembrandt OR Vermeer -"attributed to" -after

(http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861)
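
To make the proposed syntax concrete, here is a minimal Python sketch of a parser for it. It is an illustration only: it treats OR as the lowest-precedence operator (terms between ORs are ANDed), which is one possible reading and not necessarily how Google or any future ResourceSpace search would behave, and it ignores the "+" operator.

```python
import re

# An optional leading "-", then either a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Return (groups, excluded).

    `groups` is a list of lists: terms inside a group are ANDed and the
    groups are ORed. `excluded` terms must not match. Quoted phrases are
    kept verbatim as single terms.
    """
    groups, excluded = [[]], []
    for m in TOKEN.finditer(query):
        negated, phrase, word = m.group(1) == "-", m.group(2), m.group(3)
        if phrase is None and word == "OR" and not negated:
            groups.append([])  # start a new alternative
            continue
        term = phrase if phrase is not None else word
        if negated:
            excluded.append(term)
        else:
            groups[-1].append(term)
    return [g for g in groups if g], excluded

print(parse_query('landscape OR scenery -river'))
# ([['landscape'], ['scenery']], ['river'])
print(parse_query('"EC.28 & A & B-1938"'))
# ([['EC.28 & A & B-1938']], [])
print(parse_query('Rembrandt OR Vermeer -"attributed to" -after'))
# ([['Rembrandt'], ['Vermeer']], ['attributed to', 'after'])
```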

If that was possible then it would be very easy for other systems to
pass a URL with search terms in to ResourceSpace, it would also
improve searching within ResourceSpace.

I've found that searching is often neglected in DAMs. This
puzzles me, as I'd consider it fundamental to their function of finding
assets once they have been cataloged.

David

David Dwiggins

Jun 12, 2010, 2:38:27 PM
to resour...@googlegroups.com
Hmm. Re: the Google-style search: I think it would be possible to add
complete field indexing for certain fields to the existing search
infrastructure of RS without too much trouble. But getting
phrase-based searching to work the way you described would be a much
trickier problem to program.

One approach to this would be to integrate an existing search
application into ResourceSpace. I have been doing a lot of work with
Apache Solr for the public-facing view of our catalog. This would
provide very advanced search capabilities. But it has the drawback of
forcing the end user to install and run a second (Java-based) piece of
software, making it far more difficult for beginning users.

Interestingly, there does appear to be a PHP port of Lucene in the
Zend PHP framework. Lucene is the search engine underlying Solr, and
the PHP port claims to be 100% compatible with the original Java
implementation. I wonder if this might be a path toward being able to
use more advanced search capabilities while still holding onto the
standalone/*AMP-only architecture of ResourceSpace. Hmm.

I'm not sure of the level of effort that would be involved in this,
but it might be something to explore. Maybe I'll try to poke around
the Zend Lucene implementation if I have a few minutes sometime
soon...

-David Dwiggins

David Shaw

Jun 13, 2010, 4:53:39 AM
to ResourceSpace
Having had a quick look at the Zend_Search_Lucene site it sounds like
a very good idea indeed!

Zend_Search_Lucene supports the following features:

* Ranked searching - best results returned first
* Many powerful query types: phrase queries, boolean queries,
wildcard queries, proximity queries, range queries and many others.
* Search by specific field (e.g., title, author, contents)

David

David Shaw

Jun 15, 2010, 11:28:46 AM
to ResourceSpace
We got this working today. Our PHP coder queried the ResourceSpace
MySQL database so that it returns resource IDs based on an exact match
between the Title field and the accession numbers of records initially
found via our OPAC. That gives us a URL to feed to ResourceSpace that
looks like:

../resourcespace/pages/search.php?search=!list300:301:302:400

This works like a charm. I've tested it with 450 IDs in the string
and the response is almost instant, which is excellent given the low
spec of our testing environment.
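
For anyone wanting to try the same approach, here is a rough Python sketch of the two steps (exact-match lookup, then building the !list URL). It is not the PHP that was actually written: the table and column names (`resource`, `ref`, `title`) are assumptions about the schema, and an in-memory SQLite database with made-up rows stands in for MySQL so the example runs anywhere.

```python
import sqlite3

def ids_for_accessions(conn, accession_numbers):
    """Return resource IDs whose title exactly equals one of the accession numbers.

    Assumption: a table `resource` with columns `ref` and `title`; the real
    schema and the query described above may differ.
    """
    if not accession_numbers:
        return []
    placeholders = ",".join("?" for _ in accession_numbers)
    sql = f"SELECT ref FROM resource WHERE title IN ({placeholders}) ORDER BY ref"
    return [row[0] for row in conn.execute(sql, list(accession_numbers))]

def list_search_url(ids, base="../resourcespace/pages/search.php"):
    """Build the !list search URL for the given resource IDs."""
    return f"{base}?search=!list" + ":".join(str(i) for i in ids)

# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (ref INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO resource VALUES (?, ?)",
    [(300, "PD.123-1901"), (301, "PD.123-1901"), (302, "499*"),
     (400, "MS 547-2000"), (401, "PD.123")],
)
ids = ids_for_accessions(conn, ["PD.123-1901", "499*", "MS 547-2000"])
print(list_search_url(ids))
# ../resourcespace/pages/search.php?search=!list300:301:302:400
```

Passing the accession numbers as bound parameters rather than pasting them into the SQL matters here, since values such as "EC.28 & A & B-1938" or "499*" contain characters that are awkward to escape by hand.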

Thanks for letting us know about the !list search!

David