RE: [VuFind-Tech] How to ignore records completely based on field


Demian Katz

Feb 10, 2012, 4:40:39 PM
to John Wynstra, VuFind Tech, solrma...@googlegroups.com
I believe SolrMarc comes with some filtering tools that might be of use, but I haven't used them much, so I can't comment on the details. I'm copying this message to solrmarc-tech since I suspect Bob can fill you in fairly easily.

Another option would be to control this at the export side of things, but I don't know enough about Innovative to comment on how to do that...

And a third option would be to index everything and then write a routine to delete the unwanted locations after the fact (you could even set up your importer to map unwanted locations to "delete" and then run a routine to delete everything matching location:delete).
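
A minimal sketch of the delete-after-the-fact idea, assuming the importer has mapped unwanted locations to the literal value "delete" in a field named location. It only builds the delete-by-query message in Solr's XML update format; posting it to your Solr update handler is left out, and the class and method names are made up for illustration.

```java
public class DeleteByLocation {
    /** Escape the characters that are special in XML text content. */
    public static String escapeXml(String text) {
        // Ampersand must be replaced first so later entities are not re-escaped.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Build a Solr XML delete-by-query message for the given query string. */
    public static String buildDeleteMessage(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        // Assumed field/value pair: location mapped to "delete" at import time.
        System.out.println(buildDeleteMessage("location:delete"));
    }
}
```

A commit is still needed after the delete before the records disappear from search results.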

And yeah, I'm really sorry I couldn't make it to Code4lib this year -- it's something I look forward to all year, and I was really excited about seeing Seattle. A really bad time to get sick -- but I don't entirely regret staying home; I'd have felt really bad if I infected the whole VuFind community with my evil virus. ;-)

- Demian
________________________________________
From: John Wynstra [john.w...@uni.edu]
Sent: Friday, February 10, 2012 3:40 PM
To: VuFind Tech
Subject: [VuFind-Tech] How to ignore records completely based on field

So I have an Innovative catalog with records from multiple institutions (using scopes). I am wondering how to index (import) only records that are from select locations. The end result would be that a record that does not pass the location test is not indexed at all.

I can run them through MarcEdit first and extract the records that I want to load, but I would prefer to bypass this step as part of an ongoing routine.

PS - Missed you at Code4lib, Demian. Was looking forward to a VuFind breakout.

--
<><><><><><><><><><><><><><><><><><><>
John Wynstra
Library Information Systems Specialist
Rod Library
University of Northern Iowa
Cedar Falls, IA 50613
wyn...@uni.edu
(319)273-6399
<><><><><><><><><><><><><><><><><><><>

Alan Rykhus

Feb 13, 2012, 9:42:39 AM
to solrma...@googlegroups.com, VuFind Tech
Hello John,

We run an installation for a III site. They also have records in
locations that they do not want included in the Solr database. I wrote a
custom indexing script to get the ID number. While the ID is just the
001 field, the script also looks at the locations in the 907 $a
and 998 $e. If the location is one that you do not want to include in
the load, the script returns null for the ID number. The record gets
rejected because the ID number is a required field.

Not pretty, but it accomplishes the task.

al

--
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities
(507)389-1975
alan....@mnsu.edu
"It's hard to lead a cavalry charge if you think you look funny on a
horse" ~ Adlai Stevenson

Alan Rykhus

Feb 13, 2012, 10:43:11 AM
to solrma...@googlegroups.com, VuFind Tech
Hello John,

I guess the ID is in the 907. I run a couple of different installations
and forgot. We look at just the 998 $e to see whether to suppress the
record.

The following is the beanshell script I use.

You do end up with lines like these in your import log:

Error: Problem invoking getDoc in SolrCoreProxy
ERROR [main] (MarcImporter.java:330) - Document [id=null] missing required fields: id at record count = 63
ERROR [main] (MarcImporter.java:331) - Control Number ocm00418885

al

zed@dino:~$ cat /srv/vufind/import/index_scripts/getBridgeID.bsh
import org.marc4j.marc.Record;
import org.marc4j.marc.DataField;
import org.marc4j.marc.Subfield;

/**
 * Get the record ID from the 907 $a.
 * Also look at the 998 $e. If it is:
 *   n - suppressed        - return null for the ID
 *   w - future suppressed - return null for the ID
 * A null ID makes the importer reject the record, since id is required.
 * @param Record record
 * @return String ID, or null if the record is suppressed or has no 907 $a
 */
public String getID(Record record) {
    List fields;
    Iterator fieldsIter;

    // Reject the record if any 998 $e carries a suppression code.
    fields = record.getVariableFields("998");
    if (fields != null) {
        // Create the iterator only after the null check.
        fieldsIter = fields.iterator();
        while (fieldsIter.hasNext()) {
            DataField ninetyEight = (DataField) fieldsIter.next();
            List subfields = ninetyEight.getSubfields('e');
            if (subfields != null) {
                Iterator subfieldsIter = subfields.iterator();
                while (subfieldsIter.hasNext()) {
                    Subfield subfield = (Subfield) subfieldsIter.next();
                    String suppressed = subfield.getData().toLowerCase();
                    if (suppressed.equals("n") || suppressed.equals("w")) {
                        return null;
                    }
                }
            }
        }
    }

    // Otherwise return the first 907 $a found, lowercased, as the ID.
    fields = record.getVariableFields("907");
    if (fields != null) {
        fieldsIter = fields.iterator();
        while (fieldsIter.hasNext()) {
            DataField nineOhSeven = (DataField) fieldsIter.next();
            List subfields = nineOhSeven.getSubfields('a');
            if (subfields != null && !subfields.isEmpty()) {
                Subfield subfield = (Subfield) subfields.get(0);
                return subfield.getData().toLowerCase();
            }
        }
    }

    // No 907 $a at all: the record has no usable ID.
    return null;
}
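
The same return-null trick could in principle be driven by a list of wanted locations rather than suppression codes. The following is only a sketch in plain Java, assuming the location codes have already been pulled out of whichever field holds them; the class name, method name, and location codes are hypothetical and not part of SolrMarc or marc4j.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocationFilter {
    /**
     * Return the ID if at least one location code is in the wanted set;
     * otherwise return null so the record fails the required-id check.
     */
    public static String idOrNull(String id, List<String> locationCodes, Set<String> wanted) {
        for (String code : locationCodes) {
            // Normalize before comparing; the wanted set is assumed to be lowercase.
            if (code != null && wanted.contains(code.trim().toLowerCase())) {
                return id;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical location codes, for illustration only.
        Set<String> wanted = new HashSet<>(Arrays.asList("main", "ref"));
        System.out.println(idOrNull("b1000001", Arrays.asList("main"), wanted));
        System.out.println(idOrNull("b1000002", Arrays.asList("annex"), wanted));
    }
}
```

Inside a BeanShell indexing script, the same loop would sit where the 998 $e check is, with the wanted codes kept in a set at the top of the script.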

On Mon, 2012-02-13 at 09:30 -0600, John Wynstra wrote:
> Alan,
>
>
> I would be interested in seeing your approach assuming you are willing
> to share.


Robert Haschart

Feb 13, 2012, 11:33:20 AM
to solrma...@googlegroups.com
Another way you can accomplish this is via the DeleteRecordIfFieldEmpty directive.
It can be used either through a custom routine or through a simple field spec.


shadowed_location_facet = customDeleteRecordIfFieldEmpty, getShadowedLocation(shadowed_location_map.properties)


This calls the custom routine getShadowedLocation, passing in the name of a properties map file. If the result, after collecting the values and processing them through that map, is an empty set, the record being processed is not added to the index, and any existing record with the same id is deleted. (That matters when you are doing incremental updates to an existing index.)

author_text = 100abcdeq4:110abcde4:111acdejnq4:LNK100abcdeq4:LNK110abcde4:LNK111acdejnq4, DeleteRecordIfFieldEmpty

This looks in several fields for an author, and if none of them has a value, the record is deleted.

Looking around, this feature doesn't seem to have been documented anywhere. I apologize.
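
To illustrate the decision described above (map the collected values, then delete the record if nothing survives), here is a rough sketch in plain Java. It is not SolrMarc's implementation: the class and method names are invented, and it assumes that a value with no entry in the map is simply dropped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeleteIfEmptySketch {
    /** Translate raw values through the map, dropping values with no mapping. */
    public static Set<String> mapValues(List<String> rawValues, Map<String, String> map) {
        Set<String> mapped = new LinkedHashSet<>();
        for (String raw : rawValues) {
            String translated = map.get(raw);
            if (translated != null && !translated.isEmpty()) {
                mapped.add(translated);
            }
        }
        return mapped;
    }

    /** True when the mapped result is empty, i.e. the record would be skipped and deleted. */
    public static boolean shouldDelete(List<String> rawValues, Map<String, String> map) {
        return mapValues(rawValues, map).isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical map contents standing in for a properties map file.
        Map<String, String> locationMap = new HashMap<>();
        locationMap.put("main", "Main Library");
        System.out.println(shouldDelete(Arrays.asList("main", "annex"), locationMap));
        System.out.println(shouldDelete(Arrays.asList("annex"), locationMap));
    }
}
```

With a map that lists only the wanted locations, every record whose locations all fall outside the map ends up with an empty field and is therefore excluded.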

-Bob Haschart

John Wynstra

Feb 13, 2012, 11:45:21 AM
to solrma...@googlegroups.com, VuFind Tech
Thanks for this Alan.  Between your input and Robert's, I should be able to accomplish what I need.  

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To post to this group, send email to solrma...@googlegroups.com.
To unsubscribe from this group, send email to solrmarc-tec...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/solrmarc-tech?hl=en.

John Wynstra

Feb 13, 2012, 11:43:13 AM
to solrma...@googlegroups.com
Thanks for all the help.  This looks like what I was hoping for.  


