FW: [solrmarc-tech] Generalized getTitleSort function
 
From: Robert Haschart <rh...@virginia.edu>
Date: Mon, 10 Sep 2012 17:53:09 -0400
Subject: Re: FW: [solrmarc-tech] Generalized getTitleSort function

There _was_ a good reason for that routine to be "protected" when the
only way it could be called was from within the SolrIndexer code or
from a class derived from SolrIndexer (since derived classes can access
protected members and methods of their superclasses). However, now that
BeanShell routines and IndexerMixin classes both provide ways of
defining additional custom indexing functionality, and given that Java
has no access level between protected and public, many more of the
routines in SolrIndexer should be made public to allow access from the
BeanShell scripts and the mixins.

Additionally, I looked at the code Tod mentioned, and the answer there
is "Yes, the LNK syntax is used; yes, it will continue to be
supported; and yes, the line of code he highlighted is indeed a bug."

-Bob Haschart

On 9/10/2012 4:06 PM, Demian Katz wrote:

...

Discussion subject changed to "Generalized getTitleSort function" by Tod Olson
From: Tod Olson <olson....@gmail.com>
Date: Mon, 10 Sep 2012 19:13:44 -0500
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Thank you, this clears up some questions and helps me know how to proceed. That all makes sense: the historical reasons for "protected," and the changes in the environment. And it's good to hear that "LNK" is in use and supported; I expect we'll be using it.

And thank you for all of the work on solrmarc,

-Tod

On Sep 10, 2012, at 4:53 PM, Robert Haschart <rh...@virginia.edu> wrote:

...

From: Demian Katz <demian.k...@villanova.edu>
Date: Tue, 11 Sep 2012 12:36:18 +0000
Subject: RE: [solrmarc-tech] Generalized getTitleSort function

Are there any plans to systematically change protected methods to public?  Should we set up Tod with SVN access so he can make necessary adjustments?

- Demian

From: Tod Olson [mailto:olson....@gmail.com]
Sent: Monday, September 10, 2012 8:14 PM
To: solrmarc-tech@googlegroups.com
Cc: Tod Olson; Demian Katz; Olson Tod; Filipov Sean
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Thank you, this clears up some questions and helps me know how to proceed. That all makes sense: the historical reasons for "protected," and the changes in the environment. And it's good to hear that "LNK" is in use and supported; I expect we'll be using it.

And thank you for all of the work on solrmarc,

-Tod

On Sep 10, 2012, at 4:53 PM, Robert Haschart <rh...@virginia.edu> wrote:

There was a good reason for that routine to be "protected" when the only way it could be called was from within the SolrIndexer code or from a class derived from SolrIndexer (since derived classes can access protected members and methods of their superclasses). However, now that BeanShell routines and IndexerMixin classes both provide ways of defining additional custom indexing functionality, and given that Java has no access level between protected and public, many more of the routines in SolrIndexer should be made public to allow access from the BeanShell scripts and the mixins.

Additionally, I looked at the code Tod mentioned, and the answer there is "Yes, the LNK syntax is used; yes, it will continue to be supported; and yes, the line of code he highlighted is indeed a bug."

-Bob Haschart

On 9/10/2012 4:06 PM, Demian Katz wrote:
Bob,

I think this email has gone unanswered (apart from my own somewhat inadequate response).

Do you have any thoughts on this?  Particularly whether there is a particular reason that certain methods are protected - should we make more things public to increase the flexibility of customization?

thanks,
Demian

From: solrmarc-tech@googlegroups.com [mailto:solrmarc-tech@googlegroups.com] On Behalf Of Tod Olson
Sent: Tuesday, August 28, 2012 1:33 PM
To: solrmarc-tech@googlegroups.com
Cc: t...@uchicago.edu; se...@uchicago.edu
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

So now we're trying to implement a beanshell version of getFieldList, but some of the methods we need to use, like getSubfieldDataAsSet, are protected. So there's not a good way to do this in beanshell without reimplementing some of these functions. And if we try to extend SolrIndexer in our own package, we'll just run into the same problem.

It seems that getSubfieldDataAsSet is a pretty useful utility method that SolrIndexer subclasses would want to use. Does it need to be protected? And can anyone suggest a workaround?

The secondary question looks ahead towards support for non-Roman scripts. I see that getFieldList supports some undocumented syntax for specifying linked fields, so "LNK245ab" should indicate the 880 that corresponds to this 245. Are people using this syntax in their marc_local.properties files? It certainly makes sense that you'd want a way to specify specific 880s in the tag list.

But if this syntax is still intended to be supported, I think I've found a bug. It looks like this line:

    String subfield = tags[i].substring(3);

will erroneously set subfield to "245ab" rather than just "ab".
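
To make the off-by-three concrete, here is a minimal, self-contained sketch. It is not the SolrMarc source: the class and method names (`LnkTagDemo`, `buggySubfields`, `fixedSubfields`) are made up for illustration, and it assumes a linked tag spec always has the shape "LNK" + three-character tag + subfield codes.

```java
// Hypothetical demo class, not part of SolrMarc.
class LnkTagDemo {
    // Mirrors the highlighted line: always skips exactly three characters,
    // which for "LNK245ab" removes only the "LNK" prefix.
    static String buggySubfields(String tagSpec) {
        return tagSpec.substring(3);
    }

    // Assumed fix: skip the "LNK" prefix as well as the three-character tag.
    static String fixedSubfields(String tagSpec) {
        int offset = tagSpec.startsWith("LNK") ? 6 : 3;
        return tagSpec.substring(offset);
    }

    public static void main(String[] args) {
        System.out.println(buggySubfields("LNK245ab")); // prints "245ab"
        System.out.println(fixedSubfields("LNK245ab")); // prints "ab"
        System.out.println(fixedSubfields("245ab"));    // prints "ab"
    }
}
```

Whether the real fix should branch on the prefix like this or strip "LNK" earlier in the loop depends on how the surrounding code in getFieldList uses the tag, which this sketch does not model.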

If this LNK syntax is considered supported, are there constraints on how it is supposed to work? Or is it better to rely on the getLinkedField* methods?

Anyhow, I'd appreciate any suggestions for working around this business of the protected static getSubfieldDataAsSet. And also any forward-looking suggestions for bringing in the non-Roman 880 data.

Best,

-Tod

On Tuesday, August 28, 2012 7:46:41 AM UTC-5, Demian Katz wrote:

Hopefully Bob Haschart will chime in on this - he's the main architect of SolrMarc, and his opinion is much more informed than mine!  However, I think this does sound like a sensible solution.  As you say, it would be great if you could take the functional programming approach and actually pass in the processing routine you want to use.  That's not so easy in Java, but maybe the next best thing would be to define an interface (i.e. MarcFilterInterface) and pass in an object that implements the interface.  Then you just use the object to process the matches.  Obviously, the big limitation here is that you can't pass arbitrary objects in from the SolrMarc configuration files...  but at least this would allow more flexibility under the hood, and you could create wrapper functions as needed to be called from the configs.
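
A rough sketch of what that could look like, purely for illustration. Only the interface name MarcFilterInterface comes from the suggestion above; the `filter` method, the `LowercaseFilter` class, and the `FilterDemo` helpers are assumptions, and plain strings stand in for the matched MARC data.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Interface name is from the suggestion above; the method signature is an assumption.
interface MarcFilterInterface {
    String filter(String value);
}

// One hypothetical implementation: lower-cases each matched value.
class LowercaseFilter implements MarcFilterInterface {
    public String filter(String value) {
        return value.toLowerCase(Locale.ROOT);
    }
}

// Hypothetical stand-in for an indexing routine that processes matches.
class FilterDemo {
    // General routine: applies whatever filter object it is handed.
    static Set<String> applyFilter(List<String> matches, MarcFilterInterface filter) {
        Set<String> result = new LinkedHashSet<>();
        for (String match : matches) {
            result.add(filter.filter(match));
        }
        return result;
    }

    // Wrapper that takes no filter object, the kind of method a
    // configuration file could name directly.
    static Set<String> getLowercased(List<String> matches) {
        return applyFilter(matches, new LowercaseFilter());
    }

    public static void main(String[] args) {
        // Both inputs collapse to the same value, so the set has one entry.
        System.out.println(getLowercased(Arrays.asList("Moby Dick", "MOBY DICK")));
    }
}
```

The wrapper is what works around the limitation mentioned above: the configuration only needs to name `getLowercased`, while the object-passing stays inside the Java code.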

- Demian

From: solrma...@googlegroups.com [mailto:solrma...@googlegroups.com] On Behalf Of Tod Olson
Sent: Tuesday, August 28, 2012 8:39 AM
To: solrma...@googlegroups.com
Cc: t...@uchicago.edu; se...@uchicago.edu
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Thanks, that helps. After mulling it over for a bit, here's what I'm thinking.

A beanshell version of getFieldList that takes an extra parameter to signal normalizing:

  Set<String> getFieldList(String taglist, String normalizer)

normalizer tells getFieldList whether or what kind of normalization to apply in its inner loop, after it has extracted the desired subfields from the tag spec. If normalizer=null, then it's just like the current getFieldList.

Maybe normalizer starts with a generic "sort" value, which trims non-sorting chars, downcases, and strips punctuation. Or maybe there's a "title-sort" value which knows about the non-sorting indicator, and a plain "sort" that doesn't include that logic. I'm not certain yet.

Ideally, normalizer would be the name of a function/method to call to act as the normalizer. That would be the most general. But I don't think Java/beanshell is so friendly to that kind of approach.
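
As a minimal sketch of the string-valued normalizer idea (not SolrMarc code): the class `NormalizerDemo` and its methods are hypothetical, only the "sort" value and the null case are handled, and the non-sorting-character logic is left out because it needs indicator data that a plain string does not carry.

```java
import java.util.Locale;

// Hypothetical helper, not part of SolrMarc.
class NormalizerDemo {
    // Assumed meaning of "sort": trim, downcase, turn ASCII punctuation
    // into spaces, then collapse runs of whitespace.
    static String sortNormalize(String value) {
        String lowered = value.trim().toLowerCase(Locale.ROOT);
        String stripped = lowered.replaceAll("\\p{Punct}", " ");
        return stripped.replaceAll("\\s+", " ").trim();
    }

    // Dispatches on the normalizer name; null means "leave the value alone",
    // matching the behavior described for the current getFieldList.
    static String normalize(String value, String normalizer) {
        if (normalizer == null) {
            return value;
        }
        if ("sort".equals(normalizer)) {
            return sortNormalize(value);
        }
        throw new IllegalArgumentException("Unknown normalizer: " + normalizer);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  The Hobbit, or There and Back Again. ", "sort"));
        System.out.println(normalize("The Hobbit,", null));
    }
}
```

Adding a "title-sort" value would mean one more branch in `normalize`, which is the main appeal of dispatching on a name when a function cannot easily be passed in.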

Anyhow, that's what I'm thinking of. Any reactions?

-Tod

On Monday, August 27, 2012 9:34:03 AM UTC-5, Demian Katz wrote:

1.)    I'm not aware of a method that meets your need.  Perhaps it would make sense to refactor the existing Set<String> getFieldList to wrap around a separate method that returns a Set<Field>, but this does not look like it would be an entirely straightforward task, so it may be impractical.  Maybe Bob has a better idea.
2.)    BeanShell definitely causes a performance hit, but I don't think it's terribly significant.  It certainly doesn't hurt to prototype in BeanShell.  If the performance is good enough, then you're done; if you have problems, it's not difficult to adapt it to pure Java and compile it in.
3.)    A method that takes a fieldspec and returns an array of title sort keys sounds pretty generalized to me.  My only question is whether this could be combined with the existing getSortableTitle in some way to avoid redundant logic (i.e. have one work as a special case of the other).

- Demian

From: solrma...@googlegroups.com [mailto:solrma...@googlegroups.com] On Behalf Of Tod Olson
Sent: Saturday, August 25, 2012 5:22 PM
To: solrma...@googlegroups.com
Cc: t...@uchicago.edu; se...@uchicago.edu
Subject: [solrmarc-tech] Generalized getTitleSort function

I'm looking to implement a generalized getTitleSortKeys method or function. I want to implement a bean shell function that will take a tag string and return the text of the requested fields and subfields with the non-filing bits removed and with some other normalization for computing sort keys. And I have a couple questions about tackling this.

The context is that we are working with the VuFind title browse, but extending it so a variety of title fields in a record can show up in the title browse list. This means parallel fields in Solr: a title_browse with all of the display versions, and a title_browse_keys with the same data normalized for sorting into a browse list. (These then get dumped into a relational table to provide the actual index.) So we'll have a couple of lines like this in the properties file (yes, we like to pull from everywhere):

title_browse = 210ab:211a:212a:214a:240:242abchnp:245abcdefghknps:246abfghnp:247abfghnp:490av:740ahnp:780bcst:785bcst:787bcst:840ahv:844a
title_browse_keys = script(ucGetTitleBrowseKeys.bsh), getTitleBrowseKeys(210ab:211a:212a:214a:240:242abchnp:245abcdefghknps:246abfghnp:247abfghnp:490av:740ahnp:780bcst:785bcst:787bcst:840ahv:844a)

Here are my questions:

1. Does this or some useful building block already exist? I think I'll have to reimplement the iteration over the tag string and handling the subfields. I've not really found something that would take the tag string and return Set<Field>. I have found methods that return a Set<String>, which does not give the non-filing information. If I'm overlooking some useful helper functions, a pointer would be welcome.

2. Are there any worries about implementing this in bean shell? Have people found bean shell to be a significant performance hit during indexing?

3. I'd like this to be general enough to be of use to others. If there's something about the function as described that could be generalized or broken out as a utility to be of more use, please let me know.
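
For illustration, the core of such a sort-key function might look like the sketch below. It is not the ucGetTitleBrowseKeys.bsh script: `TitleSortKeyDemo` and `titleSortKey` are hypothetical names, and the sketch assumes the caller has already read the non-filing character count from the field (for a 245 this is the second indicator), which is exactly the information a Set<String> result loses.

```java
import java.util.Locale;

// Hypothetical helper, not part of SolrMarc.
class TitleSortKeyDemo {
    // Drops the leading non-filing characters, then applies a simple
    // normalization: downcase, ASCII punctuation to spaces, collapse spaces.
    static String titleSortKey(String title, int nonFilingChars) {
        // Clamp the count so a bad indicator value cannot run past the string.
        int skip = Math.max(0, Math.min(nonFilingChars, title.length()));
        String filed = title.substring(skip);
        return filed.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(titleSortKey("The Hobbit /", 4)); // prints "hobbit"
        System.out.println(titleSortKey("Moby Dick", 0));    // prints "moby dick"
    }
}
```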

-Tod

From: Tod Olson <olson....@gmail.com>
Date: Tue, 11 Sep 2012 11:49:52 -0500
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Alternately, I'd be happy to submit a couple patches in the issue tracker on Google Code.

-Tod

On Sep 11, 2012, at 7:36 AM, Demian Katz <demian.k...@villanova.edu> wrote:

...

From: Robert Haschart <rh...@virginia.edu>
Date: Tue, 11 Sep 2012 14:16:28 -0400
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

I have added   olsen....@gmail.com   as a committer to SolrMarc.

-Bob

On 9/11/2012 12:49 PM, Tod Olson wrote:

...

From: Tod Olson <olson....@gmail.com>
Date: Wed, 12 Sep 2012 14:27:12 -0500
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

I've changed SolrIndexer: isControlField() and both versions of SolrIndexer:getSubfieldDataAsSet() to "public static." Those seem to be the only static methods of SolrIndexer to change, and they work well. Unfortunately, when I do "svn ci" my credentials are rejected.

Patch attached.

-Tod

On Sep 11, 2012, at 1:16 PM, Robert Haschart <rh...@virginia.edu> wrote:

...

From: Demian Katz <demian.k...@villanova.edu>
Date: Wed, 12 Sep 2012 19:43:07 +0000
Subject: RE: [solrmarc-tech] Generalized getTitleSort function

Have you worked with Google Code before?  If memory serves, your normal Google credentials won't work - you need to use different Google-generated credentials for checking in code.  See:

http://code.google.com/p/support/wiki/SubversionFAQ#Where_do_I_get_a_...

If that's not the problem, please let us know!

- Demian

From: solrmarc-tech@googlegroups.com [mailto:solrmarc-tech@googlegroups.com] On Behalf Of Tod Olson
Sent: Wednesday, September 12, 2012 3:27 PM
To: solrmarc-tech@googlegroups.com
Cc: Tod Olson
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

I've changed SolrIndexer: isControlField() and both versions of SolrIndexer:getSubfieldDataAsSet() to "public static." Those seem to be the only static methods of SolrIndexer to change, and they work well. Unfortunately, when I do "svn ci" my credentials are rejected.

Patch attached.

-Tod


 
From: Tod Olson <olson....@gmail.com>
Date: Wed, 12 Sep 2012 14:46:16 -0500
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Right you are!  Revision 1657 has been committed.

-Tod

On Sep 12, 2012, at 2:43 PM, Demian Katz <demian.k...@villanova.edu> wrote:


 
From: Robert Haschart <rh...@virginia.edu>
Date: Wed, 12 Sep 2012 16:18:24 -0400
Subject: Re: [solrmarc-tech] Generalized getTitleSort function

Thank you to both Tod and Demian.

On 9/12/2012 3:46 PM, Tod Olson wrote:


 