- Demian
________________________________________
From: Walker, David [dwa...@calstate.edu]
Sent: Tuesday, August 16, 2011 12:41 PM
To: VuFind Tech
Subject: [VuFind-Tech] Proposed changes to VuFindIndexer.getFormat, Part 2
Okay, here are my proposed changes.
https://gist.github.com/1149060
It's quite lengthy, but here is why I think it addresses the flaws I mentioned in the previous email.
1. It is much more complete.
As you can see in the enumerations at the top of the file, there are many more format designations than in the current getFormat method.
This goes essentially two levels deep for material types and content types. MARC, amazingly, has even more detailed formats, at least for some material types. For example, you can divide books into 'encyclopedias', 'dictionaries', 'yearbooks', and so on. And music is even more detailed. But I had to stop somewhere, so I didn't parse these out. It also picks up secondary content types (from the 006).
2. It distinguishes between content type and media/carrier type.
There is a getMediaTypes method that essentially parses the 007, plus a few other fields. The getContentTypes method does the same for the leader/008/006 and a few data fields. Each returns all available values.
If I've done my job well, then these two 'lower-level' functions should not need to be customized by libraries. (If there is something missing or wrong here, it should be corrected in the distro.)
Instead, my thought here is that you could have 'higher-level' functions, or BeanShell scripts, that utilize, combine, or otherwise customize these values for the actual indexing.
The getPrimaryContentTypePlusOnline method is an example of this. It takes just the first content type from the getContentTypes set, and then also checks if the item is online. It then combines the content type 'Book' and the media type 'Online' into a single combined type called 'EBook'.
But this is just one of many different examples of how you might do this. I think this allows for a great deal of flexibility without people having to localize or re-write a ton of code.
3. It makes a best guess attempt at determining if this is an online resource.
The SolrMarc indexer already includes a getFullTextUrls method that checks to see if the record has a full-text link. It is a 'best guess' since many MARC records infamously contain links to tables of contents and other information *about* the item without consistently marking them as such.
But, all things considered, I think this is much preferable to the current indexer which makes no such effort, at least for format.
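To make points 2 and 3 concrete, here is a rough sketch of what a 'higher-level' function of this kind might look like. It is not the code from the gist: the ContentType values, the "Unknown" fallback, and the boolean standing in for the full-text check are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical sketch only; names and values are illustrative, not taken from the gist.
class PrimaryPlusOnlineSketch {

    // A tiny stand-in for the much longer list of content types.
    enum ContentType {
        BOOK("Book"), JOURNAL("Journal"), MAP("Map");

        final String label;

        ContentType(String label) {
            this.label = label;
        }
    }

    // contentTypes: what a lower-level content-type parser might return, in record order.
    // online: stand-in for "the record has a full-text link".
    static String primaryContentTypePlusOnline(List<ContentType> contentTypes, boolean online) {
        if (contentTypes.isEmpty()) {
            return "Unknown"; // assumption: fallback for records with no content type
        }
        ContentType primary = contentTypes.get(0);
        if (online && primary == ContentType.BOOK) {
            return "EBook"; // 'Book' + 'Online' collapse into one combined value
        }
        return primary.label; // assumption: other types are left uncombined
    }

    public static void main(String[] args) {
        System.out.println(primaryContentTypePlusOnline(
                List.of(ContentType.BOOK, ContentType.MAP), true));
        System.out.println(primaryContentTypePlusOnline(
                List.of(ContentType.BOOK), false));
    }
}
```

A library that wanted a different combination would write its own small function like this one and leave the lower-level parsing alone.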
So there it is, at least as far as the basic issues are concerned.
There are, IMO, some other (minor) improvements over the current getFormat method.
One of the other complaints I have with the current getFormat function is that it could be much better commented. I essentially took the MARC standards documentation at loc.gov and cut-and-pasted the relevant portions into my file, and then wrote the code around that. So hopefully it's well commented in a way that corresponds with the documentation online.
Also, following the Blacklight indexer, I used enums for the format values rather than just strings. That way, I couldn't accidentally mistype one of the formats.
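As a small illustration of the enum point, assuming a made-up Format enum rather than the one in the gist: a misspelled constant fails at compile time (or, when looked up by name, at run time), whereas a misspelled string would simply be indexed as a new, wrong format value.

```java
// Hypothetical sketch; the constants and labels are illustrative only.
class FormatEnumSketch {

    enum Format {
        BOOK("Book"), EBOOK("EBook"), MAP("Map");

        private final String label;

        Format(String label) {
            this.label = label;
        }

        @Override
        public String toString() {
            return label; // the value that would end up in the index
        }
    }

    static String indexValue(Format format) {
        return format.toString();
    }

    public static void main(String[] args) {
        // Writing Format.EBOK here would not compile; the string "EBok" would.
        System.out.println(indexValue(Format.EBOOK));
    }
}
```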
Comments, criticisms, questions all welcome.
Except from Demian. Go get some sleep first. ;-)
--Dave
==================
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu
_______________________________________________
Vufind-tech mailing list
Vufin...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/vufind-tech
In the past I've submitted a getFormat function that I use in our MnPALS
Plus site. I would like to put forth an argument for that.
I use a multi-value format field. Each bib record can have multiple
formats that are in a hierarchical list from specific to generic. I
think this is a better solution.
If a patron is searching the database and they are looking for
information on whatever subject, they can limit to an ebook, but they
can also limit to just a book and have the ebooks included. Even though
they might have the latest ebook reader, they can probably still pick up
a book and read it. So they would only want to limit to book. With only
the specific facets, once you select ebook you have just eliminated all
of the books in the stacks of the libraries.
The patron can still limit to ebook and get just the electronic
versions.
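A minimal sketch of the multi-value idea, assuming a hypothetical formats helper (this is not the MnPALS Plus getFormat code): each record gets a list of values running from specific to generic, so an ebook is indexed under both 'EBook' and 'Book'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the helper name and values are illustrative only.
class HierarchicalFormatSketch {

    // Returns format values ordered from most specific to most generic.
    static List<String> formats(String contentType, boolean online) {
        List<String> values = new ArrayList<>();
        if (online && contentType.equals("Book")) {
            values.add("EBook"); // specific value first
        }
        values.add(contentType); // generic value always present
        return values;
    }

    public static void main(String[] args) {
        System.out.println(formats("Book", true));  // [EBook, Book]
        System.out.println(formats("Book", false)); // [Book]
    }
}
```

With values like these in a multi-valued field, limiting to 'Book' returns print and electronic books together, while limiting to 'EBook' returns only the electronic ones.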
al
--
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities
(507)389-1975
alan....@mnsu.edu
"It's hard to lead a cavalry charge if you think you look funny on a
horse" ~ Adlai Stevenson
Of course, I haven't looked too closely at any code yet, so perhaps your indexer has some feature that David's lacks -- if this is the case, I'm sure he would be interested to hear about it.
- Demian
________________________________________
From: solrma...@googlegroups.com [solrma...@googlegroups.com] On Behalf Of Alan Rykhus [alan....@mnsu.edu]
Sent: Monday, August 29, 2011 9:45 AM
To: solrma...@googlegroups.com
Subject: Re: [solrmarc-tech] Better format determination (part 2 of 2)
Hello,
al
--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To post to this group, send email to solrma...@googlegroups.com.
To unsubscribe from this group, send email to solrmarc-tec...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/solrmarc-tech?hl=en.
One of the goals I had with this proposal was precisely to make the determination of format as flexible as possible.
Rather than having a single, monolithic getFormat method, my proposal includes several methods at two different levels. The bulk of the code consists of two 'lower-level' functions that parse format in terms of content type and media/carrier type (see the previous emails for a discussion of this). Higher level functions, or BeanShell scripts, can then call these two lower-level functions and combine, contract, or otherwise customize the results as they see fit.
In that way, each institution can do this slightly differently, while keeping the bulk of the code untouched.
> they can limit to an ebook, but they
> can also limit to just a book and have
> the ebooks included.
I completely agree with you on this. The getPrimaryContentTypePlusOnline method (line 205) is an example of one of these 'higher-level' functions. It specifically addresses this scenario.
So you can have multiple format values, or just a primary one, or a combination of the two, or even separate facets for content type and media/carrier type. The options here are myriad.
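For instance, one institution's layout might look something like the sketch below. The field names (content_type, media_type, format_primary) and the facetFields helper are invented for illustration and are not part of the proposal; the two input lists stand in for whatever the lower-level functions return.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; field names are assumptions, not SolrMarc or VuFind defaults.
class FacetLayoutSketch {

    static Map<String, List<String>> facetFields(List<String> contentTypes,
                                                 List<String> mediaTypes) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("content_type", contentTypes); // every content type found
        fields.put("media_type", mediaTypes);     // every media/carrier type found
        // Just the first content type, or an empty list if there is none.
        fields.put("format_primary",
                contentTypes.subList(0, Math.min(1, contentTypes.size())));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(facetFields(List.of("Book", "Map"), List.of("Online")));
    }
}
```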
--Dave
==================
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu
https://groups.google.com/group/solrmarc-tech/msg/d6875e5b11fe8d61?hl=en
(As a side note, I had a lot of trouble figuring out how to link to this specific message -- maybe it would be better if we copied release notes into the SolrMarc wiki for easier future reference.)
I don't think this has been widely tested yet, so there may be some issues depending on the nature of your MARC records. For further discussion, see this thread:
https://groups.google.com/forum/#!msg/solrmarc-tech/JNINy1crNug/md1fjyQ6SzAJ
- Demian
________________________________________
From: solrma...@googlegroups.com [solrma...@googlegroups.com] On Behalf Of Owen Stephens [ow...@ostephens.com]
Sent: Friday, March 02, 2012 4:34 AM
To: solrma...@googlegroups.com
Subject: Re: [solrmarc-tech] Better format determination (part 2 of 2)
Where did this proposal get to? Was it ever integrated into SolrMarc?
Thanks,
Owen
On Monday, August 29, 2011 5:07:04 PM UTC+1, David Walker wrote:
Hi Alan,
One of the goals I had with this proposal was precisely to make the determination of format as flexible as possible.
Rather than having a single, monolithic getFormat method, my proposal includes several methods at two different levels. The bulk of the code consists of two 'lower-level' functions that parse format in terms of content type and media/carrier type (see the previous emails for a discussion of this). Higher level functions, or BeanShell scripts, can then call these two lower-level functions and combine, contract, or otherwise customize the results as they see fit.
In that way, each institution can do this slightly differently, while keeping the bulk of the code untouched.
> they can limit to an ebook, but they
> can also limit to just a book and have
> the ebooks included.
I completely agree with you on this. The getPrimaryContentTypePlusOnline method (line 205) is an example of one of these 'higher-level' functions. It specifically addresses this scenario.
So you can have multiple format values, or just a primary one, or a combination of the two, or even separate facets for content type and media/carrier type. The options here are myriad.
--Dave
==================
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu
On Mon, 2011-08-29 at 08:44 -0400, Demian Katz wrote:
> ...and here's the rest of the discussion.
>
> - Demian
We're only using the 'content type' designation (plus electronic), however, so the 'media type' designation is not as well tested, per the thread below.
--Dave
-----------------
David Walker
Library Web Services Manager
California State University
https://groups.google.com/group/solrmarc-tech/msg/d6875e5b11fe8d61?hl=en
https://groups.google.com/forum/#!msg/solrmarc-tech/JNINy1crNug/md1fjyQ6SzAJ
Thanks,
Owen
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: ow...@ostephens.com
Telephone: 0121 288 6936