--
You received this message because you are subscribed to the Google Groups "GITenberg Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gitenberg-proj...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gitenberg-project/9c7d4a8e-b213-41c2-ae11-9a41600285b7%40googlegroups.com.
The Library of Congress Subject Headings (LCSH) are, very unfortunately, not available to us :-(. Fighting for this information to be free and available might be worthwhile, but I don't understand all of the details and ramifications yet.
As part of the process of developing metadata for GITenberg, I've been working on understanding the metadata dumps provided by Project Gutenberg. Here's my first writeup, presented as a Gist. Comments and correction are welcome, as always.
https://gist.github.com/eshellman/40d85be01acf1172a5c1
On Mar 24, 2015, at 2:03 PM, Tom Morris <tfmo...@gmail.com> wrote:
- PG has multiple editions of many works and often the later ones are of higher quality than the older "editionless" editions, yet the earlier ones get downloaded way more. Enhancing the bibliographic information to help with this issue would be useful to readers. For example, this early editionless Pride & Prejudice http://www.gutenberg.org/ebooks/1342#bibrec is downloaded over 30 times more often than this later high quality transcription of the 1932 R.W. Chapman edition http://www.gutenberg.org/ebooks/42671#bibrec donated by Distributed Proofreaders. #76 & #32325 are another example. It would be good to be able to link the various editions together.
- Much better provenance, including links to DP projects, scanned source files, Internet Archive mirrors, etc., would be useful metadata to add
- I'm pretty sure that the last time I investigated, there were a number of duplicate entries for authors, despite the nominally canonical IDs
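One way to picture the edition linking suggested above is a small work-to-editions mapping. This is only a sketch: the `Work` and `Edition` names and the `note` field are hypothetical, not anything Project Gutenberg or GITenberg defines. The two ebook numbers are the Pride & Prejudice examples from the message.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    pg_id: int      # Project Gutenberg ebook number
    note: str = ""  # free-text provenance note (hypothetical field)


@dataclass
class Work:
    title: str
    editions: list = field(default_factory=list)

    def siblings(self, pg_id):
        """Return the other PG ebook numbers recorded for the same work."""
        return [e.pg_id for e in self.editions if e.pg_id != pg_id]


pride = Work(
    title="Pride and Prejudice",
    editions=[
        Edition(1342, "early 'editionless' text"),
        Edition(42671, "1932 R.W. Chapman edition, via Distributed Proofreaders"),
    ],
)

# A reader landing on #1342 could be pointed at the other edition(s).
print(pride.siblings(1342))
```

With a structure like this, the bibrec page for one edition could list its siblings, whatever storage format ends up being used.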
I've copied this into the comments for part 2 of my metadata gists. https://gist.github.com/eshellman/7a6d34c88e797b439938
Tom, do you think it's feasible or wise to delegate all of the gitenberg "agent" metadata to VIAF and/or ORCID and/or Wikipedia?
The other alternatives I can think of are:
1. keep the agent metadata associated with book repos, and link to PG agents
2. create a separate repo for PG agent metadata
3. defer the issue till later
On Mar 25, 2015, at 12:58 PM, Tom Morris <tfmo...@gmail.com> wrote:
On Mon, Mar 23, 2015 at 8:59 AM, Eric Hellman <er...@hellman.net> wrote:
As part of the process of developing metadata for GITenberg, I've been working on understanding the metadata dumps provided by Project Gutenberg. Here's my first writeup, presented as a Gist. Comments and correction are welcome, as always.
https://gist.github.com/eshellman/40d85be01acf1172a5c1
Are the various gists going to be distilled and pulled together someplace eventually? The piecewise, stream-of-consciousness thing is fine for the construction process, but having a single document, with the ability to comment inline or nearby, would make things a lot easier.
- Wikidata is probably a better source of info than Wikipedia infoboxes although both are usually mostly about works, not editions, despite including such edition info as ISBN
- Not specifically metadata related, but the Wikipedia article shows a problem that Gitenberg will have in getting its editions to show up. It links back to PG as the most authoritative source, so readers won't see the improved version.
- The first few of the supposed LCSH terms that I sampled didn't resolve, so I'd be suspicious of them. I'm not sure whether they're out of date or just wrong. One term, Magna Carta, appears in the LC NA file currently, not LCSH. Another resolved with its last precoordinated term dropped, but not with all three terms as presented.
- the marc901s look like links to cover images
- some of the things that are included in Freebase and specialty databases like ISFDB would make things richer and better connected (links to series, prequels, sequels, adaptations, etc)
- Freebase has a list of 11 editions including LCCNs, links to OpenLibrary, ISFDB, university OPACs, Google Books, etc which could be exploited to enrich things
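The LCSH observation above (a heading that resolves only once its last precoordinated term is dropped) suggests a simple fallback when checking headings. The sketch below only generates the candidate strings, from most to least specific. It assumes subdivisions are separated by `--`; the function name and the example heading are made up for illustration. The actual lookup against an authority file is deliberately left out.

```python
def heading_candidates(heading, sep="--"):
    """Yield the full heading, then shorter forms with trailing subdivisions dropped."""
    parts = [p.strip() for p in heading.split(sep) if p.strip()]
    for end in range(len(parts), 0, -1):
        yield sep.join(parts[:end])


# Made-up three-part heading, used only to show the order of candidates.
example = "Topic--Place--Form"
for candidate in heading_candidates(example):
    print(candidate)
```

Recording which candidate finally resolved, if any, would make it possible to say whether a heading was simply wrong or just more specific than the authority file allows.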
Comments inline
On Mar 25, 2015, at 12:58 PM, Tom Morris <tfmo...@gmail.com> wrote:
Are the various gists going to be distilled and pulled together someplace eventually? The piecewise, stream-of-consciousness thing is fine for the construction process, but having a single document, with the ability to comment inline or nearby, would make things a lot easier.
I was thinking (midstream) it should become its own repo, and then maybe an ebook. I'm busy working on some non-gitenberg stuff till next week, but for the next section, I'll do that.
- Wikidata is probably a better source of info than Wikipedia infoboxes although both are usually mostly about works, not editions, despite including such edition info as ISBN
Maybe you can help here. Wikidata says it gets isbn and oclcnum from English Wikipedia, but those things aren't there. Wikipedia has series information, but that isn't in Wikidata. So what's the relationship?
- The first few of the supposed LCSH terms that I sampled didn't resolve, so I'd be suspicious of them. I'm not sure whether they're out of date or just wrong. One term, Magna Carta, appears in the LC NA file currently, not LCSH. Another resolved with its last precoordinated term dropped, but not with all three terms as presented.
Could you explain how and what you resolved? Magna Carta???
- the marc901s look like links to cover images
Does anyone know if this is customary in library cataloging?
LibraryThing has 28 editions
Freebase has a list of 11 editions including LCCNs, links to OpenLibrary, ISFDB, university OPACs, Google Books, etc which could be exploited to enrich things
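For the Wikidata question earlier in this message, one way to check what is actually recorded on an item is to pull the identifier statements out of its entity JSON. The sketch below is a minimal illustration, not an answer to how Wikidata and Wikipedia relate. It assumes the usual entity JSON layout (`claims` keyed by property ID, each statement carrying a `mainsnak`) and uses the property IDs P212 (ISBN-13), P957 (ISBN-10) and P243 (OCLC control number). The function name and the sample data are made up, and fetching the JSON is left out.

```python
# Wikidata property IDs for the identifiers discussed in the thread.
ID_PROPERTIES = {
    "P212": "ISBN-13",
    "P957": "ISBN-10",
    "P243": "OCLC control number",
}


def identifier_claims(entity):
    """Collect string values of the identifier properties from one entity dict."""
    found = {}
    for pid, label in ID_PROPERTIES.items():
        values = []
        for statement in entity.get("claims", {}).get(pid, []):
            snak = statement.get("mainsnak", {})
            # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
        if values:
            found[label] = values
    return found


# Hand-made sample in the shape of entity JSON; the values are placeholders.
sample = {
    "claims": {
        "P957": [{"mainsnak": {"snaktype": "value",
                               "datavalue": {"value": "0-00-000000-0", "type": "string"}}}],
        "P243": [{"mainsnak": {"snaktype": "novalue"}}],
    }
}

print(identifier_claims(sample))
```

Because ISBNs and OCLC numbers describe editions rather than works, any values found this way on a work-level item would still need to be matched to a specific edition.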