Papers Past - are some titles missing?


Conal Tuohy

Aug 24, 2014, 8:11:13 AM
to digi...@googlegroups.com
I have been querying DigitalNZ to find the list of Papers Past titles (i.e. the different newspapers).

I did this by searching for the text "*:*", requesting zero actual results (per_page=0) and asking only for the values of the "collection" facet.

http://api.digitalnz.org/v3/records.xml?text=*%3A*&and[content_partner]=National+Library+of+New+Zealand&and[primary_collection]=Papers+Past&per_page=0&facets=collection&facets_per_page=100&api_key=INSERT_KEY
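For anyone wanting to script this, here is a minimal Python sketch that builds the same facet-only request using only the standard library. The function name `papers_past_facet_url` is hypothetical, and the sketch only constructs the URL (you still supply your own key). `urlencode` percent-encodes the brackets and the `*` characters, which should be equivalent to the hand-written URL above, though that equivalence is an assumption about how the API decodes parameters.

```python
from urllib.parse import urlencode

API_BASE = "http://api.digitalnz.org/v3/records.xml"


def papers_past_facet_url(api_key: str, facets_per_page: int = 100) -> str:
    """Build a facet-only query: no records, just values of the 'collection' facet.

    Hypothetical helper; parameter names are copied from the URL in the post.
    """
    params = {
        "text": "*:*",
        "and[content_partner]": "National Library of New Zealand",
        "and[primary_collection]": "Papers Past",
        "per_page": 0,                       # ask for zero records
        "facets": "collection",              # only return this facet
        "facets_per_page": facets_per_page,  # raise if more titles are expected
        "api_key": api_key,
    }
    return API_BASE + "?" + urlencode(params)


print(papers_past_facet_url("INSERT_KEY"))
```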

There are 68 papers returned, but some of the papers listed on the Papers Past website are not in the list.

I went through the Papers Past title list to identify which appeared to be missing from Digital NZ:
  • Bay of Plenty Beacon
  • Charleston Argus
  • Dominion
  • Free Lance
  • Horowhenua Chronicle
  • Hutt Valley Independent
  • King Country Chronicle
  • Lake Wakatip Mail
  • Lyell Times and Central Buller Gazette
  • Maoriland Worker
  • Mount Ida Chronicle
  • Mt Benger Mail
  • New Zealand Herald
  • Oamaru Mail
  • Press
  • Pukekohe & Waiuku Times
  • Sun
  • Te Puke Times
  • Thames Advertiser
  • Thames Star
  • Upper Hutt Weekly Review
  • Victoria Times
  • Waihi Daily Telegraph
  • Westport Times
Any thoughts? Am I querying incorrectly? Or are these genuinely missing from DigitalNZ?
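The comparison against the Papers Past title list can be scripted rather than done by eye. Below is a rough sketch: `missing_titles` is a hypothetical helper, and the sample titles in the usage line are illustrative only. Matching is case- and whitespace-insensitive, so abbreviated or variant forms of a title would still need a manual check.

```python
def missing_titles(website_titles, facet_titles):
    """Return website titles with no matching DigitalNZ 'collection' facet value.

    Titles are compared after collapsing whitespace and case-folding;
    any other spelling differences are not reconciled.
    """
    def norm(title):
        return " ".join(title.split()).casefold()

    present = {norm(t) for t in facet_titles}
    return sorted(t for t in website_titles if norm(t) not in present)


# Illustrative inputs only, not the real lists.
print(missing_titles(["Dominion", "Press", "Sun"], ["Sun"]))  # ['Dominion', 'Press']
```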

I also wondered whether these missing titles had been added to Papers Past recently and not yet syndicated to DigitalNZ, so I queried for the Papers Past record in DigitalNZ with the latest syndication_date:

http://api.digitalnz.org/v3/records.xml?and[content_partner]=National+Library+of+New+Zealand&and[primary_collection]=Papers+Past&per_page=1&direction=desc&sort=syndication_date&api_key=INSERT_KEY

It was dated 2014-07-15T08:33:34+12:00, which is not that long ago.
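The "latest syndication date" check can be sketched the same way: sort by syndication_date in descending order, ask for a single record, then compare its date with the day of the question. The function names here are hypothetical, and the parameter names are copied from the query in this post. The timestamp is the one reported here, and the reference date is this post's date (Aug 24, 2014).

```python
from datetime import date, datetime
from urllib.parse import urlencode

API_BASE = "http://api.digitalnz.org/v3/records.xml"


def latest_syndication_url(api_key: str) -> str:
    """Hypothetical helper: newest Papers Past record by syndication_date."""
    params = {
        "and[content_partner]": "National Library of New Zealand",
        "and[primary_collection]": "Papers Past",
        "per_page": 1,                # only the newest record is needed
        "sort": "syndication_date",
        "direction": "desc",          # newest first
        "api_key": api_key,
    }
    return API_BASE + "?" + urlencode(params)


def days_since(syndication_date: str, asked_on: date) -> int:
    """Whole days between an ISO 8601 timestamp's local date and a reference date."""
    return (asked_on - datetime.fromisoformat(syndication_date).date()).days


print(days_since("2014-07-15T08:33:34+12:00", date(2014, 8, 24)))  # 40
```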

Andy Neale

Aug 24, 2014, 4:45:05 PM
to digi...@googlegroups.com

Hi Conal,


Yep, you are right, we currently can't index all of Papers Past. There are various reasons for that, but mostly because the Papers Past system doesn't provide easy access to the data. There is a major upgrade coming that will bring us closer to real-time access to its content updates, but I am afraid we are probably 12 months away from realising that and upgrading our own storage to accommodate the full data set.


Cheers,

Andy

--

---
You received this message because you are subscribed to the Google Groups "DigitalNZ" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digitalnz+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Conal Tuohy

Sep 11, 2014, 4:05:05 AM
to digi...@googlegroups.com
Hi Andy

I understand that the Papers Past corpus is natively METS/ALTO; is that going to be exposed? At the moment, what we get in the way of full text is quite severely downgraded; punctuation and capitalisation and any kind of markup is gone. It would be great to be able to just get the XML that PP has.
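For context on what is lost: in ALTO, each recognised word is a `String` element whose `CONTENT` attribute keeps the original case and punctuation, and words are grouped into `TextLine` elements. Here is a minimal sketch of reading lines back out, assuming the Papers Past ALTO follows that standard layout. The namespace version and the sample fragment are made up for illustration, and hyphenation markup such as `HYP` or `SUBS_CONTENT` is ignored.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_xml: str) -> List[str]:
    """Return one string per ALTO TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


# Made-up fragment for illustration only; not real Papers Past data.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="WELLINGTON,"/><SP/><String CONTENT="Monday."/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_lines(SAMPLE))  # ['WELLINGTON, Monday.']
```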

Con

Andy Neale

Sep 11, 2014, 3:09:04 PM
to <digitalnz@googlegroups.com>
Hiya,

I'm sorry to say that no, I don't think this will change soon. You have to remember that this data is coming from the DigitalNZ search index. DigitalNZ and Papers Past are quite separate systems, and we are not yet in a position to launch a full data service from Papers Past. It's something on our roadmap, though. Comparing with Trove, remember that Trove IS Papers Past in a sense, whereas for us the systems are completely different. Sorry about that.

Andy