Hello


Wulin Teo

Jun 18, 2014, 8:24:39 PM
to wellcom...@googlegroups.com
Hi

First of all, I am so excited to learn about the Wellcome Player. I would like to use it for the presentation of my thesis.
I am especially excited about the search-within function and the Seadragon/Deep Zoom feature.

I have installed the Wellcome Player on my computer. I have built the player with Grunt and it works locally. Like others, I would like to add the search-within function to the Wellcome Player. I have read the thread, but I am not sure I understood it.
These are my questions:
1. I have to export the OCR text as an ALTO XML file. How do I convert it from PDF to ALTO?
2. About generating the packages, do you have any recommendations on how to make creating them easier? (Is there a JSON editor you would recommend for this task?)


Thanks

 

Tom Crane

Jun 19, 2014, 1:30:03 PM
to wellcom...@googlegroups.com
Hi Wulin Teo,
 
We're glad you like it.
 
As you can see from http://player.digirati.co.uk/digitising.html, digitising a book by hand and producing the tile and metadata assets (package) is a laborious process, but it is possible to produce a completely static site featuring a book viewer. However, a book digitised this way isn't searchable.
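Part of why producing the tiles by hand is laborious is that a Seadragon/Deep Zoom viewer expects a whole image pyramid per page, not one image. A minimal sketch of that pyramid arithmetic, assuming the Deep Zoom convention (full-size image at the top level, each lower level halved and rounded up, down to 1x1 at level 0) and an assumed tile size of 256 pixels; it only counts tiles and does not cut them, and tile overlap is ignored because it does not change the tile count.

```python
import math


def deep_zoom_levels(width: int, height: int, tile_size: int = 256):
    """Return (level, level_width, level_height, cols, rows) for every pyramid level.

    Level 0 is 1x1; the highest level is the full-size image. tile_size=256
    is an assumption, not a value taken from the Player's own packages.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)       # downscale factor at this level
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        cols = math.ceil(w / tile_size)        # tiles across
        rows = math.ceil(h / tile_size)        # tiles down
        levels.append((level, w, h, cols, rows))
    return levels


levels = deep_zoom_levels(3000, 2000)
print(levels[-1])  # (12, 3000, 2000, 12, 8): the full-size level is 12 x 8 tiles
print(sum(cols * rows for _, _, _, cols, rows in levels))  # total tiles for one page
```

Multiplying that total by the number of pages in a book gives a sense of why the Library generates these assets with software rather than by hand.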
 
In the Wellcome Digital Library, everything the player consumes is produced dynamically, by a server-side application. The Library's digitisation workflow produces the following assets:
 
These three types of object, along with information from the library catalogue, are the raw materials that are used by the Wellcome Library's server-side application (the "Digital Delivery System" or DDS) to produce the tiles and package data, and also to provide a search service.
 
Take this example:
 
 
 
If you produce an entirely static version, e.g., by following the process described at http://player.digirati.co.uk/digitising.html, you won't be able to offer search. Even if you could produce ALTO files, you would still need a server-side process to query them and generate search results.
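To make "query them and generate search results" concrete: ALTO stores each recognised word as a String element with its text in CONTENT and its position in HPOS/VPOS/WIDTH/HEIGHT, so a search is essentially "find matching words and return their boxes". A minimal sketch of that step for a single ALTO page, using only the Python standard library; it is not the Wellcome DDS, it does plain case-insensitive whole-word matching (no stemming, phrases or hyphenation handling), and the sample page is made up. A real service has to run this (or an index built from it) on a server for every page of every book.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix so the same code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def search_alto(alto_xml: str, query: str):
    """Return (word, hpos, vpos, width, height) for each ALTO String matching query."""
    root = ET.fromstring(alto_xml)
    q = query.casefold()
    hits = []
    for el in root.iter():
        if _local(el.tag) != "String":
            continue
        content = el.get("CONTENT", "")
        if content.casefold() == q:
            # Coordinates may be integers or decimals depending on the ALTO version.
            box = tuple(int(float(el.get(a, "0"))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            hits.append((content, *box))
    return hits


ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Wellcome" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
    <SP/>
    <String CONTENT="Library" HPOS="290" VPOS="200" WIDTH="150" HEIGHT="40"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(search_alto(ALTO_SAMPLE, "library"))  # [('Library', 290, 200, 150, 40)]
```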
 
The Player on GitHub is an entirely client-side application; we haven't yet released any of the server-side components that a library might use. At present, getting these up and running would be a big overhead for casual use.
We would like to do some more work on a more user-friendly server implementation that doesn't rely on the infrastructure resources of a large library.
 
So to come back to your questions:
 
1) You need OCR software that can output ALTO format XML (e.g., http://content-conversion.com/ or http://www.abbyy.com/, both commercial)
2) Our packages are generated dynamically from the source METS files; they are never edited by hand.
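As an illustration of what generating packages from METS involves, a first step is reading the page sequence out of the METS file: the physical structMap lists pages in order, each page points to a file ID, and the fileSec maps that ID to an actual file. A minimal sketch using the Python standard library; the fileGrp USE value "MASTER", the sample file names and the (order, href) output are assumptions for illustration, and the actual Wellcome package format is not described here.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
METS_DIV = "{http://www.loc.gov/METS/}div"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def page_images_from_mets(mets_xml: str, file_group: str = "MASTER"):
    """Return (ORDER, href) for each page in the physical structMap, in document order.

    file_group="MASTER" is a hypothetical USE value; real METS profiles differ.
    """
    root = ET.fromstring(mets_xml)
    # Map file IDs in the chosen fileGrp to the file locations they point to.
    hrefs = {}
    for grp in root.iterfind(".//mets:fileGrp", NS):
        if grp.get("USE") != file_group:
            continue
        for f in grp.iterfind("mets:file", NS):
            loc = f.find("mets:FLocat", NS)
            if loc is not None:
                hrefs[f.get("ID")] = loc.get(XLINK_HREF)
    # Walk page divs of the physical structMap and resolve their file pointers.
    pages = []
    for smap in root.iterfind("mets:structMap", NS):
        if smap.get("TYPE") != "PHYSICAL":
            continue
        for div in smap.iter(METS_DIV):
            if div.get("TYPE") != "page":
                continue
            for fptr in div.iterfind("mets:fptr", NS):
                if fptr.get("FILEID") in hrefs:
                    pages.append((div.get("ORDER"), hrefs[fptr.get("FILEID")]))
    return pages


METS_SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="MASTER">
      <file ID="IMG1"><FLocat LOCTYPE="URL" xlink:href="page-001.jp2"/></file>
      <file ID="IMG2"><FLocat LOCTYPE="URL" xlink:href="page-002.jp2"/></file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="PHYSICAL">
    <div TYPE="book">
      <div TYPE="page" ORDER="1"><fptr FILEID="IMG1"/></div>
      <div TYPE="page" ORDER="2"><fptr FILEID="IMG2"/></div>
    </div>
  </structMap>
</mets>"""

print(page_images_from_mets(METS_SAMPLE))  # [('1', 'page-001.jp2'), ('2', 'page-002.jp2')]
```

Reading the structure from METS like this, rather than typing it into JSON, is what keeps the packages consistent with the catalogue and digitisation records.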
 
We hope to be able to do some work on a better package editor, hopefully a visual one. ... However, without server-side support you still won't be able to offer search.
 
But we hope you can still use the Player without the search within feature for now.
 
Tom
 
 
PS the following links go into more detail about the Wellcome Library's systems:
 

Wulin Teo

Jun 20, 2014, 5:08:36 AM
to wellcom...@googlegroups.com
Hi Tom, 

Thank you for explaining how the data are processed from beginning to end at the Wellcome Library. It is very informative.
Thank you for bringing digital documents back to life.

Wulin

Klaus E. Werner

Nov 15, 2019, 3:16:14 AM
to Universal Viewer
Hello Tom,

I'm publishing our resources with our own viewer and, in parallel, with the UV via IIIF manifests, and it's working fine for now (http://dlib.biblhertz.it).

I thought I would implement OCR search too, and after some tinkering I found out that providing:
1. the manifest file
2. the annoservices JSON (more or less a JSON group for each line, with its text string and canvas coordinates)
is, unfortunately, not enough.
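What those two files lack is an endpoint that answers queries. In the IIIF Content Search API (1.0), the manifest advertises a search service, the viewer sends the user's term to it as ?q=..., and the service must reply with only the matching annotations as an AnnotationList; a static annotation file cannot filter itself. A minimal sketch of that filtering step, assuming the line annotations are already loaded as Python dicts with hypothetical keys id, text, canvas and xywh, and with example.org URLs standing in for real ones. It does plain substring matching; it still has to run behind an HTTP endpoint, although the Search API itself does not prescribe a particular engine such as Solr or Elasticsearch.

```python
import json
from urllib.parse import quote

# Service block a Presentation 2.x manifest would carry (the @id is a placeholder).
MANIFEST_SEARCH_SERVICE = {
    "@context": "http://iiif.io/api/search/1/context.json",
    "@id": "https://example.org/search",
    "profile": "http://iiif.io/api/search/1/search",
}


def search_lines(lines, query, service_id):
    """Filter OCR line annotations by query; wrap hits as a Search 1.0 AnnotationList."""
    q = query.casefold()
    resources = [
        {
            "@id": line["id"],
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": line["text"]},
            "on": f'{line["canvas"]}#xywh={line["xywh"]}',  # region on the canvas
        }
        for line in lines
        if q in line["text"].casefold()
    ]
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{service_id}?q={quote(query)}",
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


LINES = [  # hypothetical pre-extracted OCR lines
    {"id": "https://example.org/anno/l1", "text": "Bibliotheca Hertziana",
     "canvas": "https://example.org/canvas/p1", "xywh": "120,340,900,48"},
    {"id": "https://example.org/anno/l2", "text": "Max-Planck-Institut",
     "canvas": "https://example.org/canvas/p1", "xywh": "120,400,700,48"},
]

result = search_lines(LINES, "hertziana", MANIFEST_SEARCH_SERVICE["@id"])
print(json.dumps(result, indent=2))
```

Behind a small web handler, this function is the server-side piece described earlier in the thread; an index such as Solr becomes useful mainly when the number of pages makes a linear scan too slow.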

This experience is more or less in line with what you said here... I found it out the hard way.

But is this still valid? No OCR search without a server-side Solr/Elastic setup?

Thanks in advance!