OCR SDK integration into the DSpace platform

Aisha H

Sep 8, 2015, 4:54:00 AM
to DSpace Technical Support

Dear all,



What are the prerequisites for integrating an OCR SDK into the DSpace platform?


We have installed an Arabic optical character recognition (OCR) product called “Sakhr” that comes with a software development kit (SDK) offering standard COM and DLL APIs.

Could we integrate the Automatic Reader APIs into the DSpace platform?


If yes, should we use the OCR DLL SDK (library engine) or the OCR COM SDK (COM engine)?



Regards,

helix84

Sep 8, 2015, 5:04:50 AM
to Aisha H, DSpace Technical Support
The easiest approach would be to write a curation task. It's a
well-integrated interface in DSpace that will let you call your OCR
plugin from the command line or the GUI on any specified set of DSpace
objects (items, collections, communities or the whole site). In theory,
you can write a curation task in any language that runs on the JVM. In
practice, Java,
Jython, JRuby and Groovy have been tried.

Your curation task will be served items by the curation system, one at
a time. Your code should take the item's PDF bitstream, call the OCR
library, take its output as a text file and store it into the item's
TEXT bundle.
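
As a rough illustration of that per-item step, here is a minimal Java
sketch of the OCR part only. It is not DSpace code and not the Sakhr API:
OcrEngine, OcrResult and OcrStep are hypothetical names made up for this
example, and naming the output "<source name>.txt" is just an assumption.
In a real curation task (a class extending
org.dspace.curate.AbstractCurationTask), the PDF stream would come from the
item's bitstream and the result would be written to the TEXT bundle; those
calls are left out because they need a DSpace installation to run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical bridge to the OCR SDK; not part of DSpace or any vendor API. */
interface OcrEngine {
    /** Reads a PDF from the stream and returns the recognized text. */
    String recognize(InputStream pdf) throws IOException;
}

/** OCR output for one bitstream: a file name plus UTF-8 encoded text. */
final class OcrResult {
    final String name;
    final byte[] utf8Text;

    OcrResult(String name, byte[] utf8Text) {
        this.name = name;
        this.utf8Text = utf8Text;
    }
}

final class OcrStep {
    private OcrStep() {}

    /** True if the bitstream name ends in ".pdf" (case-insensitive). */
    static boolean isPdf(String bitstreamName) {
        return bitstreamName != null
                && bitstreamName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** Runs OCR on one PDF and packages the text as "<source name>.txt". */
    static OcrResult process(String bitstreamName, InputStream pdf, OcrEngine engine)
            throws IOException {
        String text = engine.recognize(pdf);
        // Encode explicitly as UTF-8 so Arabic text does not depend on the
        // platform default charset.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new OcrResult(bitstreamName + ".txt", bytes);
    }
}
```

The task would call OcrStep.isPdf(...) to skip non-PDF bitstreams, then
OcrStep.process(...) with whatever OcrEngine implementation wraps the COM
or DLL engine, and store the returned bytes as a new bitstream.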

https://wiki.duraspace.org/display/DSDOC5x/Curation+System
https://wiki.duraspace.org/display/DSPACE/Curation+Task+Cookbook

Regarding COM, this may be useful:

ftp://ftp.tuwien.ac.at/.vhost/tutorialbox.com/tutors/J++/ch16.htm


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Aisha H

Sep 9, 2015, 1:44:44 AM
to DSpace Technical Support
Is there another way to integrate the OCR SDK into the DSpace platform?

Aisha H

Sep 9, 2015, 3:06:16 AM
to DSpace Technical Support, aish...@gmail.com, hel...@centrum.sk
Could you please provide samples of curation tasks written in Java,
Jython, JRuby and Groovy?

Thank you so much for your assistance. It was very helpful :)

helix84

Sep 9, 2015, 3:33:49 AM
to Aisha H, DSpace Technical Support
On Wed, Sep 9, 2015 at 9:06 AM, Aisha H <aish...@gmail.com> wrote:
> Could you please provide me with the samples of curation tasks that have
> been written in Java,
> Jython, JRuby and Groovy?

A basic example is part of the documentation I already linked:
https://wiki.duraspace.org/display/DSDOC5x/Curation+tasks+in+Jython