Re: GSoC idea: improving scanning and OCR in KDE (skanlite/kooka)

José Manuel Santamaría Lema

Mar 14, 2012, 5:00:48 AM
to kde-...@kde.org
Thank you Kåre and Klaas for your replies. I had some time to dig a bit more
into this:

Kåre Särs <kare...@iki.fi>
> [snip]
>
> 1) Create a non-GUI Qt/KDE library that can take an (Q)image and generate
> output suitable for djvu/PDF/ODF. Maybe even generate djvu/PDF/ODF files.
>
> 2) Make a simple GUI around the library to test the functionality.
>
> 3) Add the OCR part to the KScan plugin ksaneplugin. (kdegraphics)
>
> 4) Create a Kipi-plugin for use in Gwenview,Digikam,....
>
> 5) Standalone document scanning application that is specialized for
> multipage scanning to PDF/djvu/ODT.
>
>
> I'm not familiar with the ocropus API, so I'm not sure how much work it
> would be. I'm not sure one GSOC would be enough for all 5 points ;)
>
> Regards,
> Kåre

First, I have just realized that gocr can produce output saying where the
characters/words are located (see the gocr man page; I checked how "-f XML"
works with a sample image, and it looks like what I need). Thus it wouldn't be
mandatory to add ocropus support right now; it would be nice, but optional.
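
To illustrate what "processing the output of gocr -f XML" could mean, here is
a minimal sketch in Python. The sample markup, and in particular the "box"
element with "x", "y", "dx", "dy" and "value" attributes, is an assumption made
for illustration; the real gocr output should be checked against the man page
and may not be well-formed XML. CharBox and parse_boxes are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognized character and its bounding box, in image pixels."""
    x: int
    y: int
    width: int
    height: int
    text: str


# Hypothetical sample: element and attribute names are assumed, not taken
# from real "gocr -f XML" output.
SAMPLE = """<page x="0" y="0" dx="200" dy="50">
  <line x="10" y="12" dx="40" dy="14" value="Hi">
    <box x="10" y="12" dx="9" dy="14" value="H" />
    <box x="21" y="16" dx="4" dy="10" value="i" />
  </line>
</page>"""


def parse_boxes(xml_text: str) -> List[CharBox]:
    """Collect every <box> element, in document order, as a CharBox."""
    root = ET.fromstring(xml_text)
    boxes = []
    for el in root.iter("box"):
        boxes.append(
            CharBox(
                x=int(el.get("x")),
                y=int(el.get("y")),
                width=int(el.get("dx")),
                height=int(el.get("dy")),
                text=el.get("value", ""),
            )
        )
    return boxes


if __name__ == "__main__":
    for box in parse_boxes(SAMPLE):
        print(box)
```

In a real library the XML text would come from running gocr on the scanned
image rather than from an embedded string.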

Second, and just FYI, I've got a ~12-year-old scanner. I've tested both
skanlite and kooka: skanlite worked fine, but kooka doesn't work _for_ _me_.
Fortunately, I think I can still provide a djvu generator supporting OCR with
kooka, even if I don't port it to libksane; see below.

About Kåre's task list: I think I would split the first item as follows:
1a) Create a non-GUI Qt/KDE library able to open and generate djvu documents
without a text layer. (libkdjvu)
1b) Create a non-GUI Qt/KDE library that can take an (Q)image and generate
output suitable for djvu/PDF/ODF. (libkocr)
1c) Add support to the libkdjvu library for including the data retrieved with
libkocr as a text layer.
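
As a rough illustration of step 1c), the sketch below (Python, all names
hypothetical) turns word boxes from an OCR pass into a hidden-text description
in the s-expression style that, as far as I know, djvused accepts for its
"set-txt" command. It assumes that the OCR boxes use a top-left origin and that
DjVu text zones use a bottom-left origin; both assumptions should be verified
against the djvused documentation. It only escapes quotes and backslashes.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """A recognized word with a top-left-origin bounding box (assumed)."""
    x: int
    y: int
    width: int
    height: int
    text: str


def quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def text_layer(page_width: int, page_height: int, words: List[Word]) -> str:
    """Build a (page ... (word ...)) description for one page.

    Each zone is written as (type xmin ymin xmax ymax "text"), with the
    y axis flipped so that the origin is the bottom-left corner.
    """
    lines = [f"(page 0 0 {page_width} {page_height}"]
    for w in words:
        xmin = w.x
        xmax = w.x + w.width
        ymax = page_height - w.y
        ymin = page_height - (w.y + w.height)
        lines.append(f"  (word {xmin} {ymin} {xmax} {ymax} {quote(w.text)})")
    return "\n".join(lines) + ")"


if __name__ == "__main__":
    print(text_layer(100, 50, [Word(10, 5, 20, 10, "Hi")]))
```

Keeping the OCR result (the list of words) separate from the djvu writer is
the same split as between libkocr and libkdjvu above.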

Note that a djvu file may or may not have a text layer. Also note that
extracting text with OCR and creating djvu files that join various images/texts
are very different jobs. Those are the reasons for splitting the first item
like that. That being said, let me make some other remarks and ask some
questions:

About my 1a): Perhaps I could reuse some code from okular; I'd need to
investigate more about this.

About my 1b): There is already some code in kooka to do something like that;
see these classes: OcrGocrEngine, OcrEngine and KookaImage. So this task would
mainly consist of hacking on OcrGocrEngine to make it produce output suitable
for my new libkdjvu library (by processing the output of "gocr -f XML"), and
taking all the kooka classes related to OCR and putting them together in a
shared library (libkocr).
It looks like most kooka files are licensed under GPLv2 only, with a couple of
special exceptions; Klaas, could we please change that license to GPLv2 or
later with the same couple of special exceptions? See:
http://techbase.kde.org/Projects/KDE_Relicensing

About 2) and 5): I'm open to other ideas, but right now I tend to think that
both the "simple GUI" mentioned in 2) and the "Standalone document scanning
application" mentioned in 5) would be a new tab in kooka that behaves as a
djvu editor. I did a quick mockup; this GUI would be able to open existing
djvu documents as well as create new ones:
http://alioth.debian.org/~santa-guest/gsoc2012/mockup.png

About 3) and 4): if I create that libkocr library, this should be easy to do;
however, I want to understand better how these plugins would work from a
user's point of view. For instance, let's say I open a png file in gwenview
and there is a menu item called "Process image with OCR" inside the "Plugins"
menu. What would happen if I click that item? Would it open a text editor with
the OCR result, or what?

Kåre Särs

Mar 14, 2012, 5:17:29 PM
to kde-...@kde.org
On Wednesday 14 March 2012 10:00:48 José Manuel Santamaría Lema wrote:
> Thank you Kåre and Klaas for your replies. I had some time to dig a bit more
> into this:
>
>
> First, I have just realized that gocr can produce output saying where the
> characters/words are located (see the gocr man page; I checked how "-f XML"
> works with a sample image, and it looks like what I need). Thus it wouldn't
> be mandatory to add ocropus support right now; it would be nice, but
> optional.
Here the accuracy of the output might be a factor... Which one produces more
accurate recognition?

How easy it is to implement is also an important factor :)

>
> Second, and just FYI, I've got a ~12-year-old scanner. I've tested both
> skanlite and kooka: skanlite worked fine, but kooka doesn't work _for_ _me_.
> Fortunately, I think I can still provide a djvu generator supporting OCR
> with kooka, even if I don't port it to libksane; see below.
>
> About Kåre's task list: I think I would split the first item as follows:
> 1a) Create a non-GUI Qt/KDE library able to open and generate djvu documents
> without a text layer. (libkdjvu)
> 1b) Create a non-GUI Qt/KDE library that can take an (Q)image and generate
> output suitable for djvu/PDF/ODF. (libkocr)
> 1c) Add support to the libkdjvu library for including the data retrieved
> with libkocr as a text layer.
>

Sounds good.

> Note that a djvu file may or may not have a text layer. Also note that
> extracting text with OCR and creating djvu files that join various
> images/texts are very different jobs. Those are the reasons for splitting
> the first item like that. That being said, let me make some other remarks
> and ask some questions:
>
> About my 1a): Perhaps I could reuse some code from okular; I'd need to
> investigate more about this.

Maybe even a mentor could be found... (hoping)

>
> About my 1b): There is already some code in kooka to do something like that;
> see these classes: OcrGocrEngine, OcrEngine and KookaImage. So this task
> would mainly consist of hacking on OcrGocrEngine to make it produce output
> suitable for my new libkdjvu library (by processing the output of
> "gocr -f XML"), and taking all the kooka classes related to OCR and putting
> them together in a shared library (libkocr).
> It looks like most kooka files are licensed under GPLv2 only, with a couple
> of special exceptions; Klaas, could we please change that license to GPLv2
> or later with the same couple of special exceptions? See:
> http://techbase.kde.org/Projects/KDE_Relicensing

Is the preferred license for KDE libraries LGPL?...

>
> About 2) and 5): I'm open to other ideas, but right now I tend to think that
> both the "simple GUI" mentioned in 2) and the "Standalone document scanning
> application" mentioned in 5) would be a new tab in kooka that behaves as a
> djvu editor. I did a quick mockup; this GUI would be able to open existing
> djvu documents as well as create new ones:
> http://alioth.debian.org/~santa-guest/gsoc2012/mockup.png

Sounds good. I think I should have skipped the GUI part for 2); I only meant
something quick to use while developing/testing...

>
> About 3) and 4): if I create that libkocr library, this should be easy to
> do; however, I want to understand better how these plugins would work from a
> user's point of view. For instance, let's say I open a png file in gwenview
> and there is a menu item called "Process image with OCR" inside the
> "Plugins" menu. What would happen if I click that item? Would it open a
> text editor with the OCR result, or what?

I'm not aware of any application other than Kooka that has used the OCR
part of the KScan plugin interface... That does not mean, though, that there
would not be a need for it ;)

The Acquireimage kipi-plugin scans and saves images almost like a standalone
application. I think the idea here would be to "export" images to some text
file-format.

Regards,
Kåre

