I'm considering applying to GSoC this year, and if I do, I would like to
improve the status of scanning and optical character recognition in KDE; more
specifically:
What I want to achieve
--------------------------------
A few years ago I had to study electronics at my university from class notes
that were only available on paper. I was annoyed because I couldn't use
Ctrl+F on paper, so I looked into OCR a bit and found the open djvu file
format[1].
So I tried to produce a djvu document with KDE and my free operating system;
it was (very difficult|impossible). If I recall correctly, I did something like
this: I worked out a workflow that scanned a couple of pages as 2 JPEG files,
then joined them into a multipage djvu document using shell commands, and that
succeeded. However, I couldn't figure out the OCR part. IIRC I tried a couple
of free OCR programs (I didn't try ocropus; I don't remember whether that
program didn't exist yet or I just didn't know about it), but their output was
just the text, without the coordinates where the text is located, which would
be needed to produce a proper text
layer in the djvu document.
So I gave up, rebooted into Windows and used a proprietary program to produce
the document. It worked quite well: I just fed the papers into my scanner and
produced a multipage document; when done, I clicked a menu item labelled
something like "process the document using OCR", and that was it. I don't
remember the name of the software very well, but I'd swear it was "Document
Express"[2].
The result was excellent, and you can download the produced document here:
http://alioth.debian.org/~santa-guest/gsoc2012/apuntes_te.djvu
As you can see, the size of the document is reasonable (only 2.4M), and you can
search for "zener" with Ctrl+F and read about zener diodes.
So... to sum up: it was/is easier to produce good djvu documents with
proprietary software. I want a KDE'ish program to replace the expensive
"Document Express".
Some technical details
--------------------------------
Currently we have a couple of KDE programs to scan documents: skanlite and
kooka. skanlite is quite simple (it doesn't do OCR), uses the modern libksane
library, is in extragear and works fine. In the old KDE 3 days, kooka provided
more functionality than skanlite does today (it seems it was able to do some
basic OCR), but it uses its own obsolete libkscan library and it's in
playground. I don't know whether it works, because I don't have a scanning
device right now, but at least it builds properly.
So... it looks like the tasks needed to achieve my goal would be:
1. If needed, extend libksane functionality in order to make it a good
replacement for the old libkscan.
2. Port kooka to the modern libksane.
3. Add ocropus support to kooka (I've heard that ocropus can output the
coordinates of the text, but I don't know for sure yet)
4. Code something in kooka to produce djvu documents.
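Whether ocropus exposes coordinates is still an open question in task 3. One format worth checking is hOCR, the HTML-based OCR output format that, as far as I know, came out of the OCRopus project and that Tesseract can also write. It stores each word's box in a `title="bbox x0 y0 x1 y1"` attribute. A minimal sketch, assuming words are marked with the usual hOCR class `ocrx_word`, that extracts words and boxes with only the Python standard library:

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every element with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently being read
        self._depth = 0    # nesting depth inside that word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # e.g. <strong> nested inside a word
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the word element itself was closed
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr(html_text: str):
    """Return a list of (word, (x0, y0, x1, y1)) tuples, top-left origin, pixels."""
    parser = HocrWordParser()
    parser.feed(html_text)
    parser.close()
    return parser.words
```

This is deliberately simple: it ignores line and paragraph structure and does not cope with unclosed void tags (such as a bare `<br>`) inside a word.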
[1] http://en.wikipedia.org/wiki/DjVu
[2] https://www.caminova.net/en/shop/item.aspx?itemid=3
> I'm considering applying to GSoC this year, and if I do, I would like to
> improve the status of scanning and optical character recognition in KDE; more
> specifically:
>
>
> What I want to achieve
> --------------------------------
> ...
> So... to sum up: it was/is easier to produce good djvu documents with
> proprietary software. I want a KDE'ish program to replace the expensive
> "Document Express".
That's a very ambitious target.
>
>
> Some technical details
> --------------------------------
>
> Currently we have a couple of KDE programs to scan documents: skanlite and
> kooka. skanlite is quite simple (it doesn't do OCR), uses the modern libksane
> library, is in extragear and works fine. In the old KDE 3 days, kooka provided
> more functionality than skanlite does today (it seems it was able to do some
> basic OCR), but it uses its own obsolete libkscan library and it's in
> playground. I don't know whether it works, because I don't have a scanning
> device right now, but at least it builds properly.
There has already been a KDE4 port of Kooka, which was originally a KDE3
app. That port worked quite OK.
> So... it looks like the tasks needed to achieve my goal would be:
> 1. If needed, extend libksane functionality in order to make it a good
> replacement for the old libkscan.
I think that's already finished :-)
> 2. Port kooka to the modern libksane.
Cool, but I think Kooka as an app needs much more than just a new
underlying lib. Graphics apps nowadays are much cooler than Kooka
ever was. So if you pick that, I think you should be willing to bring
Kooka up to date. However, I am not so sure whether there is still
demand for that kind of app...
> 3. Add ocropus support to kooka (I've heard that ocropus can output the
> coordinates of the text, but I don't know for sure yet)
> 4. Code something in kooka to produce djvu documents.
The idea back in the days was to provide a component for OCR which can
be reused in all apps which deal with images, similar to what the
ScanService is (you can find it, for example, in Gwenview under the Modules
menu). I think that would be really cool and could be a great GSoC
project imo.
regards,
Klaas
>> Visit http://mail.kde.org/mailman/listinfo/kde-devel#unsub to unsubscribe <<
On Tuesday 06 March 2012 14:16:34 Klaas Freitag wrote:
> On 06.03.2012 14:00, José Manuel Santamaría Lema wrote:
> Hey José,
>
> > I'm considering applying to GSoC this year, and if I do, I would like to
> > improve the status of scanning and optical character recognition in KDE;
> > more specifically:
> >
> >
> > What I want to achieve
> > --------------------------------
> > ...
> > So... to sum up: it was/is easier to produce good djvu documents with
> > proprietary software. I want a KDE'ish program to replace the expensive
> > "Document Express".
>
> That's a very ambitious target.
>
>
> > So... it looks like the tasks needed to achieve my goal would be:
> > 1. If needed, extend libksane functionality in order to make it a good
> > replacement for the old libkscan.
>
> I think that's already finished :-)
>
> > 2. Port kooka to the modern libksane.
>
> Cool, but I think Kooka as an app needs much more than just a new
> underlying lib. Graphics apps nowadays are much cooler than Kooka
> ever was. So if you pick that, I think you should be willing to bring
> Kooka up to date. However, I am not so sure whether there is still
> demand for that kind of app...
>
> > 3. Add ocropus support to kooka (I've heard that ocropus can output the
> > coordinates of the text, but I don't know for sure yet)
> > 4. Code something in kooka to produce djvu documents.
>
> The idea back in the days was to provide a component for OCR which can
> be reused in all apps which deal with images, similar to what the
> ScanService is (you can find it, for example, in Gwenview under the Modules
> menu). I think that would be really cool and could be a great GSoC
> project imo.
>
Yes, it would be really cool :)
I think I would prioritize like this:
1) Create a non-GUI Qt/KDE library that can take a (Q)Image and generate
output suitable for djvu/PDF/ODF. Maybe even generate djvu/PDF/ODF files.
2) Make a simple GUI around the library to test the functionality.
3) Add the OCR part to the KScan plugin ksaneplugin (kdegraphics).
4) Create a Kipi plugin for use in Gwenview, digiKam, ...
5) Standalone document scanning application that is specialized for multipage
scanning to PDF/djvu/ODT.
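Points 1 to 5 describe a non-GUI core with thin front ends on top. As a rough sketch of that layering (written in Python for brevity, while the real library would be Qt/C++ taking a `QImage`; image paths stand in for images here), the core could depend on just two small interfaces so that ksaneplugin, a Kipi plugin and a standalone app all share it. Every name below (`OcrEngine`, `Exporter`, `PlainTextExporter`, `process`) is hypothetical, not an existing KDE API.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Word:
    text: str
    bbox: tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels, top-left origin


@dataclass(frozen=True)
class Page:
    width: int
    height: int
    words: Sequence[Word]


class OcrEngine(Protocol):
    """Hypothetical backend interface (e.g. a wrapper around ocropus)."""
    def recognize(self, image_path: str) -> Page: ...


class Exporter(Protocol):
    """Hypothetical output interface (djvu text layer, PDF, ODF, ...)."""
    def export(self, pages: Sequence[Page]) -> bytes: ...


class PlainTextExporter:
    """Simplest possible exporter: the words of each page on one line."""
    def export(self, pages: Sequence[Page]) -> bytes:
        lines = [" ".join(w.text for w in p.words) for p in pages]
        return "\n".join(lines).encode("utf-8")


def process(images: Sequence[str], engine: OcrEngine, exporter: Exporter) -> bytes:
    """GUI-free pipeline: recognize every page, then hand all pages to one exporter."""
    return exporter.export([engine.recognize(path) for path in images])
```

Keeping the engine and the exporter behind separate interfaces means a GUI (point 2) only has to call `process`, and swapping ocropus for another engine would not touch the djvu/PDF/ODF writers.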
I'm not familiar with the ocropus API, so I'm not sure how much work it would
be. I'm not sure one GSoC would be enough for all 5 points ;)
Regards,
Kåre
I sent this suggestion to the kde-hardware mailing list, but it seems
relevant here:
Scanner kio slave. An easy scanner interface using file managers
(like the current CD ripper kio slave). There would be a folder for
each scanner. When the folder is opened it will pull in a preview
from that scanner. There would then be folders for supported
resolutions, with individual files for common paper sizes, the whole
scanner area, auto-detected pictures (i.e. if you scan multiple
pictures at the same time) and, if available, text files for OCR.
Dragging one of these to the filesystem will trigger a full scan with
those settings.
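To make this proposed layout concrete, here is a small Python sketch that generates the virtual file listing for one scanner. The `kscan://` scheme matches the URL used later in the thread, but the resolutions, paper sizes and file names (`preview.png`, `whole-area.png`, `photo-N.png`, `text.txt`) are made-up examples, the photo count would come from a border-detection step that is not shown, and none of this is an existing KIO API.

```python
def list_scanner(scanner: str, resolutions: list[int], paper_sizes: list[str],
                 detected_photos: int, has_ocr: bool) -> list[str]:
    """Return virtual paths such as 'kscan://scanner/300dpi/A4.png'.

    Nothing is scanned while listing (apart from the preview); a real slave
    would only do a full scan when one of these files is actually read.
    """
    base = f"kscan://{scanner}"
    paths = [f"{base}/preview.png"]  # pulled in when the scanner folder is opened
    for dpi in resolutions:
        folder = f"{base}/{dpi}dpi"
        paths += [f"{folder}/{size}.png" for size in paper_sizes]
        paths.append(f"{folder}/whole-area.png")
        paths += [f"{folder}/photo-{i}.png" for i in range(1, detected_photos + 1)]
        if has_ocr:
            paths.append(f"{folder}/text.txt")
    return paths
```

For example, `list_scanner("epson", [150, 300], ["A4", "Letter"], 2, True)` yields one preview plus six files per resolution folder.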
-Todd
I'm not sure a KIOslave makes that much sense for managing a scanner.
/Sune
Morning José and Klaas
> > I'm considering applying to GSoC this year, and if I do, I would like to
> > improve the status of scanning and optical character recognition in KDE;
> > more specifically:
> >
> >
> > What I want to achieve
> > --------------------------------
> > ...
> > So... to sum up: it was/is easier to produce good djvu documents with
> > proprietary software. I want a KDE'ish program to replace the expensive
> > "Document Express".
>
> That's a very ambitious target.
Absolutely.
[snip]
> > 2. Port kooka to the modern libksane.
>
> Cool, but I think Kooka as an app needs much more than just a new
> underlying lib. Graphics apps nowadays are much cooler than Kooka
> ever was. So if you pick that, I think you should be willing to bring
> Kooka up to date. However, I am not so sure whether there is still
> demand for that kind of app...
There is absolutely a demand for a KDE OCR app. As KDE wants to focus on
accessibility this year, we should be aware that OCR is a big topic for blind
people!
Thx
Mario
>
> Scanner kio slave. An easy scanner interface using file managers
> (like the current CD ripper kio slave). There would be a folder for
> each scanner. When the folder is opened it will pull in a preview
> from that scanner. There would then be folders for supported
> resolutions, with individual files for common paper sizes, the whole
> scanner area, auto-detected pictures (i.e. if you scan multiple
> pictures at the same time) and, if available, text files for OCR.
> Dragging one of these to the filesystem will trigger a full scan with
> those settings.
That's a nice thought from a technical POV, but imo that's not what a user
wants.
Think of typical use cases:
- I want to scan a pile of photos of me and my blue Opel Kadett in 1986.
For that, the scan app should detect the borders, maybe deskew and
rotate them a bit, and save them automatically.
- I want to scan a letter or newspaper snippet and OCR it.
- I want to do a photocopy - that could be a nice plasma app imo, just a
typical copy machine button to press, maybe a switch between color and
grayscale. The app would scan and print automatically.
- I want to scan a bunch of form documents and know where a barcode is
located on them, which should be read automatically.
These kinds of things. Not sure if a kio is cool for any of these.
Klaas
The kio slave would detect the photos and list them, with previews, as
individual image files (one file for each photo). You could then just
drag the image files to where you want to save them.
> - I want to scan a letter or newspaper snippet and OCR it.
There would be text files of various formats (without previews, of
course) listed in the kio slave. You would just grab the text file of
the format you want, drag and drop it where you want to save it (or
open it in a program), and the image will be scanned, OCRed, and then
saved in the chosen format (or opened in the chosen program).
> - I want to do a photocopy - that could be a nice plasma app imo, just a
> typical copy machine button to press, maybe a switch between color and
> grayscale. The app would scan and print automatically.
There could probably be a right-click menu item to print the selected
file. Actually, being able to print any file in dolphin or konqueror
from right-click would be nice, but probably outside the scope of
this.
> - I want to scan a bunch of form documents and know where a barcode is
> located on them, which should be read automatically.
Barcode searches seem to be a fairly niche thing for scanned
documents, so that might be better as a standalone program that acts
on image files of any sort (I think barcode detection would be more
useful in a webcam app, actually). If you really want it, the kio
slave could have a text file containing all barcode numbers.
QR codes (square barcodes) are a different story, and I think would be
great in the kio slave. The kio slave could detect the QR code,
figure out what content it displays, then display a file for that
content. For instance a QR code for a URL would show up in the KIO
slave as an HTML file that when clicked would open the URL in the
default browser. A QR code for an address or coordinates could have a
file that would show it in Marble. Contacts or emails could be
displayed as corresponding files as well.
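The QR idea boils down to mapping a decoded payload to a file type. A sketch of that dispatch, assuming the QR decoding itself happens elsewhere and relying on common payload conventions (`http(s)://` URLs, `geo:` URIs, vCard text starting with `BEGIN:VCARD`); the file names are illustrative choices for this hypothetical slave. A location becomes a KML placemark, a format Marble can open.

```python
import html


def qr_payload_to_file(payload: str) -> tuple[str, str, str]:
    """Map a decoded QR payload to (virtual file name, MIME type, file content)."""
    text = payload.strip()
    lower = text.lower()

    if lower.startswith(("http://", "https://")):
        # Tiny HTML page that redirects, so opening it launches the default browser.
        url = html.escape(text, quote=True)
        page = f'<html><head><meta http-equiv="refresh" content="0; url={url}"></head></html>'
        return "link.html", "text/html", page

    if lower.startswith("geo:"):
        # geo:lat,lon[,alt][;params]; KML expects "lon,lat", so swap the order.
        coords = text[4:].split(";")[0].split(",")
        try:
            lat, lon = (float(c) for c in coords[:2])
        except ValueError:
            pass  # malformed geo URI: fall through to plain text
        else:
            kml = ('<?xml version="1.0" encoding="UTF-8"?>'
                   '<kml xmlns="http://www.opengis.net/kml/2.2">'
                   f'<Placemark><Point><coordinates>{lon},{lat}</coordinates>'
                   '</Point></Placemark></kml>')
            return "location.kml", "application/vnd.google-earth.kml+xml", kml

    if text.upper().startswith("BEGIN:VCARD"):
        # A vCard can be exposed unchanged and opened by the address book.
        return "contact.vcf", "text/vcard", text

    # Anything else is exposed as plain text.
    return "qr-text.txt", "text/plain", text
```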
> These kinds of things. Not sure if a kio is cool for any of these.
>
> Klaas
A GUI able to do all the things you listed would necessarily be
extremely complicated and likely difficult to use, unless most of the
tasks were automated push-button affairs. In the latter case, there
is little advantage over a kio slave. I would think that a kio slave
would be more natural, since users would not need to know terminology
or the menu structure.
For instance, you could offer an OCR button, but what about people who
don't know what OCR stands for (my parents, for example)? On the
other hand, with a kio slave, the person is looking to get text; they
want a text file, so it should be pretty obvious that is what they
want. Similarly, with scanning multiple photos at the same time, in a
stand-alone program the user would need to know where to go to get
that. On the other hand, in a kio slave they will see files with
previews for their individual photos, so it should be clear that is
what they are looking for.
Note that the kio slave doesn't preclude a more advanced interface,
they could both be GUIs for a shared set of underlying libraries.
-Todd
Maybe I haven't used enough of the fancier kio slaves, but I have a
hard time imagining how I'd be able to use this with, say, Konqueror. I'd
go to
kscan://<scannername>/
And then see what's been scanned, but how do I initiate a scan? Do I need
to go to some special url? If so, how do I trigger the OCR creation
after scanning?
Note I'm assuming my scanner does not have any buttons for this stuff on
its own, which seems to still be a reasonable assumption even for
consumer-devices (as a quick browse of my favourite hardware dealer
shows).
Andreas
To activate a scan of an image, you either drag the image file in the
kio slave to another folder, or you open it in a program (either by
clicking or using the right-click menu). In the case of dragging it
to a folder, it will be automatically scanned and saved in the
destination folder without the user needing to do anything else. In
the case where you open it in a program, it will probably be scanned
to a temporary folder or stored in memory and then opened in the
program, once again without the user doing anything else.
In the case of OCR, it would be the same, except a temporary image
file would be scanned, OCRed, and deleted (or, again, stored in
memory).
This, at least, is how the CD kio slave does it.
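The "nothing happens until a file is read" behaviour described above could look roughly like this for the OCR case. `scan_to` and `ocr_file` are hypothetical callables standing in for the real scanning (e.g. via libksane/SANE) and OCR calls; the point of the sketch is the lifecycle: scan to a temporary image, OCR it, and always delete the temporary file, even if scanning or OCR fails.

```python
import os
import tempfile
from typing import Callable


def read_virtual_text_file(scan_to: Callable[[str], None],
                           ocr_file: Callable[[str], str]) -> str:
    """Serve a virtual OCR text file: scan, OCR, then clean up.

    scan_to(path) must write the scanned image to `path`;
    ocr_file(path) must return the recognized text.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".png", prefix="kscan-")
    os.close(fd)  # the scanner backend writes the file itself
    try:
        scan_to(tmp_path)
        return ocr_file(tmp_path)
    finally:
        os.remove(tmp_path)  # the temporary image never outlives the request
```

Keeping the image in memory instead, as also suggested above, would avoid the temporary file but require an OCR backend that accepts image data directly.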
-Todd
I see, so I misunderstood a bit what the content of the slave would be.
Interesting idea. The cd-kioslave is a bit different since it has actual
"files" on the medium backing up the files shown (the audio-tracks).
Andreas
If somebody is interested in making such a kio slave for simple use cases, I
would say go ahead and scratch your itch :) I do have some doubts about the
usability, though.
1) You would have to "refresh" the view to get a new preview of new photos
placed on the scanner. Also, the automatic photo finder is bound to fail
sometimes, and then you would be unable to select the correct part of the images.
2) You have options (folders?)
- scan mode: grayscale, color
- resolution 50 100 150 300 600 1200 2400 4800 ...
- source: flatbed, automatic document feeder, transparency unit, ...
- how would you adjust gamma if available
- contrast/light...
- ...
3) Multipage scanning from the ADF cannot have a preview...
For simple point and shoot it might work some of the time but I'm not sure the
amount of bug reports for heuristics failures would be fun to go through ;)
I think a Qt OCR library would be more than welcome, also for the kio slave, and
that could be the main target for the GSoC.
Kåre
Yes, refreshing would be needed, both for this and for a standalone app.
The issue with incorrectly detected borders would also affect a
standalone app. Of course this is intended for simple jobs, anything
complicated would need a more advanced app. But for most cases simple
is enough.
> 2) You have options (folders?)
> - scan mode: grayscale, color
> - resolution 50 100 150 300 600 1200 2400 4800 ...
> - source: flatbed, automatic document feeder, transparency unit, ...
> - how would you adjust gamma if available
> - contrast/light...
The only folders would probably be the resolution folders, plus one extra folder
for the ADF if available. The ADF would primarily be useful for PDFs,
TIFFs, and OCR, so in that folder could be individual files for OCR,
PDFs at various resolutions, and TIFFs at various resolutions.
Color vs. grayscale could have two images for the whole scan, so only
one more file per resolution. OCR would handle that automatically,
and scanned photos are unlikely to be in grayscale.
Transparency units usually replace the main scan bed, so they could be
detected as individual pictures and scanned that way.
Gamma, contrast, lightness, etc. would require a standalone app.
> 3) Multipage scanning from the ADF cannot have a preview...
No, but this is true in a standalone app as well.
> For simple point and shoot it might work some of the time but I'm not sure the
> amount of bug reports for heuristics failures would be fun to go through ;)
The same bug reports would be needed for a standalone app, since it
would be using the same defaults.
-Todd
/Kåre
Put it on the wiki page, wait for KDE to be a GSoC-approved organization,
wait for official project submissions, get the student to submit the
project, and hope that KDE as an organization selects your project.
> Who is willing to mentor for an OCR library + plugins + ...?
My best guess for a great mentor for such a project would be someone
with an iki.fi address, with a first name and a last name of 4 letters, both
containing some 'nordic' letters. :)
> What does a mentor do and how much time does it take?
A mentor answers questions from the student. A mentor ensures the
student is on the right track. A mentor fails or passes the project.
A mentor does a lot of code review.
A mentor also ensures that the project plan is good and sane.
Mentoring, depending on the student and the task, is probably 1-2 hours a
day or something like that.
From my experience with santa, he is probably going to be one of the
students who are pretty light to mentor.
/Sune
> > What does a mentor do and how much time does it take?
>
> A mentor answers questions from the student. A mentor ensures the
> student is on the right track. A mentor fails or passes the project.
> A mentor does a lot of code review.
> A mentor also ensures that the project plan is good and sane.
>
> Mentoring, depending on the student and the task, is probably 1-2 hours a
> day or something like that.
>
> From my experience with santa, he is probably going to be one of the
> students who are pretty light to mentor.
>
That kind of time is unfortunately not something I have :(
I am willing to answer questions and help, but I don't think I can take the
mentor responsibility.
Is there somebody out there with OCR experience (which I lack) that is willing
to take the mentoring? It would be a shame to let this opportunity pass.
Sorry,
Kåre