Language pack for: Swedish language, Estonian Language.


Charles Roos

Jun 12, 2011, 8:19:56 AM
to tesseract-ocr
Do you have language packs for Swedish and Estonian?
Or do you know of free OCR software for those languages?
Thanks.

patrickq

Jun 12, 2011, 8:40:22 AM
to tesseract-ocr
The Swedish language pack is right there on the downloads page (and
we've been using it successfully). Don't know about Estonian.

Charles Roos

Jun 12, 2011, 8:45:26 AM
to tesseract-ocr
I downloaded these files:
tesseract-ocr-files.zip
tesseract-ocr-pages.zip
Which file in those zips is for the Swedish language?
Br.,
C.

Charles Roos

Jun 12, 2011, 9:07:01 AM
to tesseract-ocr

Charles Roos

Jun 12, 2011, 9:20:57 AM
to tesseract-ocr
I downloaded the Swedish language pack file ("swe-frak.traineddata") from
here:
http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak.traineddata.gz&can=1&q=language+data
I saved it to the folder
"C:\WINDOWS\tessdata\"
I restarted "FreeOCR v3" and chose the item "swe" from the "OCR Language"
combo box.
I pressed "Scan", and the document image was scanned into the left pane.
Then I clicked "OCR", but nothing happened: the right pane still
showed its default help text.
Then I changed the language to "Eng" and pressed "OCR", and the right pane
was filled with the scanned text, but the Swedish letters come out wrong
this way.
Why doesn't the Swedish OCR work?
Br.,
C.

On Jun 12, 4:07 pm, Charles Roos <mr.charles.r...@gmail.com> wrote:
> Hi,
> i found it,
> thx.
>
> http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...

Sriranga(78yrsold)

Jun 12, 2011, 9:24:42 AM
to tesser...@googlegroups.com
The language pack should be installed under the tessdata folder of FreeOCR.


--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

Sven Pedersen

Jun 12, 2011, 9:34:22 AM
to tesser...@googlegroups.com
Hi Charles,
That is for fraktur fonts, I believe. It was my understanding that there was another training set for regular Swedish. Check out swe.traineddata.gz at http://code.google.com/p/tesseract-ocr/downloads/list
But I'm of Norwegian extraction, so haven't looked into it much... :-P
--Sven
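
A minimal Python sketch of the unpacking step, assuming the downloaded file is still gzip-compressed (it is listed as swe.traineddata.gz on the downloads page). The function name unpack_traineddata and the paths in the usage note are hypothetical and not part of Tesseract or FreeOCR:

```python
import gzip
import shutil
from pathlib import Path


def unpack_traineddata(gz_path, tessdata_dir):
    """Decompress <lang>.traineddata.gz into tessdata_dir and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    tessdata_dir = Path(tessdata_dir)
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the last suffix turns "swe.traineddata.gz" into "swe.traineddata".
    target = tessdata_dir / gz_path.stem
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

It could be called as unpack_traineddata("swe.traineddata.gz", r"C:\WINDOWS\tessdata"), where the target folder is whichever tessdata folder the OCR front end actually reads.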



Charles Roos

Jun 12, 2011, 9:54:11 AM
to tesseract-ocr
The Norwegian language pack doesn't produce any characters for me either.
Creating the directory
"C:\Program Files\FreeOCR\tessdata"
by hand doesn't change anything either.
I restarted the computer, again without success.
I will try to re-install the FreeOCR software now.
C.

On Jun 12, 4:34 pm, Sven Pedersen <sven.peder...@gmail.com> wrote:
> Hi Charles,
> That is for fraktur fonts, I believe. It was my understanding that there was
> another training set for regular Swedish. Check out swe.traineddata.gz at http://code.google.com/p/tesseract-ocr/downloads/list
> But I'm of Norwegian extraction, so haven't looked into it much... :-P
> -_Sven
>
> On Sun, Jun 12, 2011 at 8:20 AM, Charles Roos <mr.charles.r...@gmail.com> wrote:
>
> > I downloaded Swedish language pack file ("swe-frak.traineddata") from
> > there:
> > http://code.google.com/p/tesseract-ocr/downloads/detail?name=swe-frak...

Charles Roos

Jun 12, 2011, 10:08:45 AM
to tesseract-ocr
Re-installing the software didn't change anything either. I can only do OCR
in English. I can now select "nor" and "swe" in the Language combo box,
but they don't work.
I downloaded the exe file from here:
http://www.paperfile.net/freeocr.exe
I have Windows XP.
It seems to me that only English works; other languages don't.
C.



Sriranga(78yrsold)

Jun 12, 2011, 10:12:05 AM
to tesser...@googlegroups.com
No, I don't agree with your view. Even the Kannada language works well in FreeOCR. Have you read the instructions on how to add data files under the tessdata folder of FreeOCR?

Charles Roos

Jun 12, 2011, 10:19:02 AM
to tesseract-ocr
I read and did exactly what is described under this link:
http://www.paperfile.net/ocr_lang.htm
If I click the 'Settings' menu and then choose 'Open Language Folder',
this folder is opened:
"C:\WINDOWS\tessdata\"
There I see 8 files starting with "eng.", and I also see the files
"nor.traineddata" and "swe.traineddata", both about 2332 KB in size.
When I start FreeOCR I see 3 languages in the drop-down box "OCR
Language:":
eng
nor
swe
If I select "eng", OCR succeeds. But with the other 2 languages OCR
doesn't succeed: no new data appears in the right pane.
Maybe I should try an older FreeOCR version; I will try to find one.
C.



Sriranga(78yrsold)

Jun 12, 2011, 10:21:49 AM
to tesser...@googlegroups.com
Why not try VietOCR, which supports all languages and all image formats?

Sriranga(78yrsold)

Jun 12, 2011, 10:35:28 AM
to tesser...@googlegroups.com
OK, please forward the traineddata file and the image file. I shall try them in my FreeOCR and report back to you. I feel it should work.

Charles Roos

Jun 12, 2011, 10:34:57 AM
to tesseract-ocr
I installed VietOCR now; its language combo box has only English and
Vietnamese.
I copied FreeOCR's Norwegian and Swedish language files to the folder:
"C:\Program Files\VietOCR.NET\tessdata"
After restarting, the "OCR Language" select box still didn't list the new
languages.
When I choose Vietnamese I get a system error when running OCR;
with the English option everything works.
I think something may be wrong with my computer.
Thanks anyway,
C.


Sriranga(78yrsold)

Jun 12, 2011, 10:37:51 AM
to tesser...@googlegroups.com
The same mistake was made here as with FreeOCR. In fact, in VietOCR the tessdata folder is inside the tesseract folder, which contains tess.exe and the tessdata folder.

Charles Roos

Jun 12, 2011, 11:02:53 AM
to tesseract-ocr
I don't have a file "tess.exe" at all.
But I have these files:
C:\Program Files\VietOCR.NET\VietOCR.exe
C:\Program Files\VietOCR.NET\tessdata\nor.traineddata
C:\Program Files\FreeOCR\FreeOCR.exe
C:\Program Files\FreeOCR\tessdata\nor.traineddata
C:\WINDOWS\tessdata\nor.traineddata

FreeOCR shows the option to choose the language "Norway" when running, but
VietOCR doesn't show this language.
So VietOCR doesn't allow installing a new language at all, while FreeOCR
allows it but the installed language doesn't produce any output when
running OCR.
I will try both programs on my other computer on Monday.
I don't think I will post screenshots here; I don't believe they would
show anything that helps solve this.
C.



Quan Nguyen

Jun 12, 2011, 11:54:46 AM
to tesseract-ocr
You've mixed up the Tesseract program version and the data version.
*.traineddata is for 3.0x. VietOCR.NET is currently only compatible
with 2.04. To use *.traineddata, you'll need the Java version at
http://sourceforge.net/projects/vietocr/files/vietocr/3.1.3 .
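
A rough Python sketch for checking which data format each language in a tessdata folder uses. It assumes that a 3.0x language is a single <lang>.traineddata file and that a 2.0x language is a set of separate files with the suffixes listed in TESS2_SUFFIXES (recalled from the 2.0x English data set, so treat that list as an assumption). classify_tessdata is a hypothetical helper, not part of Tesseract or VietOCR:

```python
from pathlib import Path

# Assumed file suffixes that together made up one language in 2.0x data sets.
TESS2_SUFFIXES = {
    "DangAmbigs", "freq-dawg", "inttemp", "normproto",
    "pffmtable", "unicharset", "user-words", "word-dawg",
}


def classify_tessdata(tessdata_dir):
    """Map each language code to the set of data formats found for it."""
    formats = {}
    for path in Path(tessdata_dir).iterdir():
        if not path.is_file():
            continue
        # "swe.traineddata" -> ("swe", "traineddata"); "eng.inttemp" -> ("eng", "inttemp")
        lang, _, suffix = path.name.partition(".")
        if suffix == "traineddata":
            formats.setdefault(lang, set()).add("3.0x")
        elif suffix in TESS2_SUFFIXES:
            formats.setdefault(lang, set()).add("2.0x")
    return formats
```

If the 8 "eng." files in the folder described earlier are the 2.0x set, this would report eng as 2.0x and nor and swe as 3.0x, which would match an engine that only reads the older format.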

Charles Roos

Jun 14, 2011, 7:02:10 AM
to tesser...@googlegroups.com
VietOCR with the Swedish pack does very bad OCR on my "svenska.png" file.
Can you check whether you get equally bad results with the attached file?
C.

2011/6/12 Quan Nguyen <nguy...@gmail.com>
vietocr_very_bad_swe_ocr.png
svenska.png

Sven Pedersen

Jun 14, 2011, 7:23:33 AM
to tesser...@googlegroups.com
Hi Charles,
Using screen captures for OCR usually doesn't work directly. You'll need to increase the resolution, perhaps with ImageMagick or something similar. Tesseract needs each letter to be a certain height; you'll find details in the documentation, but try upsizing from 95 to 200 dpi. You should get better results. The ideal range is 200 to 300 dpi. It is much better to create the original image at that resolution.
--Sven
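
The upsizing arithmetic can be sketched in Python. This only computes the scale factor and the target pixel size; the actual resampling would still be done by a tool such as ImageMagick. upscale_size is a hypothetical helper, and its defaults are simply the 95 and 200 dpi figures mentioned above:

```python
def upscale_size(width_px, height_px, source_dpi=95, target_dpi=200):
    """Return (scale factor, new width, new height) for resampling to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    factor = target_dpi / source_dpi
    # Integer ceiling division avoids floating-point rounding surprises.
    new_width = (width_px * target_dpi + source_dpi - 1) // source_dpi
    new_height = (height_px * target_dpi + source_dpi - 1) // source_dpi
    return factor, new_width, new_height
```

For example, a 950 x 190 pixel capture at 95 dpi would become 2000 x 400 pixels at 200 dpi, a scale factor of roughly 2.1.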

Quan Nguyen

Jun 14, 2011, 9:29:50 AM
to tesseract-ocr
Charles, please try again with Image > Screenshot Mode turned on.

Better yet, scan the image again at a proper resolution, as Sven
suggested. The resolution of screen captures is generally not adequate
for OCR purposes.


Charles Roos

Jun 14, 2011, 3:11:09 PM
to tesser...@googlegroups.com
Hi,
I finally got my Swedish OCR working with the Java-based VietOCR, with not-so-bad results.
Thx everybody,
C.



2011/6/14 Quan Nguyen <nguy...@gmail.com>
swe_ocr.PNG

Sriranga(78yrsold)

Jun 14, 2011, 11:45:18 PM
to tesser...@googlegroups.com
Charles,
Congratulations! You have succeeded in using VietOCR. You can also use "<lang>.DangAmbigs.txt" under the "Data" folder, as well as "dict", to attain more accuracy.
Wish you good luck,
-sriranga(78yrs)