Swedish language


ShreeDevi Kumar

Jan 6, 2017, 11:06:29 AM
to tesser...@googlegroups.com

Peter,

Please see https://github.com/tesseract-ocr/langdata/blob/master/swe/swe.training_text

You can provide additional training text if any needed characters are missing from that file. I can do a test training with it.
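One way to check whether the training text lacks needed characters is a small script like the one below. It is only a sketch: the function name `missing_characters` and the `SWEDISH_REQUIRED` character set are assumptions made for illustration, not part of Tesseract or the langdata repository, and the required set should be adjusted to whatever your documents actually contain.

```python
import unicodedata

# Hypothetical set of characters we expect Swedish text to need.
# This list is an assumption for illustration; extend it as required.
SWEDISH_REQUIRED = (
    "abcdefghijklmnopqrstuvwxyzåäöé"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖÉ"
    "0123456789"
)


def missing_characters(training_text: str, required: str) -> list[str]:
    """Return the required characters that never occur in training_text.

    Both strings are normalized to NFC first so that a precomposed 'å'
    and 'a' + combining ring are treated as the same character.
    Whitespace in the required set is ignored.
    """
    present = set(unicodedata.normalize("NFC", training_text))
    wanted = set(unicodedata.normalize("NFC", required))
    return sorted(ch for ch in wanted - present if not ch.isspace())


# Small inline sample; in practice, read swe.training_text from disk
# with open(path, encoding="utf-8") and pass its contents instead.
sample = "Räksmörgås på bordet"
print(missing_characters(sample, "åäöÅÄÖ"))
```

Running the same function against the contents of swe.training_text with `SWEDISH_REQUIRED` would list any characters worth covering in additional training text.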

- excuse the brevity, sent from mobile


On 06-Jan-2017 5:01 PM, "Peter" <pe...@peterkrantz.se> wrote:


On Thursday, January 5, 2017 at 04:39:01 UTC+1, shree wrote:
Ray is planning to retrain the languages for the new 4.0.0 version sometime in January. So it would be helpful if you could open an issue on https://github.com/tesseract-ocr/langdata/issues with this information.

Is it possible to contribute training data for this effort? I realise Swedish will not be at the top of the list, but I think it would be easy to involve some of the research community here in contributing training data if it could improve the language model.

/Peter 


ShreeDevi Kumar

Jan 8, 2017, 4:09:58 AM
to tesser...@googlegroups.com
In tests with TIFF images created from the training text, accuracy seems quite good for Swedish using the 4.0.0-alpha traineddata. Please see the attached eval reports.
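For anyone who wants a rough figure of their own, character accuracy can be estimated by comparing OCR output against the ground-truth text with an edit distance. The sketch below is not how the attached reports were produced; the function names are illustrative, and it assumes both texts are already loaded as plain strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Fraction of ground-truth characters recognized correctly,
    computed as 1 - errors / len(ground_truth), clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


# Example: two diacritics lost out of seven characters.
print(character_accuracy("smörgås", "smorgas"))
```

A per-character measure like this is sensitive to exactly the kind of error that matters for Swedish, such as å, ä, and ö being recognized as a and o.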

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
swe_arial-report.html
swe_report.html