Is there a shapeclustering method for a multi-page TIFF .tr file?

summy00

Nov 14, 2014, 3:00:43 AM
to tesser...@googlegroups.com
Hi, I am a newbie and don't understand the more complex methods.
I boxed a multi-page TIFF (all box information is in one box file, with the page number after the coordinates) and corrected the wrong characters,
created a single .tr file (xxx.tr, 6.86 MB),
and computed the character set.
But when I run shapeclustering, it creates a very small shapetable (180 KB). (Maybe it only creates the shapetable for the first page of the TIFF?)

I know it may be possible to pass multiple .tr files to shapeclustering, but is there a convenient approach for shapeclustering a multi-page TIFF?

My commands:
tesseract.exe "new.times.exp[0].tif" "new.times.exp[0]" batch.nochop makebox
tesseract.exe "new.times.exp[0].tif" "new.times.exp[0]" box.train
unicharset_extractor.exe "new.times.exp[0].box"
shapeclustering -F font_properties -U unicharset "new.times.exp[0].tr"
-- this step reported: "Master shape_table:Number of shapes = 96 max unichars = 3 number with multiple unichars = 8"
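For reference, the Tesseract 3.x training sequence described above can be sketched as a small helper that only builds the command strings in order. This is a sketch under assumptions, not an official tool: the function name `training_commands` and the `lang` default are hypothetical. A multi-page TIFF is treated as one basename, on the assumption that box.train writes a single .tr file covering all of its pages. The font_properties file is also assumed to already contain a line for the font name used in the file names (for example `times 0 0 0 1 0`, i.e. name, italic, bold, fixed, serif, fraktur). The mftraining and cntraining steps follow the usual Tesseract 3 training order and were not part of the original post.

```python
from typing import List


def training_commands(basenames: List[str], lang: str = "new") -> List[str]:
    """Build (but do not run) a Tesseract 3.x training command sequence.

    Hypothetical helper. Each basename is a TIFF stem such as
    "new.times.exp[0]"; a multi-page TIFF is assumed to be ONE basename
    that yields ONE .tr file.
    """
    def q(name: str) -> str:
        # Quote file names the same way the commands in the post do.
        return f'"{name}"'

    cmds: List[str] = []
    for base in basenames:
        # Generate a box file; it must be corrected by hand before box.train.
        cmds.append(f"tesseract {q(base + '.tif')} {q(base)} batch.nochop makebox")
        # Produce the .tr feature file from the corrected box file.
        cmds.append(f"tesseract {q(base + '.tif')} {q(base)} box.train")

    boxes = " ".join(q(b + ".box") for b in basenames)
    trs = " ".join(q(b + ".tr") for b in basenames)

    # Every later step receives ALL box/.tr files, not just the first one.
    cmds.append(f"unicharset_extractor {boxes}")
    cmds.append(f"shapeclustering -F font_properties -U unicharset {trs}")
    cmds.append(f"mftraining -F font_properties -U unicharset -O {lang}.unicharset {trs}")
    cmds.append(f"cntraining {trs}")
    return cmds


if __name__ == "__main__":
    # On Windows the tools are the .exe files used in the post.
    for cmd in training_commands(["new.times.exp[0]"]):
        print(cmd)
```

With a single multi-page TIFF the generated shapeclustering line is the same as the one in the post; with several TIFFs, each .tr file is simply appended to the later commands.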

Thanks!



newcorrect.arial.exp[0].box

summy00

Nov 14, 2014, 3:02:24 AM
to tesser...@googlegroups.com
Adding the multi-page TIFF file:
newcorrect.arial.exp[0].tif

summy00

Nov 14, 2014, 3:26:37 AM
to tesser...@googlegroups.com
Hi all, I had simply forgotten the "Compute the Character Set" step; after adding it, it works well.

However, recognition is still poor after training. Any suggestions?

summy00

Nov 14, 2014, 4:03:06 AM
to tesser...@googlegroups.com
I mean that training does not fix the misrecognized characters. As you can see, the last page of the TIFF is the original, complete English text.