The net_spec in the chi_sim.traineddata

roberty...@gmail.com

Aug 23, 2017, 2:51:53 AM
to tesseract-ocr
Hello,

I have extracted the network from chi_sim.traineddata with the command: combine_tessdata -u ../../tessdata/chi_sim.traineddata ../../chi_sim_comp

Then I looked at the network shown in the chi_sim_comp output. The network is [1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]

According to the VGSL Specs language, the output layer of this network is O1c1, which means the output layer produces 1-d (sequence) output, is trained with CTC, and outputs 1 class.
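To make that reading of O1c1 concrete, here is a minimal Python sketch that decodes an output-layer token. It assumes the O(2|1|0)(l|s|c)<n> output-layer syntax described in the VGSL Specs documentation (dimensionality, non-linearity, number of classes); parse_output_layer and last_layer are hypothetical helpers for illustration, not part of Tesseract.

```python
import re
from typing import NamedTuple

# Assumed VGSL output-layer syntax: O(2|1|0)(l|s|c)<n>
OUTPUT_RE = re.compile(r"^O([012])([lsc])(\d+)$")

DIMS = {"2": "2-d (heatmap)", "1": "1-d (sequence)", "0": "0-d (category)"}
KINDS = {"l": "logistic", "s": "softmax", "c": "softmax trained with CTC"}


class OutputLayer(NamedTuple):
    dims: str
    kind: str
    num_classes: int


def parse_output_layer(token: str) -> OutputLayer:
    """Decode a VGSL output-layer token such as 'O1c1'."""
    m = OUTPUT_RE.match(token)
    if m is None:
        raise ValueError(f"not a VGSL output layer: {token!r}")
    d, k, n = m.groups()
    return OutputLayer(DIMS[d], KINDS[k], int(n))


def last_layer(net_spec: str) -> str:
    """Return the final whitespace-separated layer of a spec like '[... O1c1]'."""
    return net_spec.strip().strip("[]").split()[-1]


if __name__ == "__main__":
    spec = "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]"
    print(parse_output_layer(last_layer(spec)))
```

This only reads the spec text literally; it says nothing about how many outputs Tesseract actually builds at training time.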


Why does the output layer have only one class? Is the network structure recorded in chi_sim.traineddata wrong?

ShreeDevi Kumar

Aug 23, 2017, 3:00:00 AM
to tesser...@googlegroups.com
I think that number is ignored, and the actual number derived from the unicharset is used instead.

There is usually a message right at the beginning of training showing the number being used.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5f5e3422-59e4-499e-bc4d-84ed214c1523%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

roberty...@gmail.com

Aug 23, 2017, 5:18:14 AM
to tesseract-ocr
Yes, I have seen the built network at the beginning of the training step. Thanks for the reply.

The basetrain.log file shows: Built network:[1,48,0,1 [C3,3 Ft16] Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 Fc209] from request [1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]
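The requested and built specs in that log line can be lined up layer by layer. The Python below is a hypothetical sketch (not Tesseract code) that splits each spec into top-level layers, treating a bracketed group such as [C3,3 Ft16] as a single layer, and lists the positions where the two differ. Run on the two specs from basetrain.log, it reports only two changes: Ct3,3,16 became [C3,3 Ft16], and O1c1 became Fc209.

```python
from typing import List, Tuple


def split_layers(net_spec: str) -> List[str]:
    """Split a VGSL spec into top-level layers, keeping bracketed groups whole."""
    body = net_spec.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]  # drop the brackets around the whole network
    layers: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in body:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch.isspace() and depth == 0:
            # A space outside any bracketed group ends the current layer.
            if current:
                layers.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        layers.append("".join(current))
    return layers


def changed_layers(request: str, built: str) -> List[Tuple[str, str]]:
    """Pair layers by position and return the (requested, built) pairs that differ."""
    req, blt = split_layers(request), split_layers(built)
    if len(req) != len(blt):
        raise ValueError("specs have different numbers of top-level layers")
    return [(r, b) for r, b in zip(req, blt) if r != b]


REQUEST = "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]"
BUILT = "[1,48,0,1 [C3,3 Ft16] Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 Fc209]"

if __name__ == "__main__":
    for requested, built in changed_layers(REQUEST, BUILT):
        print(f"{requested} -> {built}")
```

Every other layer (input, Mp3,3 and all four LSTMs) comes through unchanged.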

I have some questions about this built network:

1. The [C3,3 Ft16] layers in the network are enclosed in brackets. Why are they enclosed in brackets, and what do the brackets stand for?
2. The last layer of this network, Fc209, is a fully-connected layer. What does the 'c' in this layer mean? I cannot find what 'Fc' represents in the VGSL Specs tutorial.

Thanks.



ShreeDevi Kumar

Aug 23, 2017, 7:22:37 AM
to tesser...@googlegroups.com

Loaded file ./tess4training-save/tess4training-vedic/tessdata/best/Devanagari.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 217 to 157!!
Num (Extended) outputs,weights in Series:
1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys64:64, 20736
Lfx64:64, 33024
Lrx64:64, 33024
Lfx512:512, 1181696
Fc157:157, 80541
Total weights = 1349181
Previous null char=2 mapped to 2
Continuing from ./tess4training-save/tess4training-vedic/tessdata/best/Devanagari.lstm

See the line:

Code range changed from 217 to 157!!

157 is the size of the unicharset, and that is the number used in the Fc output layer (Fc157).
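The per-layer weight counts in this log can be reproduced by hand, which also shows exactly where the code range of 157 enters. The Python sketch below rests on two formulas that are not stated in the thread and are assumptions on my part: a fully-connected layer has (inputs + 1) * outputs weights (one bias per output), and a 1-D LSTM has 4 * (inputs + outputs + 1) * outputs weights (four gates over the current input, the previous output and a bias). With those assumptions it reproduces every row above, including Fc157:157, 80541 and the total of 1349181. The log groups C3,3 and Ft16 into [C3,3Ft16]; the sketch lists them as separate rows.

```python
from typing import List, Tuple

Row = Tuple[str, int, int]  # (layer spec, outputs, weights)


def fc_weights(ni: int, no: int) -> int:
    # Assumption: each output has one weight per input plus one bias.
    return (ni + 1) * no


def lstm_weights(ni: int, no: int) -> int:
    # Assumption: a 1-D LSTM has 4 gates, each fully connected over
    # [current input, previous output, bias].
    return 4 * (ni + no + 1) * no


def count_weights(num_classes: int) -> List[Row]:
    """Rebuild the (layer, outputs, weights) rows printed for Devanagari.lstm."""
    rows: List[Row] = []
    ni = 1  # input 1,48,0,1 has depth 1
    rows.append(("C3,3", 9 * ni, 0))  # gathers a 3x3 patch: 9x the depth, no weights
    ni *= 9
    rows.append(("Ft16", 16, fc_weights(ni, 16)))
    ni = 16
    rows.append(("Mp3,3", ni, 0))  # max-pooling keeps the depth, no weights
    for name, no in (("Lfys64", 64), ("Lfx64", 64), ("Lrx64", 64), ("Lfx512", 512)):
        rows.append((name, no, lstm_weights(ni, no)))
        ni = no
    # Per the reply above, the output size is the code range (157 here),
    # not the number written in the O layer of the spec.
    rows.append((f"Fc{num_classes}", num_classes, fc_weights(ni, num_classes)))
    return rows


if __name__ == "__main__":
    rows = count_weights(157)
    for name, no, w in rows:
        print(f"{name}:{no}, {w}")
    print("Total weights =", sum(w for _, _, w in rows))
```

Passing 217, the code range before the change, alters only the Fc row and the total, since none of the earlier layers depend on the number of output classes.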



ShreeDevi
