Hi!
I'm trying to use Kaldi to decode the output of a handwriting recognition model. My situation is quite similar to the one described in the following
discussion; the difference is the database I'm using, which has 80 classes (78 characters + space + CTC blank label). I'm still learning the tool and have some questions. I intend to use decode-faster-mapped, which takes as input a matrix of log-likelihoods, a decoding graph, and the HMM transition model. For now, I would like some help with the input format of the log-likelihood matrix. For example, with 80 classes and a test set containing a single image, my model outputs a matrix of log-likelihoods like this:
Shape: [18,80]
Truncated representation:
[[-13.487 -8.355 -8.416 ..., -13.693 -0.010 -4.797]
[-9.132 -7.163 -6.488 ..., -13.671 -0.014 -5.855]
[-9.017 -6.391 -4.293 ..., -9.185 -7.233 -5.949]
...,
[-17.873 -16.232 -13.409 ..., -16.146 -10.533 -0.029]
[-19.022 -16.719 -16.815 ..., -17.294 -16.461 -10.670]
[-22.810 -17.978 -19.435 ..., -20.250 -12.571 -0.004]]
Given this, which of the following is the correct input format for the log-likelihood matrix that decode-faster-mapped reads?
I. One time step per line, with newlines separating the rows of the matrix
image_01 [-13.487 -8.355 -8.416 ..., -13.693 -0.010 -4.797
-9.132 -7.163 -6.488 ..., -13.671 -0.014 -5.855
-9.017 -6.391 -4.293 ..., -9.185 -7.233 -5.949
...,
-17.873 -16.232 -13.409 ..., -16.146 -10.533 -0.029
-19.022 -16.719 -16.815 ..., -17.294 -16.461 -10.670
-22.810 -17.978 -19.435 ..., -20.250 -12.571 -0.004]
II. All log-likelihoods on a single line, with each time step in its own pair of brackets
image_01 [ -13.487 -8.355 -8.416 ... -13.693 -0.010 -4.797] [-9.132 -7.163 -6.488 ... -13.671 -0.014 -5.855] [-9.017 -6.391 -4.293 ... -9.185 -7.233 -5.949] ...
III. Other
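To make options I and II concrete, here is a minimal Python sketch of how I would serialize the matrix in each case. The function names and the toy 2x3 matrix are hypothetical, used only for illustration (the real matrix is [18,80]), and the sketch only reproduces the two layouts shown above. It does not assume that either one is what decode-faster-mapped actually accepts.

```python
def format_option_one(key, rows):
    """Option I: key, opening bracket, then one time step per line."""
    lines = [" ".join(f"{v:.3f}" for v in row) for row in rows]
    return f"{key} [" + "\n".join(lines) + "]"


def format_option_two(key, rows):
    """Option II: one line, each time step in its own pair of brackets."""
    groups = ["[" + " ".join(f"{v:.3f}" for v in row) + "]" for row in rows]
    return f"{key} " + " ".join(groups)


# Toy stand-in for the real matrix: 2 time steps, 3 classes (hypothetical values).
toy = [[-13.487, -8.355, -0.010], [-9.132, -7.163, -0.014]]

print(format_option_one("image_01", toy))
print(format_option_two("image_01", toy))
```

If neither layout is right (option III), I would only need to change how the rows are joined.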
Thanks in advance,
Dayvid Castro