Trying to get features for a symbol


Paul

Jul 5, 2014, 8:37:20 AM
to tesser...@googlegroups.com
Hello everyone,

I am trying to get a symbol's features by using the C API so I can create additional ways for debugging Tesseract.

Can anybody provide me with an example of how to use the following function?

TESS_API void TESS_CALL TessBaseAPIGetFeaturesForBlob(TessBaseAPI* handle, TBLOB* blob,
                                                       INT_FEATURE_STRUCT* int_features,
                                                       int* num_features, int* FeatureOutlineIndex);

As far as I understand it, I need to provide an INT_FEATURE_STRUCT* buffer that will hold the symbol's features after the call, as well as an int* that will receive the number of features. What goes in FeatureOutlineIndex, and which TBLOB* do I have to provide? How do I get it?

Or is there another method that can give me the features of a symbol?

Best regards,
Paul

Paul

Jul 5, 2014, 10:42:30 AM
to tesser...@googlegroups.com
I found out the following:
  • int_features should be an array of length 512,
  • num_features tells us how many entries of int_features actually contain features,
  • FeatureOutlineIndex holds, for each feature in int_features, the index of the outline it belongs to.
TESS_API TBLOB* TESS_CALL TessMakeTBLOB(Pix* pix);

will give us the TBLOB* needed for TessBaseAPIGetFeaturesForBlob(). If you want to get the Pix* for every symbol, simply call TessPageIteratorGetImage() or TessPageIteratorGetBinaryImage() while iterating over the symbols.
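To make the buffer contract above concrete, here is a minimal sketch that does not call Tesseract at all. DemoFeature, demo_get_features and demo_count_on_outline are hypothetical stand-ins I made up for illustration; only the calling convention (caller-allocated array of 512 entries, a count written through an int*, and a parallel outline-index array) mirrors what I described for TessBaseAPIGetFeaturesForBlob().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buffer capacity reported above for int_features. */
#define DEMO_MAX_FEATURES 512

/* Hypothetical stand-in for INT_FEATURE_STRUCT (assumption: position and angle only). */
typedef struct {
    uint8_t x;
    uint8_t y;
    uint8_t theta;
} DemoFeature;

/* Mock extractor (not a Tesseract function). It follows the same out-parameter
 * convention: the caller owns both arrays, the callee fills the first
 * *num_features entries and records the outline each feature came from. */
static void demo_get_features(const int *outline_lengths, int num_outlines,
                              DemoFeature *features, int *num_features,
                              int *feature_outline_index)
{
    assert(outline_lengths != NULL && features != NULL);
    assert(num_features != NULL && feature_outline_index != NULL);

    int n = 0;
    for (int o = 0; o < num_outlines; ++o) {
        for (int i = 0; i < outline_lengths[o] && n < DEMO_MAX_FEATURES; ++i) {
            /* Made-up feature values; a real extractor derives them from the blob. */
            features[n].x = (uint8_t)(10 * o + i);
            features[n].y = (uint8_t)i;
            features[n].theta = (uint8_t)(64 * o);
            feature_outline_index[n] = o;
            ++n;
        }
    }
    *num_features = n; /* only entries [0, n) are valid afterwards */
}

/* Count how many of the valid features belong to one outline. */
static int demo_count_on_outline(const int *feature_outline_index,
                                 int num_features, int outline)
{
    int count = 0;
    for (int i = 0; i < num_features; ++i) {
        if (feature_outline_index[i] == outline) {
            ++count;
        }
    }
    return count;
}
```

A caller would declare DemoFeature feats[DEMO_MAX_FEATURES] and int idx[DEMO_MAX_FEATURES], pass both together with the address of an int count, and afterwards read only the first count entries of each array.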

Now I still need to know how to get the features for prototypes...

Paul

Jul 7, 2014, 9:00:24 AM
to tesser...@googlegroups.com
Is there a method in the public API that allows me to gather the information from normproto or inttemp?

zdenko podobny

Jul 7, 2014, 12:38:25 PM
to tesser...@googlegroups.com


Paul

Jul 7, 2014, 2:15:31 PM
to tesser...@googlegroups.com
Thank you for the link, Zdenko. Since I also want to get the features from ready-to-use traineddata files that don't come with box/tr files, I started using the methods in tessdatamanager.h and intprotos.h.

Do you know how the features in tr files differ from the data in the intproto section of a traineddata file? I know that intproto mainly contains the 4D (x, y, angle, length) information.

There are four sections for each box in a tr file: mf, cn, if and tb. The if section seems to hold the 3D int features (x, y, angle), but what do the features in mf, cn and tb mean?

Paul

zdenko podobny

Jul 7, 2014, 4:53:55 PM
to tesser...@googlegroups.com
Paul,

I have never played with that, so I suggest you update/continue in that thread...

Zdenko

