Doxygen Documentation online


zdenko podobny

Mar 18, 2012, 6:25:04 PM
to tesser...@googlegroups.com
Hi all,

I put the doxygen documentation for the current svn code (3.02) online[1]. This could be helpful for those who do not have doxygen installed (Windows developers?) or who need it online. AFAIK only a very old version[2] is available online. After 3.02 is released I will put an "offline" version in the download section of tesseract[3] (as in the past).

I thought about publishing the documentation with "call" and "caller" graphs (examples for baseapi.cpp and baseapi.h are attached), but the final (png) version was 881M uncompressed/671M compressed, which is too big for online publishing IMO...

Anyway, if you want to try it (on Linux), you need to have doxygen[4] and Graphviz[5] installed. Then change a few settings in doc/Doxygen to:
HAVE_DOT               = YES
CALL_GRAPH             = YES
CALLER_GRAPH           = YES

Optionally you can try interactive svg images instead of the default png; the svg version is smaller (300M uncompressed/46M compressed):
DOT_IMAGE_FORMAT       = svg
INTERACTIVE_SVG        = YES

and then run
$ make doc

from the top source directory (after configuring tesseract ;-) ).
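
A minimal sketch of applying the settings above from a script instead of editing the file by hand. The helper name set_doxy_option is hypothetical (it is not part of doxygen or the tesseract build), and the config path is simply the one given in the post; adjust it to wherever the doxygen configuration lives in your checkout.

```shell
#!/bin/sh
# Sketch only: set_doxy_option is a hypothetical helper.
# Usage: set_doxy_option FILE KEY VALUE
# Rewrites an existing "KEY = ..." line in FILE, or appends one if KEY is absent.
# Assumes simple single-line values (YES, NO, svg, ...) without "|" or "&".
set_doxy_option() {
    file=$1; key=$2; value=$3
    if grep -q "^${key}[[:space:]]*=" "$file"; then
        # Write to a temp file first so this works with both GNU and BSD sed.
        sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (path as given in the post; run from the top source directory):
#   for kv in HAVE_DOT=YES CALL_GRAPH=YES CALLER_GRAPH=YES; do
#       set_doxy_option doc/Doxygen "${kv%%=*}" "${kv#*=}"
#   done
#   make doc
```

The pattern anchors on the key followed by "=", so setting CALL_GRAPH does not touch the CALLER_GRAPH line.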

Zdenko

[3] http://code.google.com/p/tesseract-ocr/downloads/list  and search there for "doc-html.tar.gz"
baseapi_8cpp__incl.png
baseapi_8h__dep__incl.png
baseapi_8h__incl.png

Ray Smith

Mar 20, 2012, 12:30:59 AM
to tesser...@googlegroups.com
Hi Zdenko,

Thanks for updating the documentation!

I suppose call graphs/include graphs would be very useful, but as you point out I think they are a bit big. The smaller 46MB one might work though.

I noticed a problem with the attached graphs though. Since baseapi.cpp includes tesseractclass.h, it *should* show the entire directory class hierarchy:
tesseractclass.h
wordclass.h
classify.h 
ccstruct.h
cutil_class.h
ccutil.h

and they aren't there below wordclass.h

F3 in eclipse works well for me though.

Ray.

zdenko podobny

Mar 22, 2012, 5:28:29 PM
to tesser...@googlegroups.com
On Tue, Mar 20, 2012 at 5:30 AM, Ray Smith <thera...@gmail.com> wrote:
I noticed a problem with the attached graphs though. Since baseapi.cpp includes tesseractclass.h, it *should* show the entire directory class hierarchy:
tesseractclass.h
wordclass.h
classify.h 
ccstruct.h
cutil_class.h
ccutil.h

and they aren't there below wordclass.h

F3 in eclipse works well for me though.

Well, I am not an Eclipse user, but I tried it ;-). I found the include browser and went through the tesseractclass.h hierarchy, but I did not find them (wordclass.h or ccutil.h). CTRL+F did not help either...

So I went back to find and grep:

$ find . \( -name "*.cpp" -o -name "*.h" \) -type f | xargs grep 'wordclass.h'
./wordrec/bestfirst.cpp:#include "wordclass.h"
./wordrec/wordclass.cpp:#include "wordclass.h"
./wordrec/pieces.cpp:#include "wordclass.h"
./wordrec/chopper.cpp:#include "wordclass.h"
./wordrec/tface.cpp:#include "wordclass.h"
./wordrec/wordclass.h: * File:         wordclass.h

=> wordclass.h is included in cpp files only.

If I understand correctly, doxygen links files via "#include"... So if there is no "include link", they are not shown in the hierarchy.
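
The same check can be narrowed to real #include lines, so that mentions in comments (like the "File:" header above) are not counted. This is a rough sketch: list_includers is a hypothetical helper name, and it only matches includes written with the bare file name, not with a directory prefix.

```shell
#!/bin/sh
# Sketch only: list_includers is a hypothetical helper.
# Usage: list_includers HEADER [DIR]
# Prints every .cpp/.h file under DIR that has a line of the form
#   #include "HEADER"   or   #include <HEADER>
list_includers() {
    header=$1; dir=${2:-.}
    # Escape dots so "wordclass.h" is matched literally.
    pattern=$(printf '%s' "$header" | sed 's/\./\\./g')
    find "$dir" -type f \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -l "^[[:space:]]*#[[:space:]]*include[[:space:]]*[\"<]${pattern}[\">]" {} +
}

# Example: does any *header* include wordclass.h?
#   list_includers wordclass.h . | grep '\.h$'
```

If the second command prints nothing, no header pulls the file in, which would be consistent with it not appearing below tesseractclass.h in an include graph.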

Tom Powers

Mar 23, 2012, 4:46:15 AM
to tesser...@googlegroups.com
On Sun, Mar 18, 2012 at 3:25 PM, zdenko podobny <zde...@gmail.com> wrote:

What I find a bit confusing is that you can't just look at the docs
for baseapi.h [1] or baseapi.cpp [2] files to get documentation on all
the available methods. Instead, you have to know to look at the CLASS
TessBaseAPI [3]. Any pointers to this documentation should mention it
as the page to start with.

The Modules | Advanced API page [4] seems to include a number of
pretty basic methods. Is this doxygen group really necessary?

I also find it handy to have ALL the source files reachable via a "Go
to the source code of this file" link (since doxygen will hyperlink
stuff back to the documentation). I guess you do that for header files
but not .cpp files?

[1] http://zdenop.github.com/tesseract-doc/baseapi_8h.html

[2] http://zdenop.github.com/tesseract-doc/baseapi_8cpp.html

[3] http://zdenop.github.com/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html

[4] http://zdenop.github.com/tesseract-doc/group___advanced_a_p_i.html

          -- Tom

Ray Smith

Mar 23, 2012, 1:06:46 PM
to tesser...@googlegroups.com
On Thu, Mar 22, 2012 at 2:28 PM, zdenko podobny <zde...@gmail.com> wrote:


On Tue, Mar 20, 2012 at 5:30 AM, Ray Smith <thera...@gmail.com> wrote:
I suppose call graphs/include graphs would be very useful, but as you point out I think they are a bit big. The smaller 46MB one might work though.

I uploaded it to github:
 
I noticed a problem with the attached graphs though. Since baseapi.cpp includes tesseractclass.h, it *should* show the entire directory class hierarchy:
tesseractclass.h
wordclass.h

My typo: that should read wordrec.h.
The hierarchy is missing though.

zdenko podobny

Mar 28, 2012, 2:17:45 PM
to tesser...@googlegroups.com
On Fri, Mar 23, 2012 at 6:06 PM, Ray Smith <thera...@gmail.com> wrote:


My typo. Should read wordrec.h
The hierarchy is missing though.

You are right: the full hierarchy is not shown. Here is the relevant info from the doxygen docs[1]:

The elements in the graphs generated by the dot tool have the following meaning:

    • A white box indicates a class or struct or file.
    • A box with a red border indicates a node that has more arrows than are shown! In other words: the graph is truncated with respect to this node. The reason why a graph is sometimes truncated is to prevent images from becoming too large. For the graphs generated with dot doxygen tries to limit the width of the resulting image to 1024 pixels.
    • A black box indicates that the class' documentation is currently shown.
    • A dark blue arrow indicates an include relation (for the include dependency graph) or public inheritance (for the other graphs).
    • A dark green arrow indicates protected inheritance.
    • A dark red arrow indicates private inheritance.
    • A purple dashed arrow indicates a "usage" relation, the edge of the arrow is labeled with the variable(s) responsible for the relation. Class A uses class B, if class A has a member variable m of type C, where B is a subtype of C (e.g. C could be B, B*, T<B>*).
An example graph legend is here[2].

Zdenko

zdenko podobny

Mar 28, 2012, 3:39:37 PM
to tesser...@googlegroups.com
I also noticed some doxygen warnings. See the attachment for review.
warnings.txt

zdenko podobny

Mar 29, 2012, 1:03:00 PM
to tesser...@googlegroups.com
The doxygen groups were created in r441[*] based on comments present in the source code. They are not necessary, but IMO they can help in understanding the code if somebody (with good knowledge of the code) maintains them ;-)
 


I also find it handy to have ALL the source files reachable via a "Go
to the source code of this file" link (since doxygen will hyperlink
stuff back to the documentation). I guess you do that for header files
but not .cpp files?

I adjusted the configuration (SOURCE_BROWSER = YES and INLINE_SOURCES = YES) and uploaded a new version.

zdenko podobny

Mar 29, 2012, 1:11:24 PM
to tesser...@googlegroups.com
I found the reason why some graphs are truncated: there is a limit on the number of nodes per graph. The current configuration uses a limit of 50 nodes per graph. I increased it to 500 and the baseapi.h graph was no longer truncated. BUT it has 125 (!) nodes, and the graph is messy and not useful... The bz2 package is 60M...
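
For a one-off experiment like this, the limit can be overridden without editing the config file. The sketch below assumes the setting in question is doxygen's DOT_GRAPH_MAX_NODES (whose default of 50 matches the number above; the post does not name the option) and relies on doxygen reading its configuration from standard input when invoked as "doxygen -", with later assignments overriding earlier ones. The helper name doxyfile_with_overrides is hypothetical.

```shell
#!/bin/sh
# Sketch only: doxyfile_with_overrides is a hypothetical helper.
# Usage: doxyfile_with_overrides DOXYFILE "KEY = VALUE"...
# Prints DOXYFILE followed by the override lines, ready to pipe into doxygen.
doxyfile_with_overrides() {
    base=$1
    shift
    cat "$base" || return 1
    # Blank line first, in case the base file lacks a trailing newline.
    printf '\n'
    [ "$#" -gt 0 ] && printf '%s\n' "$@"
    return 0
}

# Example (needs doxygen and Graphviz; config path as given earlier in the thread):
#   doxyfile_with_overrides doc/Doxygen "DOT_GRAPH_MAX_NODES = 500" | doxygen -
```

The config file on disk stays unchanged, so the default build keeps the smaller, truncated graphs.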

Pavel Mazniker

May 1, 2012, 7:15:59 AM
to tesser...@googlegroups.com
Hi,

There is a list of variables (>600), such as "image_default_resolution", that can be set by calling the SetVariable(...) function on an instance of TessBaseAPI and that can be printed in the latest version using the PrintVariables(...) function.

Is there any description of the variables (documentation, manual): what each variable means, how it configures the OCR, etc.?

Thanks.

Paul

Ray Smith

May 13, 2012, 12:27:13 PM
to tesser...@googlegroups.com
Well, each one has its own short description in the comment string that goes with it.
If you aren't familiar enough with the code to understand that (and I expect only a few people will qualify for that), then no, there is no documentation, and as you point out there are >600 pieces of documentation to write, so it would be a mammoth task.
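
One rough way to locate those comment strings in a source checkout is to grep for the declarations. The sketch below assumes the variables are declared with macros named like INT_VAR, BOOL_VAR_H, STRING_MEMBER or double_INIT_MEMBER; that naming is an assumption about the source tree, not something stated in this thread, and list_param_decls is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list_param_decls is a hypothetical helper, and the macro
# name pattern below is an assumption about how variables are declared.
# Usage: list_param_decls [DIR]
# Prints file:line:text for every line that looks like a variable declaration.
list_param_decls() {
    dir=${1:-.}
    # /dev/null forces grep to print file names even for a single file.
    find "$dir" -type f \( -name '*.cpp' -o -name '*.h' \) \
        -exec grep -nE '(INT|BOOL|STRING|double)_(INIT_)?(VAR|MEMBER)' /dev/null {} +
}

# Example: look up one variable by name
#   list_param_decls . | grep image_default_resolution
```

A declaration may span several lines, so the description string can sit on the line after the match.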

troplin

May 15, 2012, 5:19:36 AM
to tesser...@googlegroups.com
But maybe just the most important / most high-level variables from a user's point of view? Or even just a list of those, without documentation?

On Sunday, May 13, 2012 at 18:27:13 UTC+2, Ray wrote:
Well, each one has its own short description in the comment string that goes with it.
If you aren't familiar enough with the code to understand that (and I expect only a few people will qualify for that), then no, there is no documentation, and as you point out there are >600 pieces of documentation to write, so it would be a mammoth task.

Pavel Mazniker

May 18, 2012, 2:13:42 AM
to tesser...@googlegroups.com
Hi

you wrote:

"
Well, each one has its own short description in the comment string that goes with it. If you aren't familiar enough with the code to understand that (and I expect only a few people will qualify for that), then no, there is no documentation, and as you point out there are >600 pieces of documentation to write, so it would be a mammoth task.
"

My answer:
I agree with you about the need to understand the code well. Where in the code are the variables mentioned: in which file / class?
Thanks.

Tom Powers

May 18, 2012, 5:31:01 AM
to tesser...@googlegroups.com
See [1] [2] to get a list of all configuration parameters. Should
probably be added to the FAQ.

[1] http://groups.google.com/group/tesseract-ocr/msg/73565d039201f2e6

[2] http://groups.google.com/group/tesseract-ocr/msg/97a1ecb0a454c8a3