You were talking about Canny, Sobel, etc., and these indeed relate to edge detection in the usual sense (http://en.wikipedia.org/wiki/Edge_detection). In that sense, Tesseract does not do any edge detection.
Yes, one might call the process of finding CC contours in a binary image "edge detection". But treating it as conventional edge detection would make the task completely degenerate, so applying the approaches of conventional edge detection would be unreasonable, and some of them would not be usable at all. As for why the source code has a comment calling it an "edge detector" even though the term usually means something else, that question is better addressed to the developers. I suppose it is because internally they referred to CC contours as "edges" and used to call their contour extraction method "crack edges".
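A minimal sketch of why the task degenerates, assuming the binary image is a plain list of 0/1 rows (the helper name `crack_edges` is hypothetical and not taken from Tesseract). In a binary image there is no gradient to estimate and nothing to threshold, so an "edge" is simply the crack between two 4-adjacent pixels whose values differ:

```python
def crack_edges(img):
    """Return the crack edges of a binary image (list of rows of 0/1).

    A crack edge is the unit segment between two 4-adjacent pixels whose
    values differ; it is identified here by the pair of pixels it separates.
    Pixels outside the image are treated as background (0).
    """
    h = len(img)
    w = len(img[0]) if h else 0

    def px(r, c):
        # Out-of-range coordinates count as background.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    edges = set()
    # Ranges start at -1 so that cracks on the image border are included.
    for r in range(-1, h):
        for c in range(-1, w):
            if px(r, c) != px(r, c + 1):  # vertical crack to the right
                edges.add(((r, c), (r, c + 1)))
            if px(r, c) != px(r + 1, c):  # horizontal crack below
                edges.add(((r, c), (r + 1, c)))
    return edges
```

For a single pixel this gives 4 cracks and for a filled 2x2 square 8, i.e. exactly the perimeter. Smoothing, non-maximum suppression and hysteresis, as used by Canny, have nothing left to do here.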
I would refrain from considering myself an authority on naming and terminology, though.
What you have shown in your image is not what extract_edges() or block_edges() produces. Those functions build completely different structures, similar to what is commonly known as crack-coded CC boundaries.
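To make "crack-coded boundaries" a bit more concrete, here is a rough sketch. It is not Tesseract's code: the function names and the direction convention (0 = east, 1 = south, 2 = west, 3 = north, with y growing downward) are my own assumptions. It orients every crack so that the foreground lies on its right, then follows the cracks from corner to corner and emits one direction code per step. Where two pixels touch only diagonally, preferring the left turn merges them into one boundary, which amounts to assuming 8-connected foreground. A hole inside a character comes out as a separate loop.

```python
# Unit steps on the pixel-corner lattice, indexed by direction code.
# y grows downward, so code d + 1 is a right (clockwise) turn.
STEP = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def crack_edges_directed(img):
    """Directed cracks ((x, y), d) with the foreground on the right."""
    h, w = len(img), len(img[0]) if img else 0

    def fg(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1

    edges = set()
    for r in range(h):
        for c in range(w):
            if not fg(r, c):
                continue
            # Walk the pixel's border clockwise; keep a side only where
            # the 4-neighbour across it is background.
            if not fg(r - 1, c):
                edges.add(((c, r), 0))          # top side, heading east
            if not fg(r, c + 1):
                edges.add(((c + 1, r), 1))      # right side, heading south
            if not fg(r + 1, c):
                edges.add(((c + 1, r + 1), 2))  # bottom side, heading west
            if not fg(r, c - 1):
                edges.add(((c, r + 1), 3))      # left side, heading north
    return edges


def crack_chain_codes(img):
    """Trace every closed crack boundary; return (start_corner, codes) pairs."""
    edges = crack_edges_directed(img)
    out = {}
    for v, d in edges:
        out.setdefault(v, set()).add(d)

    unused = set(edges)
    loops = []
    while unused:
        start = min(unused)  # deterministic choice of starting crack
        v, d = start
        codes = []
        while True:
            unused.discard((v, d))
            codes.append(d)
            dx, dy = STEP[d]
            v = (v[0] + dx, v[1] + dy)
            # Try left turn, straight, right turn. Only a corner shared by
            # two diagonal pixels offers a choice; left joins them.
            d = next(n for n in ((d - 1) % 4, d, (d + 1) % 4)
                     if n in out.get(v, ()))
            if (v, d) == start:
                break
        loops.append((start[0], codes))
    return loops
```

A single pixel yields `[((0, 0), [0, 1, 2, 3])]`, and a 3x3 ring yields one 12-step outer loop plus one 4-step loop around the hole.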
On Saturday, June 23, 2012 2:47:04 PM UTC+4, shahin youssefi wrote:
Dmitri, you are correct: this function only sets the bounding box of them, not exactly the CCs.
If the character has a closed curve in it, the inner area is returned as an outline, for example [this].
I've shown the result of "extract_edges" in green lines.