Moving a conversation here since it may be of wider interest.
zdenko> Also there is a request to get text information (STRING) from
zdenko> the renderer. At the moment the renderer produces output to a file,
zdenko> but some users (e.g. those who use tesseract wrappers) would
zdenko> like to use this information (especially hocr, tsv and txt) for
zdenko> further processing.
zdenko> This request is related to the API breakage between 3.02 and 3.04.
zdenko> The problem is with the functions ProcessPage and ProcessPages, which
zdenko> put the result in "STRING* text_out" in 3.02 and, from 3.04, in
zdenko> "TessResultRenderer* renderer". I think it is important to fix this
zdenko> backward API compatibility ASAP.
For things like book scanning, it is very common to be working with many
high resolution images that cannot all fit in memory at the same time.
Remember that PDF output contains a copy of all the images. Therefore
it is important that PDF output uses a streaming interface, rather than writing
everything to a memory buffer.
However, I certainly understand that people like memory buffers, especially
for small output formats like txt, hocr, etc. I hope the answer for that can be
fmemopen() rather than abandoning the streaming interface entirely.
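
To make the fmemopen() idea concrete, here is a minimal sketch in plain C.
It is not Tesseract code: write_page_text() is a hypothetical stand-in for
any renderer that only knows how to write to a FILE*, and render_to_buffer()
is a made-up helper name. The only real API used is POSIX fmemopen(), so this
assumes a POSIX.1-2008 platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a renderer: it only writes to a stream. */
static int write_page_text(FILE *out, const char *text) {
    return fputs(text, out) == EOF ? -1 : 0;
}

/* Run the streaming writer against a caller-supplied memory buffer.
 * Returns the number of bytes written, or -1 on error. */
static long render_to_buffer(const char *text, char *buf, size_t size) {
    assert(buf != NULL && size > 0);

    FILE *mem = fmemopen(buf, size, "w");   /* stream backed by buf */
    if (mem == NULL)
        return -1;

    int rc = write_page_text(mem, text);
    if (fflush(mem) != 0)                   /* push buffered data into buf */
        rc = -1;
    long len = ftell(mem);                  /* bytes written so far */

    if (fclose(mem) != 0 || rc != 0 || len < 0)
        return -1;

    if ((size_t)len < size)
        buf[len] = '\0';                    /* terminate explicitly for string use */
    return len;
}
```

The writer never learns that it is talking to memory, so the same code path
still serves file output for large formats like PDF. A caller who wants a
string would do something like render_to_buffer(text, buf, sizeof buf) and
read buf afterwards. What happens when the output does not fit in the buffer
varies between C libraries, so a real wrapper would need to handle that case;
POSIX open_memstream() is the growing-buffer alternative.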