when rendering the final mixed-mode PDF output files.
# The convert command (part of ImageMagick) creates a clean, losslessly compressed image 1.png.
# If you already have a PNG with characters and digits in it, you do not need the following command:
convert -density 300x300 -depth 8 1.pdf 1.png
# Tesseract is then called and creates a mixed-mode PDF with the filename "1.png.pdf".
# This output shows coding artefacts between the characters and digits if you enlarge the view.
# I can supply you with images on request.
tesseract -l eng 1.png 1.png pdf
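The two steps above can be wrapped in a small shell function. This is only a sketch: the function names `pdf_to_searchable` and `ocr_pdf_name` and the `DRY_RUN` variable are my own illustration, not part of ImageMagick or Tesseract, and it assumes a single-page input PDF whose name ends in ".pdf".

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, chosen for illustration.

# Tesseract appends ".pdf" to the output base it is given,
# so the output base "1.png" yields "1.png.pdf".
ocr_pdf_name() {
    printf '%s.pdf\n' "$1"
}

# Render a single-page PDF to PNG, then OCR it into a mixed-mode PDF.
# With DRY_RUN=1 the commands are printed instead of executed.
pdf_to_searchable() {
    src=$1                      # e.g. 1.pdf
    png="${src%.pdf}.png"       # e.g. 1.png
    run=${DRY_RUN:+echo}        # "echo" when DRY_RUN is set, empty otherwise
    $run convert -density 300x300 -depth 8 "$src" "$png"
    $run tesseract -l eng "$png" "$png" pdf
}
```

Calling `DRY_RUN=1 pdf_to_searchable 1.pdf` prints the same two commands as above, so you can check them before running anything.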
--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5b80105f-8db1-42bb-bf2d-3806ea0c052f%40googlegroups.com.
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
Remark: the 1.png file was too big (in resolution) to be uploaded here directly, so you will find the 1.png input file only inside the zip file. Let me know if you need more, and please confirm that you can see the compression artefacts between the characters in 1.png.pdf (zoom to 400%!).
When using lossless compression, I am pretty sure that 1.png.pdf will have the same quality as the input. This is the goal to be reached within the scope of this bug issue.
I do not have time to look at this issue yet, but forcing the user to use lossless compression is not the right way, IMO. The right way is to implement an option that lets the user force tesseract to use lossless compression, but this feature is not provided by your "patch"...
On Thursday, 31 July 2014 at 23:14:30 UTC+2, zdenop wrote:
I do not have time to look at this issue yet, but forcing the user to use lossless compression is not the right way, IMO. The right way is to implement an option that lets the user force tesseract to use lossless compression, but this feature is not provided by your "patch"...

@zdenop
@jimregan
Dear zdenop, dear Jim
Yes, thanks. I was thinking about an option --force-lossless-compression, but after inspecting the manual page at http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html, I think that Tesseract supports only a few command line options. Instead, it mainly expects options to be set in a config file.
So I will modify my code so that lossless compression can be forced by a switch in the config file.
Question 1
========
Please let me know whether you like my approach (config parameter), or whether you would also support my proposal for a command line switch (--force-lossless-compression).
In general:
- Regarding the issue tracker: attach the patch there. Do not post code or a link to the code change there.
for the solution (patch and discussion about the smaller filesize. Lossless compression does not introduce coding artefacts when rendering PDF output files.)