On Thu, May 23, 2013 at 8:58 AM, David Hughes <
d...@forestfield.co.uk> wrote:
>> Is this to do with the way PythonReports defines/uses the fonts?
>>
>> Any hint on how to make this display correctly?
> But, using the Wing
> debugger, the unicode strings in the PDF show the e-acute as \u2020 - which
> is indeed the Unicode character 'Dagger'
If you're using Wing, then this is the string after it was decoded
into a Python unicode object, yes?
In which case, the wrong encoding is being used to decode it.
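One way to test that theory is to brute-force it: encode the character you expected with one codec, decode the bytes with another, and see which pairs reproduce what you observed. The following is only a rough sketch in Python 3. The function name and the candidate list are made up for illustration, and PDF-specific encodings such as PDFDocEncoding are not Python codecs, so an empty result would not rule out an encoding mismatch.

```python
from itertools import permutations

# Hypothetical candidate list; extend it with any codec you suspect.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "mac_roman", "cp437"]


def find_codec_pairs(original, observed, codecs=CANDIDATES):
    """Return (encode_with, decode_with) pairs that turn `original` into `observed`."""
    matches = []
    for enc, dec in permutations(codecs, 2):
        try:
            if original.encode(enc).decode(dec) == observed:
                matches.append((enc, dec))
        except UnicodeError:
            # This pair cannot represent or decode the text; skip it.
            continue
    return matches


if __name__ == "__main__":
    # e-acute expected, dagger observed (the case from this thread).
    print(find_codec_pairs("\u00e9", "\u2020"))
```

If a pair does turn up, the second codec is a hint about what the viewer is assuming when it decodes the text.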
So the question is: how are strings encoded in PDF? From reading this thread:
http://stackoverflow.com/questions/128162/unicode-in-pdf
That's a hard question to answer, but presumably it is either:
All PDF text is encoded with a particular encoding
or
There is a way to specify the encoding in a particular document.
I suspect it's the latter, or you would have this problem all the
time. It could also be that PythonReports is using the wrong encoding
or specifying it incorrectly, but since Adobe Reader is, to some extent,
the reference implementation, if it works in Reader, it's right.
So you need to figure out how Reader determines the encoding, and
emulate that. Maybe the specs will help:
http://www.adobe.com/devnet/pdf/pdf_reference.html
> So, I think the viewer is behaving correctly as far as it goes -
Not really -- it's using the wrong encoding to decode the data in the
PDF, and that is not correct (as long as you define correct as "same as
Adobe Reader").
I'd make a tiny PDF with just a bit of non-ASCII text in it, and take
a look at it. That may be easier than reading the spec!
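For the "take a look at it" part, here is a minimal Python 3 sketch that scans the raw bytes of a PDF for /Encoding entries in font dictionaries (names such as /WinAnsiEncoding or /MacRomanEncoding). The function names and the file name are hypothetical. It assumes the font dictionaries are stored uncompressed and that the encoding is given as a name; an indirect reference like "/Encoding 12 0 R" or a compressed object stream will not be picked up.

```python
import re

# Matches an /Encoding key whose value is a name,
# e.g. "/Encoding /WinAnsiEncoding".
ENCODING_NAME = re.compile(rb"/Encoding\s*/([A-Za-z0-9_.\-]+)")


def encoding_names(pdf_bytes):
    """Return the distinct encoding names found in the raw PDF bytes, sorted."""
    return sorted({m.decode("ascii") for m in ENCODING_NAME.findall(pdf_bytes)})


def encoding_names_in_file(path):
    """Read a PDF file as bytes and list the encoding names it declares."""
    with open(path, "rb") as f:
        return encoding_names(f.read())
```

Calling encoding_names_in_file("tiny.pdf") on the test file (the name is just a placeholder) and comparing the result with what the viewer assumes should show whether the two disagree.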
-Chris