[reportlab-users] Python files and ascii

Claude Paroz

Feb 18, 2022, 5:45:43 PM

to For users of Reportlab open source software

Hi all,

Here's a new patch that stops testing whether ReportLab Python files are
ASCII-only. On Python 3, we can safely include Unicode characters in Python
files.

Claude
--
www.2xlibre.net

0001-Stop-enforcing-ascii-only-python-files.patch

Robin Becker

Feb 21, 2022, 5:51:23 AM

to reportlab-users, Claude Paroz

Hi Claude,

I am a bit unsure about this patch. I accept that the unique test is probably not required, but I don't think we should
remove the test that all ReportLab Python files are ASCII.

There has been a lot of interest recently in the possibility of using Unicode for malware hackery, e.g. by smuggling in
code that appears reasonable but is in fact different, the difference being hidden by the use of homoglyphs; see e.g.

https://threatpost.com/trojan-source-invisible-bugs-source-code/175891/

The ReportLab code base has, at least until now, been almost entirely in English with some American spellings (e.g. color
instead of colour), and there are a small number of foreign-language texts (mostly in the tests folder).

I suppose the implication of removing the test would be that some of the ReportLab code could use variables, strings, etc.
with non-ASCII characters. Can you give examples where that would be beneficial?

What do others think?

I'm not entirely sure about the security problems with homoglyphs, but they have to be a consideration with open source
projects where we have a fairly open patching policy.
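
As a rough illustration of the concern, a reviewer-side check along the following lines could list every non-ASCII character in a piece of source, and mark the bidirectional control characters used in "Trojan Source" style tricks. This is only a sketch: the names `find_suspicious_chars` and `BIDI_CONTROLS` are hypothetical and not part of ReportLab, and the set of control characters is an assumption about what is worth flagging.

```python
import unicodedata

# Bidirectional control characters (assumed list of the ones worth flagging).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_suspicious_chars(source):
    """Return (line, column, codepoint, name, is_bidi) for each non-ASCII character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, f"U+{ord(ch):04X}", name, ch in BIDI_CONTROLS))
    return hits

# The second "admin" starts with a Cyrillic letter that looks like a Latin "a".
for hit in find_suspicious_chars("admin = True\n\u0430dmin = False\n"):
    print(hit)
```

The two names in the example look the same in most fonts but are different identifiers to Python, which is the kind of thing an ASCII-only test rules out wholesale.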

--
Robin Becker
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
https://pairlist2.pair.net/mailman/listinfo/reportlab-users

Peter Cock via reportlab-users

Feb 21, 2022, 6:02:23 AM

to reportlab-users, Peter Cock

That's a good point Robin, something I'll keep in mind for other Python projects.

Generally it has only been people's names where I/we have needed this (things
like contributor listings or references in comments/docstrings).

Would it be a helpful compromise to allow unicode Python files for the tests only?

Peter

Claude Paroz

Feb 21, 2022, 6:28:05 AM

to reportlab-users

Hi Robin, Peter,

Thanks for the threatpost link. Indeed a good thing to be aware of. It's
just a bit sad that once again, hackers are winning in a sense :-(

I like Peter's proposal of allowing Unicode for test files, because
Unicode is rather useful in tests. Typically, it's just more friendly to
write:

para = Paragraph("Viele Grüße")

than

para = Paragraph(b'Viele Gr\xc3\xbc\xc3\x9fe'.decode('utf-8'))
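
To show that the two spellings really produce the same text, here is a small sketch using plain strings instead of `Paragraph`, so that it runs without ReportLab installed. The variable names are made up for the illustration; the third form, using `\u` escapes, is another way to keep a file ASCII-only.

```python
# Literal Unicode in the source file (needs a non-ASCII source file).
readable = "Viele Grüße"

# ASCII-only source: UTF-8 bytes decoded at run time.
escaped = b'Viele Gr\xc3\xbc\xc3\x9fe'.decode('utf-8')

# ASCII-only source: Unicode escapes inside a str literal.
ascii_only = "Viele Gr\u00fc\u00dfe"

print(readable == escaped == ascii_only)
```

All three give the same `str`; only the first is easy to read and review in a test file.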

Claude

On 21.02.22 at 12:02, Peter Cock wrote:

Robin Becker

Feb 21, 2022, 7:24:55 AM

to Claude Paroz, For users of Reportlab open source software

Claude,

I agree that Peter's suggestion is probably a good way forward. It would keep homoglyph stuff away from the installation
source.

Let's give it a few days in case others want to have a say. Perhaps there are experts who know what Python is doing
about it. I saw a lot about editors etc. in the python-dev list discussion.

As it stands, the existing test uses reportlab.lib.testutils.RL_HOME as its search folder. That is None by
default, but our tests use setOutDir, which calculates it as

> import os
> import reportlab
> RL_HOME = reportlab.__path__[0]
> if not os.path.isabs(RL_HOME):
>     RL_HOME = os.path.normpath(os.path.abspath(RL_HOME))

So currently I think the tests folder may be outside the ASCII test, even if import reportlab resolves to

tests/../src/reportlab
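
For discussion, the "ASCII for the installation source, Unicode allowed in tests" compromise could look roughly like the sketch below. It is not the existing ReportLab test: the function name `non_ascii_python_files` and the `exempt_dirs` parameter are hypothetical, and it assumes Python 3.7+ for `bytes.isascii()`.

```python
import os

def non_ascii_python_files(root, exempt_dirs=("tests",)):
    """Yield paths of .py files under root that contain non-ASCII bytes.

    Directories whose name is in exempt_dirs are not searched.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into exempt folders.
        dirnames[:] = [d for d in dirnames if d not in exempt_dirs]
        for filename in filenames:
            if filename.endswith(".py"):
                path = os.path.join(dirpath, filename)
                with open(path, "rb") as f:
                    if not f.read().isascii():
                        yield path
```

Called with the package directory as `root` (for example the value computed for RL_HOME above), a tests folder that sits outside that directory is never visited anyway; the `exempt_dirs` pruning only matters when the search starts higher up in the checkout.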

As for this test: the initial code was checked in by Dinu Gherman in 2001:

https://hg.reportlab.com/hg-public/reportlab/rev/2eb7bc711395

The commit message wasn't very informative: 'Initial checkin.' It might just have been a way for him to prevent the use of
umlauts, which might well have caused problems in those days. Certainly Python was not using Unicode at that time.


ci...@online.de

Feb 21, 2022, 7:28:16 AM

to reportl...@lists2.reportlab.com

On 21.02.2022 12:27, Claude Paroz wrote:
> I like the proposal of Peter, allowing unicode for test files. Because
> Unicode is rather useful in tests. Typically, it's just more friendly to
> write:
>
> para = Paragraph("Viele Grüße")
>
> than
>
> para = Paragraph(b'Viele Gr\xc3\xbc\xc3\x9fe'.decode('utf-8'))

Btw, that has been possible since Python 2.3 with a "# coding = " magic
comment. What's new is that Python 3 assumes UTF-8 by default instead of
ASCII and that you can use Unicode even in variable names.
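
A small sketch of both points, using only the standard library. `tokenize.detect_encoding` reports which encoding Python 3 would use for a source file; the helper name `source_encoding` and the sample sources are made up for the illustration.

```python
import io
import tokenize

def source_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# No magic comment: Python 3 assumes UTF-8.
print(source_encoding(b"x = 1\n"))  # utf-8

# With a magic comment, the declared encoding wins (the name is normalised).
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1

# Non-ASCII identifiers are legal in Python 3.
größe = 42
print(größe)  # 42
```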

-- Christoph

Tim Roberts

Feb 22, 2022, 4:09:42 AM

to reportl...@lists2.reportlab.com

ci...@online.de wrote:
>
> Btw, that has been possible since Python 2.3 with a "# coding = "
> magic comment. What's new is that Python 3 assumes UTF-8 by default
> instead of ASCII and that you can use Unicode even in variable names.

That reminds me of the attached, which is a perfectly valid C++ program.

--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

UniCodeCode.jpg

Robin Becker

Feb 22, 2022, 4:16:57 AM

to reportlab-users

On 21/02/2022 17:32, Tim Roberts wrote:
> ci...@online.de wrote:
>>
>> Btw, that has been possible since Python 2.3 with a "# coding = " magic comment. What's new is that Python 3 assumes
>> UTF-8 by default instead of ASCII and that you can use Unicode even in variable names.
>
> That reminds me of the attached, which is a perfectly valid C++ program.
>

Tim's email got stuck in the list and, as the jpg is quite large (and possibly paranoia-inducing), I copied it here:

https://www.reportlab.com/ftp/UniCodeCode.jpg

--
Robin Becker

ci...@online.de

Feb 22, 2022, 4:28:53 AM

to reportl...@lists2.reportlab.com

On 22.02.2022 10:16, Robin Becker wrote:
> Tim's email got stuck in the list and as the jpg is quite large (and
> possibly paranoia inducing) I copied it here
>
> https://www.reportlab.com/ftp/UniCodeCode.jpg

Could be the source code of either the game "funny fruit splash" or
"global thermonuclear war". You can only find out by running it. :)

-- Christoph
