web2py XML helper: sanitizing line breaks under Python 3.6

Clemens

Feb 12, 2020, 2:37:32 AM
to web2py-users
Hello!

In my web2py app I’m processing a list of items, where the user can click a link for each item to select it. An item has a UUID, a title and a description. For better orientation, the item description is also displayed as the link's title. To prevent injection and to escape tags in the description, I’m using the XML sanitizer as follows:

A(this_item.title,
  callback=URL('item', 'select',
               vars=dict(uuid=this_item.uuid), user_signature=True),
  _title=XML(str_replace(this_item.description,
                         {'\r\n': '&#13;', '<': '&#60;', '>': '&#62;'}),
             sanitize=True))


With Python 2.7 everything was fine. Since switching to Python 3.6 I have the following problem: when the description contains line breaks, the sanitizer no longer works. For example, the following string, produced by my str_replace routine, is sanitized fine by the XML helper under Python 2.7 but not under Python 3.6:

Header&#13;&#13;Line1&#13;Line2&#13;Line3

The problem under Python 3 (but not under Python 2) is sanitizing line breaks escaped as &#13;. The XML helper has no trouble sanitizing everything else, e.g. less-than or greater-than signs, which I need because an item without a description gets the generated text <no description>.

How can line breaks be sanitized by the XML helper when running web2py under Python 3?

Thanks for any support!

Best regards Clemens

Christian Varas

Feb 12, 2020, 6:08:17 AM
to web...@googlegroups.com
I had an issue with line breaks too. With Python 3.7 I remove line breaks like this:

some_string = some_string.replace("\n", "").replace("\r", "")

XML(some_string, sanitize=True)

Cheers
Chris

--
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/web2py/319d22e0-d1be-452c-8c25-d1ec76df1a5e%40googlegroups.com.

Clemens

Feb 12, 2020, 8:07:10 AM
to web2py-users
Hello Chris,

Thanks for your answer! But simply removing all line breaks is a bit harsh, since in my case the description is usually a few lines long, with 2 or 3 paragraphs. And I had already solved the problem with this procedure, called as described in my question:

import re

def str_replace(string, replacement_dict):
    # Replace every key of replacement_dict found in string by its value, in one pass.
    if not isinstance(string, str):
        string = str(string)
    pattern = re.compile('|'.join(re.escape(k) for k in replacement_dict), re.M)
    return pattern.sub(lambda x: replacement_dict[x.group(0)], string)

This solution worked very well with Python 2.7, even with line breaks in link titles. Then I moved to Python 3.6 and the problem appeared. So I think the XML sanitizer under Python 3.6 is the problem, since it can't handle &#13;.
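
As a quick check of what the helper above produces, here is a minimal standalone sketch (no web2py needed). The sample descriptions are made up for illustration, and the function is repeated so the snippet runs on its own. Note that with this replacement dict only '\r\n' pairs are rewritten; a bare '\n' passes through unchanged.

```python
import re

def str_replace(string, replacement_dict):
    """Replace every key of replacement_dict found in string by its value."""
    if not isinstance(string, str):
        string = str(string)
    pattern = re.compile('|'.join(re.escape(k) for k in replacement_dict))
    return pattern.sub(lambda m: replacement_dict[m.group(0)], string)

REPLACEMENTS = {'\r\n': '&#13;', '<': '&#60;', '>': '&#62;'}

# Hypothetical sample descriptions, not taken from a real application.
multi_line = str_replace('Header\r\n\r\nLine1\r\nLine2\r\nLine3', REPLACEMENTS)
placeholder = str_replace('<no description>', REPLACEMENTS)
unix_style = str_replace('Line1\nLine2', REPLACEMENTS)

print(multi_line)        # Header&#13;&#13;Line1&#13;Line2&#13;Line3
print(placeholder)       # &#60;no description&#62;
print(repr(unix_style))  # 'Line1\nLine2' (a bare \n is not in the dict)
```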

Do you have any other ideas?

Best regards
Clemens



Christian Varas

Feb 12, 2020, 10:42:53 AM
to web...@googlegroups.com
Hi Clemens, 

str.replace can handle large text; it does not matter whether it is 1 line or 1000 lines or more. It replaces every occurrence in the text, and chaining replace calls is also faster than other methods.

description = this_item.description.replace("\n","&#13;").replace("\r","&#13;").replace("<","&#60;").replace(">","&#62;")
XML(description, sanitize=True)

or in one line

XML(this_item.description.replace("\n","&#13;").replace("\r","&#13;").replace("<","&#60;").replace(">","&#62;"), sanitize=True)


A(this_item.title,
  callback=URL('item', 'select',
               vars=dict(uuid=this_item.uuid), user_signature=True),
  _title=XML(this_item.description.replace("\n","&#13;").replace("\r","&#13;").replace("<","&#60;").replace(">","&#62;"), sanitize=True))

I also had this issue with line breaks and the XML helper: input containing line breaks was breaking my view, and replacing the bad characters before passing the text to the helper fixed my problem.

Try it in a console with some custom text and see the results.
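
For anyone trying this in a console, here is a minimal sketch, with a made-up sample string, of how the chained replace behaves on Windows-style line endings. Because "\n" and "\r" are replaced separately, each "\r\n" pair turns into two &#13; entities; normalizing the line endings first avoids that. Both function names are hypothetical.

```python
def encode_breaks_chained(text):
    # The chain from the post above: "\n" and "\r" are handled independently.
    return text.replace("\n", "&#13;").replace("\r", "&#13;")

def encode_breaks_normalized(text):
    # Collapse "\r\n" and lone "\r" to "\n" first, then encode once per break.
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "&#13;")

sample = "Line1\r\nLine2\nLine3"  # hypothetical input with mixed line endings

print(encode_breaks_chained(sample))     # Line1&#13;&#13;Line2&#13;Line3
print(encode_breaks_normalized(sample))  # Line1&#13;Line2&#13;Line3
```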

Hope this helps
Cheers.
Chris.


Clemens

Feb 12, 2020, 11:17:31 AM
to web2py-users
Hi Chris,

Thanks a lot for your help! But the problem still exists even when I replace my str_replace routine with str.replace() as you proposed. Yes, I had the same problem with line breaks crashing the view, and replacing the line breaks with &#13; fixed it. But switching from Python 2.7 to 3.6 raises the new problem that the sanitizer can't process line breaks encoded as &#13;. Without sanitize=True (it is False by default) it also works with Python 3.6, but sanitize=True doesn't work for &#13;-encoded line breaks under Python 3.6. This is the case only for line breaks; all other special characters are no problem.

I really think the XML sanitizer under Python 3.6 is the problem. Do you have an idea for a workaround other than eliminating all line breaks? I can't do that.
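
One possible workaround, offered as a sketch rather than a confirmed fix: since the call works without sanitize=True, the escaping could be done by hand with Python's standard html.escape and the result passed to XML() unsanitized. The helper name escape_title is hypothetical, and whether plain escaping is sufficient is an assumption that would need checking against the app's data.

```python
import html

def escape_title(description):
    """Escape text for a title attribute while keeping line breaks as &#13;."""
    # html.escape turns &, <, > (and quotes, with quote=True) into entities.
    escaped = html.escape(str(description), quote=True)
    # Encode line breaks afterwards, so the "&" of "&#13;" is not escaped again.
    escaped = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return escaped.replace("\n", "&#13;")

print(escape_title("Header\r\n\r\nLine1\r\nLine2"))  # Header&#13;&#13;Line1&#13;Line2
print(escape_title("<no description>"))              # &lt;no description&gt;

# In the view this would be used roughly as (assumption, not run here):
#   _title=XML(escape_title(this_item.description))
```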

Best regards
Clemens



Clemens

Feb 13, 2020, 7:22:41 AM
to web2py-users
Tim Nyborg has found the solution:
It's a bug in yatl/sanitizer.py, which can be fixed as described here:
https://stackoverflow.com/questions/60176267/webp2y-xml-helper-sanitize-line-breaks-under-python3
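
The thread itself does not spell out the fix. As background only, and as an assumption about the cause rather than something confirmed here: if the sanitizer is built on Python's html.parser.HTMLParser, then its convert_charrefs option matters, because it defaults to True since Python 3.5. With that default, character references such as &#13; are decoded into plain text before handle_data is called, and handle_charref is never invoked. The sketch below uses only the standard library; the class name CharrefProbe and the function probe are made up.

```python
from html.parser import HTMLParser

class CharrefProbe(HTMLParser):
    """Records which parser callbacks receive text and character references."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.data = []
        self.charrefs = []

    def handle_data(self, data):
        self.data.append(data)

    def handle_charref(self, name):
        self.charrefs.append(name)

def probe(text, convert_charrefs):
    parser = CharrefProbe(convert_charrefs)
    parser.feed(text)
    parser.close()
    return parser.data, parser.charrefs

# Python 3 default: the reference is decoded and handle_charref is not called.
data_default, refs_default = probe("Header&#13;Line1", convert_charrefs=True)
# With conversion disabled, the reference is reported via handle_charref.
data_raw, refs_raw = probe("Header&#13;Line1", convert_charrefs=False)

print(refs_default)  # []
print(refs_raw)      # ['13']
```

A parser subclass that rebuilds its output from handle_charref calls would therefore see no reference at all under the Python 3 default, which is consistent with the symptom described in this thread.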

Thanks Tim!

Christian Varas

Feb 13, 2020, 4:27:49 PM
to web...@googlegroups.com
Great!
