How to handle "&" ?

Heck Lennon

Sep 9, 2022, 10:23:15 AM
to beautifulsoup
Hello,

I'm in a conundrum working with GPX (XML) files.

I use the following to prevent BS from turning < and > into &lt; and &gt; respectively:
==================
with open(OUTPUTFILE, "w") as file:
    file.write(soup.prettify(formatter=None))
==================

But as a result, the & characters in URLs are kept as-is, which the Android map application doesn't like, and neither does Tidy:
Warning: unescaped & or unknown entity "&mlon"
Warning: unescaped & or unknown entity "&layers"

OTOH, if I call prettify() with no parameters, the & is turned into &amp;, which the application likes, but then Style is no longer valid:
==================
with open(OUTPUTFILE, "w") as file:
    file.write(soup.prettify())
==================

&lt;Style&gt;&lt;LineStyle&gt;&lt;color&gt;FF0000FF&lt;/color&gt;&lt;width&gt;6&lt;/width&gt;&lt;/LineStyle&gt;&lt;/Style&gt;

What should I do?

Thank you.

leonardr

Sep 9, 2022, 12:37:37 PM
to beautifulsoup
Can you share the whole GPX document? It looks like lxml is parsing your <Style> tag as a string, not as a tag. I would guess that is the root of the problem, and if that can be fixed then the default behavior will do what you need.

Leonard

Heck Lennon

Sep 9, 2022, 1:01:56 PM
to beautifulsoup
Sorry, I forgot to show how I use the string:

LS = "<Style><LineStyle><color>FF0000FF</color><width>6</width></LineStyle></Style>"

for ls in soup.find_all("LineString"):
    ls.insert_before(LS)

I guess inserting more than one tag in one go requires some extra code, so that BS doesn't turn it into a plain string.

leonardr

Sep 9, 2022, 1:51:29 PM
to beautifulsoup
Yes, that's what's going on -- it's being inserted as a string, and then the angle brackets in the string are being encoded on the way out.
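
For anyone reading along, here is a minimal sketch (not from the original posts) of how a plain string and an & behave under the two formatter settings. It assumes bs4 and lxml are installed. The tiny document and its desc element are made up for illustration, since the real GPX file was not posted.

```python
from bs4 import BeautifulSoup

# Made-up miniature document; the real GPX file was not posted.
doc = "<Placemark><desc>?mlat=1&amp;mlon=2</desc><LineString/></Placemark>"
soup = BeautifulSoup(doc, "xml")  # "xml" selects the lxml XML parser

# A str passed to insert_before() becomes a text node, not a tag.
soup.find("LineString").insert_before("<Style/>")

escaped = soup.prettify()            # default formatter: &, < and > in text are escaped
raw = soup.prettify(formatter=None)  # no escaping at all

print(escaped)
print(raw)
```

With the default formatter the inserted text should come out as &lt;Style/&gt; and the URL keeps its &amp;. With formatter=None both are written raw, which is why the & warnings appear.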

I recommend building that Style tag inside the for loop. The simplest way to do that is to parse that string into a second BeautifulSoup object. I say do this inside the for loop, because if you reuse a Style tag, it'll just migrate throughout the document and end up in the last place you put it.
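
A rough sketch of the difference between reusing one Style tag and parsing a fresh one per iteration, assuming bs4 and lxml are installed. The two-placemark document and the helper names add_style_reused and add_style_fresh are hypothetical, chosen only for this illustration.

```python
from bs4 import BeautifulSoup

STYLE = "<Style><LineStyle><color>FF0000FF</color><width>6</width></LineStyle></Style>"

# Hypothetical stand-in for the real file, with two tracks.
DOC = (
    "<Document>"
    "<Placemark><LineString/></Placemark>"
    "<Placemark><LineString/></Placemark>"
    "</Document>"
)


def add_style_reused(doc):
    """Parse the fragment once and insert the same tag each time."""
    soup = BeautifulSoup(doc, "xml")
    style = BeautifulSoup(STYLE, "xml").find("Style")
    for ls in soup.find_all("LineString"):
        ls.insert_before(style)  # moves the tag away from its previous spot
    return soup


def add_style_fresh(doc):
    """Parse a new copy of the fragment for every LineString."""
    soup = BeautifulSoup(doc, "xml")
    for ls in soup.find_all("LineString"):
        style = BeautifulSoup(STYLE, "xml").find("Style")
        ls.insert_before(style)
    return soup


reused = add_style_reused(DOC)
fresh = add_style_fresh(DOC)
print(len(reused.find_all("Style")), len(fresh.find_all("Style")))
```

In the reused variant only the last LineString ends up with a Style in front of it; in the fresh variant each one gets its own, and prettify() with the default formatter writes them out as real tags.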

Leonard

Heck Lennon

Sep 9, 2022, 2:11:46 PM
to beautifulsoup
Bingo:

=============
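# The fragment to put in front of every <LineString>.
LS = "<Style><LineStyle><color>FF0000FF</color><width>6</width></LineStyle></Style>"

tracks = soup.find_all("LineString")
if tracks:  # track(s)
    print("Track(s) found")

    for ls in tracks:
        # Parse a fresh copy on each pass; a reused tag would just move.
        # `builder` is assumed to be defined earlier in the script (not shown).
        LS_soup = BeautifulSoup(LS, builder=builder, features='xml')
        ls.insert_before(LS_soup)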
=============

Thank you.