Hello,
I've run into some strange behavior and cannot figure out how to fix it.
If I use the 'lxml' or 'html.parser' parser, the following code:
import re
from bs4 import BeautifulSoup

tags = ["tc:elem1", "tc:.*"]
soup = BeautifulSoup("""<tc:root xmlns:tc="http://myfaces.apache.org/tobago/component">
<tc:elem1 label="{label.test$string}" />
<tc:elem1 label="{blabla.test$string}" />
</tc:root>""", "html.parser")

# Search by plain string name
print("without regex")
for tag in tags:
    for el in soup.findAll(name=tag):
        print(el.name)

# Search by compiled regular expression
print("with regex")
for tag in tags:
    for el in soup.findAll(name=re.compile(tag)):
        print(el.name)
prints out:
without regex
with regex
tc:elem1
tc:elem1
tc:root
tc:elem1
tc:elem1
If I use the 'xml' parser like this:
import re
from bs4 import BeautifulSoup

tags = ["tc:elem1", "tc:.*"]
soup = BeautifulSoup("""<tc:root xmlns:tc="http://myfaces.apache.org/tobago/component">
<tc:elem1 label="{label.test$string}" />
<tc:elem1 label="{blabla.test$string}" />
</tc:root>""", "xml")

# Search by plain string name
print("without regex")
for tag in tags:
    for el in soup.findAll(name=tag):
        print(el.name)

# Search by compiled regular expression
print("with regex")
for tag in tags:
    for el in soup.findAll(name=re.compile(tag)):
        print(el.name)
prints out:
without regex
elem1
elem1
with regex
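My only guess so far, and it is just a guess since I have not read the Beautiful Soup source: with the 'xml' parser el.name comes back without the "tc:" prefix, so if the regex is tested against that bare name it can never match. The sketch below uses only the re module and the names copied from the two outputs above; the list names xml_names and html_names are made up for illustration.

```python
import re

# Names exactly as printed above: the 'xml' parser reported them without
# the prefix, 'html.parser' reported them with the "tc:" prefix.
xml_names = ["elem1", "elem1"]
html_names = ["tc:root", "tc:elem1", "tc:elem1"]

pattern = re.compile("tc:.*")

# Assumption: the regex is checked against el.name only.
xml_hits = [name for name in xml_names if pattern.search(name)]
html_hits = [name for name in html_names if pattern.search(name)]

print(xml_hits)   # names from the 'xml' run that the pattern matches
print(html_hits)  # names from the 'html.parser' run that the pattern matches
```

If that guess is right, it would explain the empty "with regex" result, but not how to match on the prefixed name.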
I really want to use the 'xml' parser because of its other advantages, but it does not seem to work with regex at all.
Also, 'lxml' and 'html.parser' behave really strangely without the regex: the plain string names match nothing.
Please help.