Line numbers for the HTML being parsed by BeautifulSoup?

Ted Sullivan

Apr 27, 2012, 9:22:56 AM
to beautifulsoup
Is it possible to get the line number for the tag being processed? I
have a recursive method that goes through all HTML tags on a page to
make sure they have id and name attributes and that the name and id
match. When I find a tag that does not meet this standard, I'd like
to print out the tag name and the line number where the tag is located
in the HTML file. The column might be good too, but I'd settle for
the line number. I'd like the code to be something like this...

<code>

name_addr = current_tag.get(u'name')
id_addr = current_tag.get(u'id')
if name_addr is None or id_addr is None or name_addr != id_addr:
    # current_tag.line_number is the wished-for attribute; it does not exist.
    print("Tag <%s> on line %s of the HTML file does not have valid attributes."
          % (current_tag.name, current_tag.line_number))

</code>

The "current_tag.line_number" functionality is what I'm looking for.
I have BeautifulSoup 3 and 4 installed on my system and I can't find
this functionality in either version.

Leonard Richardson

Apr 27, 2012, 9:33:15 AM
to beauti...@googlegroups.com
All three parsers used by Beautiful Soup have some way of reading off
the current line number, but Beautiful Soup does not record this
information. That's something I could add, but for your project I
recommend making direct use of html.parser, html5lib, or lxml.

Leonard
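A minimal sketch of the direct html.parser approach suggested above, written for Python 3 (in Python 2 the same class lives in the HTMLParser module). `HTMLParser.getpos()` is the standard-library method that returns the current (line, offset) while a tag is being handled; the names `IdNameChecker` and `find_bad_tags` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class IdNameChecker(HTMLParser):
    """Collect start tags whose id and name attributes are missing or differ."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (tag, line, column) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; dict() keeps the last duplicate.
        attr_map = dict(attrs)
        name_addr = attr_map.get("name")
        id_addr = attr_map.get("id")
        if name_addr is None or id_addr is None or name_addr != id_addr:
            # getpos() gives (line, offset) of the tag being handled:
            # the line is 1-based, the offset is a 0-based column.
            line, column = self.getpos()
            self.problems.append((tag, line, column))

    # Self-closing tags such as <input ... /> arrive via handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def find_bad_tags(html):
    checker = IdNameChecker()
    checker.feed(html)
    checker.close()  # flush any buffered data at the end of the document
    return checker.problems


if __name__ == "__main__":
    page = (
        '<form id="f" name="f">\n'
        '  <input id="a" name="b">\n'
        '  <p>text</p>\n'
        '</form>\n'
    )
    for tag, line, column in find_bad_tags(page):
        print("Tag <%s> on line %s, column %s does not have valid attributes."
              % (tag, line, column))
```

For the sample page this flags the `<input>` on line 2 (mismatched id and name) and the `<p>` on line 3 (neither attribute), while the valid `<form>` passes. The position comes from the parser itself as it reads the file, which is why this works without Beautiful Soup recording anything.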

Ted Sullivan

Apr 27, 2012, 2:42:58 PM
to beautifulsoup
Thanks.

The HTMLParser class(es) did exactly what I needed them to do.