Get position/line number - Implemented?


Kenny Meyer

May 1, 2010, 9:50:54 PM
to beauti...@googlegroups.com, leonard.r...@gmail.com
I was wondering whether BeautifulSoup can report the line number of the
elements it parses out of an HTML document.

The official documentation doesn't seem to say anything useful about this.
I'm aware that this topic has come up at least twice in the past AFAIK
[1][2], but that was **some** time ago, which is why I'm "re-opening" the question.

I am asking because the mailing-list threads mentioned above *do* contain
concrete patches. I wonder what happened to those? If this issue is still
not handled, I may open an enhancement request on Launchpad.

Actually, I found something interesting here:

http://api.plone.org/Plone/3.0/private/src/kss.core/kss/core/private/kss.core.BeautifulSoup.BeautifulStoneSoup-class.html

or more specifically, here:

http://api.plone.org/Plone/3.0/private/src/kss.core/kss/core/private/markupbase.ParserBase-class.html#getpos
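
For illustration only: the same getpos() method is also available on the standard library's html.parser.HTMLParser in Python 3. The sketch below is not Beautiful Soup API; the class name PositionRecorder and the helper tag_positions are made up for this example. It simply records where each start tag begins.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: collect (tag, line, column) for every start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, column); lines are 1-based, columns 0-based.
        # Inside this handler it points at the "<" that opens the tag.
        line, column = self.getpos()
        self.positions.append((tag, line, column))


def tag_positions(markup):
    """Parse markup and return the recorded start-tag positions."""
    recorder = PositionRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.positions


sample = "<html>\n  <body>\n    <p>Hello</p>\n  </body>\n</html>\n"
for tag, line, column in tag_positions(sample):
    print(f"<{tag}> starts at line {line}, column {column}")
```

This only shows that the underlying parser knows the position; attaching it to Beautiful Soup's own tag objects is the part that would still need a patch.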

[1] http://groups.google.com/group/beautifulsoup/browse_thread/thread/58fc89c6d5ae6b84/218edc54598f8609?lnk=gst&q=line+number#218edc54598f8609
[2] http://groups.google.com/group/beautifulsoup/browse_thread/thread/2b85fcede4814982/e815185482bd65fa?lnk=gst&q=line+number#e815185482bd65fa
--
Kenny Meyer | http://kenny.alwaysdata.net

--
You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
To post to this group, send email to beauti...@googlegroups.com.
To unsubscribe from this group, send email to beautifulsou...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beautifulsoup?hl=en.

Aaron DeVore

May 2, 2010, 6:08:36 PM
to beauti...@googlegroups.com
Thank you for bringing this up. Leonard and I are currently discussing
how, not if, to add line/column numbers. We need to iron out details
like the API, implementation, and tests. Many thanks to Greg Baker,
who wrote the email/patch you linked to. The final implementation will
probably be based on his code.

Cheers!
Aaron DeVore

Kenny Meyer

May 2, 2010, 6:25:19 PM
to beauti...@googlegroups.com
That's great news!

The real reason I'd like to see this in a new release of BeautifulSoup
is that I'm using BeautifulSoup to extract translatable strings from HTML
into the gettext format. It would be nice to find those strings again later. :-)
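
To make the use case concrete, here is a rough sketch of what the output could look like once line numbers are available. It uses the standard library's html.parser rather than BeautifulSoup, and the names StringExtractor and to_po are hypothetical. It assumes every non-blank text node is translatable, which is not true for script or style content.

```python
from html.parser import HTMLParser


class StringExtractor(HTMLParser):
    """Hypothetical helper: collect (line, text) for each non-blank text node."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # getpos() points at the start of the raw text chunk, so skip any
        # newlines in the leading whitespace to reach the first real character.
        line, _ = self.getpos()
        leading = data[: len(data) - len(data.lstrip())]
        self.found.append((line + leading.count("\n"), text))


def to_po(markup, filename):
    """Return PO entries with '#: file:line' location comments."""
    extractor = StringExtractor()
    extractor.feed(markup)
    extractor.close()
    entries = []
    for line, text in extractor.found:
        escaped = (
            text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        entries.append(f'#: {filename}:{line}\nmsgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(entries)


print(to_po("<html>\n<body>\n<h1>Welcome</h1>\n</body>\n</html>\n", "index.html"))
```

The "#:" location comment is what lets a translator or a tool jump back to the string in the source file.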

--
Kenny Meyer | http://kenny.alwaysdata.net

sste...@gmail.com

May 2, 2010, 8:18:36 PM
to beauti...@googlegroups.com

I actually have a use case for this as well: modifying the contents of various tags in documents that I can't necessarily parse and reconstruct in full (ASPX, for example).

If I knew what line I found things on, I could do the replacement in the document without worrying about disturbing the rest of it.
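
A minimal sketch of that idea, using the standard library's html.parser instead of BeautifulSoup. The names TextLocator, absolute_offset and replace_text are invented for this example, and it assumes the text being replaced contains no entity references (the parser would split the text at each "&").

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Hypothetical helper: record where text inside <target_tag> starts."""

    def __init__(self, target_tag):
        # convert_charrefs=False keeps each text chunk identical to the source.
        super().__init__(convert_charrefs=False)
        self.target_tag = target_tag
        self.inside = False
        self.hits = []  # (line, column, raw_text)

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.target_tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            line, column = self.getpos()
            self.hits.append((line, column, data))


def absolute_offset(source, line, column):
    """Convert a 1-based line and 0-based column into a string index."""
    index = 0
    for _ in range(line - 1):
        index = source.index("\n", index) + 1
    return index + column


def replace_text(source, target_tag, new_text):
    """Replace the text of matching elements, leaving all other bytes alone."""
    locator = TextLocator(target_tag)
    locator.feed(source)
    locator.close()
    # Work backwards so that earlier positions stay valid after each splice.
    for line, column, old in reversed(locator.hits):
        start = absolute_offset(source, line, column)
        source = source[:start] + new_text + source[start + len(old):]
    return source


page = "<html>\n<body>\n  <h1>Old title</h1>\n  <p>keep   this</p>\n</body>\n</html>\n"
print(replace_text(page, "h1", "New title"))
```

Because the document is spliced as a string rather than re-serialized, whitespace and any markup the parser does not understand stay exactly as they were.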

S

Martin Wildeis

Apr 25, 2014, 10:50:43 AM
to beauti...@googlegroups.com

Are line numbers now implemented?

Regards

Ben Davis

Oct 26, 2014, 4:32:14 PM
to beauti...@googlegroups.com
Hi, I know it's been a few years -- has this been implemented, or is there a workaround?

Richard

Mar 10, 2015, 7:16:46 AM
to beauti...@googlegroups.com
Just checking in to see if anything's happened with line numbers.

It'd be great!

leonardr

Jun 27, 2015, 10:42:32 AM
to beauti...@googlegroups.com, rba...@umn.edu
I realize a lot of people want this, but it's a very difficult feature to implement, especially across parser backends, and you won't be seeing it in Beautiful Soup unless someone sends me a patch.

Leonard

Waylan Limberg

Jul 6, 2015, 2:55:48 PM
to beauti...@googlegroups.com, rba...@umn.edu

On Saturday, June 27, 2015 at 10:42:32 AM UTC-4, leonardr wrote:
> I realize a lot of people want this, but it's a very difficult feature to implement, especially across parser backends, and you won't be seeing it in Beautiful Soup unless someone sends me a patch.


Just stumbled on this and I have to agree. My research on retaining position info from HTML parsers is [here] (not specifically related to Beautiful Soup, but it covers all of the supported backends). As far as I can tell, it would be impossible to get consistent behavior across parsers, and some parsers will probably never work. I suspect the only successful implementation would work only with the "html.parser" backend. Would you be interested in adding support if it were only for a single backend? If not, I wouldn't expect this feature to ever be available.

Waylan

Leonard Richardson

Jul 6, 2015, 2:58:13 PM
to beauti...@googlegroups.com
I'd be okay with an implementation that only worked with the html.parser backend.
 
Leonard