Beautiful Soup doesn't track this information, so you can't use
Beautiful Soup for what you want. The information is tracked by the
underlying sgmllib parser, so you can use sgmllib, and I might put
this feature in a future version of Beautiful Soup.
Leonard
> I think it would be a nice idea to do so, as many developers use BS
> to scrape information from web pages and would like some reference
> back to where the information originally came from.
I was unable to implement this because sgmllib doesn't actually update
its line number information as it parses. This is a known bug in
Python:
http://bugs.python.org/issue849097
That bug has a patch you can use on sgmllib, and I've attached a patch
that changes Beautiful Soup to store the getpos() information for all
PageElement objects. If you apply them both it should work, but I
won't put this in the official release.
Leonard
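For readers who only need the source position of each tag, here is a minimal sketch of the getpos() idea described above. It makes some assumptions: sgmllib exists only in Python 2, so the sketch swaps in the standard library's html.parser.HTMLParser from Python 3, which exposes the same getpos() method. The names PositionRecorder and tag_positions are hypothetical helpers invented for this illustration. This is not Beautiful Soup's API and not the attached patch.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: records where each start tag begins."""

    def __init__(self):
        super().__init__()
        # Each entry is (tag name, line number, column offset).
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports the parser's current (line, offset) position,
        # which inside this handler is the start of the tag being handled.
        line, column = self.getpos()
        self.positions.append((tag, line, column))


def tag_positions(markup):
    """Return (tag, line, column) for every start tag in markup."""
    parser = PositionRecorder()
    parser.feed(markup)
    parser.close()
    return parser.positions


if __name__ == "__main__":
    sample = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>"
    for tag, line, column in tag_positions(sample):
        print(tag, line, column)
```

Line numbers count from 1 and column offsets count from 0, so a tag at the very start of the document is reported at line 1, column 0. Storing the position on each parsed element, as the patch mentioned above does for PageElement objects, would follow the same pattern of calling getpos() at the moment each element is created.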