Beautiful Soup 4.5.3-1 (Python 3) - refusing to return the first div in a 'div with attribute' CSS select?


omeg...@gmail.com

Feb 26, 2017, 10:12:40 AM
to beautifulsoup
I have attached a tweet page that I'm trying to parse. I fetch all tweets
via 'div.original-tweet', discard those matching 'div.pinned', then try to
detect retweets via 'div[data-retweeter]' (a div with the attribute
'data-retweeter'). However, that select never returns a result, so I
have fallen back to detecting '.Icon--retweeted' instead.

I have noticed other cases where the first div in a list of divs was not
returned by a select, but this is the first time I have looked into it
properly.

Here is a cut-down version of the code:

import io

import bs4

html_data = io.open('tweet-page').read()
parsed = bs4.BeautifulSoup(html_data, 'lxml')
parsed.select('div.original-tweet')[1].select_one('div[data-retweeter]')


In the output of parsed.select('div.original-tweet')[1] you can see that
the very first element is a div with the attribute 'data-retweeter', but
it is not returned.
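
One possible explanation, assuming the 'data-retweeter' attribute sits on the 'div.original-tweet' element itself rather than on one of its children: select() and select_one() called on a tag only search that tag's descendants, so the tag itself is never a candidate. Below is a minimal sketch of that behaviour. The markup is hypothetical (it is not the attached page), and 'html.parser' is used only to keep the example dependency-free.

```python
import bs4

# Hypothetical markup, not the attached tweet page: the retweet marker is
# on the tweet's own div, not on a descendant.
html_data = """
<div class="original-tweet" data-retweeter="someone"><p>retweeted text</p></div>
<div class="original-tweet"><p>plain text</p></div>
"""
parsed = bs4.BeautifulSoup(html_data, 'html.parser')
tweet = parsed.select('div.original-tweet')[0]

# select_one() on a tag searches its descendants only, so the tag itself
# is not matched and this comes back as None.
inner_match = tweet.select_one('div[data-retweeter]')

# Checking the attribute on the tag itself does find it.
is_retweet = tweet.has_attr('data-retweeter')

# Filtering the full list of tweets the same way.
retweets = [t for t in parsed.select('div.original-tweet')
            if t.has_attr('data-retweeter')]
```

If the attribute is actually on a nested div, this does not apply and the cause lies elsewhere.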

Any ideas?

Thanks


tweet-page

omeg...@gmail.com

Jun 19, 2017, 12:53:35 PM
to beautifulsoup