I have attached a tweet page I'm trying to parse. I fetch all tweets
via 'div.original-tweet', discard any that match 'div.pinned', then try
to detect retweets via 'div[data-retweeter]' (a div with the attribute
'data-retweeter'). However, that select never returns a result, so I
have fallen back to detecting '.Icon--retweeted' instead.
I have noticed other occasions where the first div in a list of divs was
not returned by a select, but this is the first time I have looked into
it properly.
Here is a cut-down version of the code:
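import io
import bs4

# Read the saved tweet page and parse it with lxml.
with io.open('tweet-page') as f:
    html_data = f.read()
parsed = bs4.BeautifulSoup(html_data, 'lxml')

# Expected: the div carrying 'data-retweeter'. Actual: None.
parsed.select('div.original-tweet')[1].select_one('div[data-retweeter]')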
If you print parsed.select('div.original-tweet')[1], you can see that
the very first element is a div with the attribute 'data-retweeter', but
the select_one call does not return it.
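One thing I have not ruled out: if the 'data-retweeter' attribute sits
on the 'div.original-tweet' tag itself rather than on a div inside it,
then select_one() called on that tag would only look at its descendants
and never at the tag itself. Here is a minimal sketch of that case. The
markup, the ids and the 'someone' value are made up for illustration,
and it uses 'html.parser' instead of 'lxml' only so it runs without
extra dependencies:

```python
import bs4

# Hypothetical markup standing in for the real tweet page.
html_data = """
<div class="original-tweet" id="plain"><p>first</p></div>
<div class="original-tweet" id="rt" data-retweeter="someone"><p>second</p></div>
"""
parsed = bs4.BeautifulSoup(html_data, 'html.parser')
tweet = parsed.select('div.original-tweet')[1]

# select_one() searches only the descendants of `tweet`,
# so the tag itself is never a candidate match.
descendant_match = tweet.select_one('div[data-retweeter]')

# Checking the attribute directly on the tag does see it.
is_retweet = tweet.has_attr('data-retweeter')

# Alternatively, combine both conditions in one selector
# run from the document root.
retweets = parsed.select('div.original-tweet[data-retweeter]')
```

In this sketch descendant_match is None while is_retweet is True, which
would match what I am seeing, but I have not confirmed that the real
page is structured this way.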
Any ideas?
Thanks