Rotten Tomatoes doesn't display content/ratings in the ParseHub browser, so I can't select/scrape the data


geejo fake

Sep 5, 2021, 10:34:07 PM
to Web Scraping
Hi, I am having issues trying to select the Tomatometer and audience scores from Rotten Tomatoes.

The portion that shows the movie title, year, genre, runtime, Tomatometer and audience scores is blanked out in the ParseHub browser.

I tried this on both the Linux and Windows versions, and in both Firefox and the new MS Edge. A screenshot is attached.

Can you please fix it?


parsehub_4sMfDRJJC3.png

Andrew11

Sep 6, 2021, 5:41:39 PM
to Web Scraping
The scores are attributes of the "scoreboard" element (for example tomatometerscore and audiencescore). Using ParseHub's selection tool, click the blank gray box where the data should be, then click the Select command's download icon (if there is one) to expand its Extract command, or add an Extract command if it doesn't exist. Then select the Extract command and use its drop-down box to choose the attribute you want.
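Outside ParseHub, the same idea (reading the scores straight off the element's attributes rather than its visible text) can be sketched with Python's standard-library html.parser. This is a minimal illustration, not how ParseHub works internally; the HTML string is an abridged copy of the scoreboard element quoted later in this thread, and the names SCOREBOARD_HTML, ScoreboardAttrs and scoreboard_attributes are hypothetical.

```python
from html.parser import HTMLParser

# Abridged copy of the <score-board> start tag quoted in this thread (children omitted).
SCOREBOARD_HTML = (
    '<score-board audiencestate="upright" audiencescore="93" class="scoreboard" rating="" '
    'tomatometerstate="certified-fresh" tomatometerscore="99" data-qa="score-panel">'
    '</score-board>'
)


class ScoreboardAttrs(HTMLParser):
    """Records the attributes of the first element whose class list contains 'scoreboard'."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if self.attrs is None and "scoreboard" in classes:
            self.attrs = attr_map


def scoreboard_attributes(html):
    """Return the scoreboard element's attributes as a dict, or {} if none is found."""
    parser = ScoreboardAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs or {}


attrs = scoreboard_attributes(SCOREBOARD_HTML)
print(attrs["tomatometerscore"], attrs["audiencescore"])
```

The scores are plain attribute values here, so they can be read even when nothing is rendered inside the gray box.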

Andrew11

Sep 6, 2021, 5:47:44 PM
to Web Scraping
...and to extract the movie title, nest a Select command inside the scoreboard selection, switch it to CSS selection mode, paste in
.scoreboard__title
and then uncheck the rooted selection checkbox.

Here's the HTML for the scoreboard on that movie's page. Notice that the class attributes are what you target:
<score-board audiencestate="upright" audiencescore="93" class="scoreboard" rating="" tomatometerstate="certified-fresh" tomatometerscore="99" data-qa="score-panel">
                        <h1 slot="title" class="scoreboard__title" data-qa="score-panel-movie-title">It Happened One Night</h1>
                        <p slot="info" class="scoreboard__info">1934, Romance, 1h 45m</p>
                        <a slot="critics-count" href="/m/it_happened_one_night/reviews?intcmp=rt-scorecard_tomatometer-reviews" class="scoreboard__link scoreboard__link--tomatometer" data-qa="tomatometer-review-count">97 Reviews</a>
                        <a slot="audience-count" href="/m/it_happened_one_night/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews" class="scoreboard__link scoreboard__link--audience" data-qa="audience-rating-count">25,000+ Ratings</a>
                        <div slot="sponsorship" id="tomatometer_sponsorship_ad" style=""><div id="div-gpt-tomatometer-7093448" class="mps-slot" data-mps-slot="tomatometer" data-mps-loadset="0" data-google-query-id="CKKcmJ2n6_ICFfkT-QAd2VcCag"><script>mps._execAd("tomatometer");</script><div id="google_ads_iframe_/2620/rottentomatoes/movie/movie_page_11__container__" style="border: 0pt none; width: 524px; height: 0px;"></div></div></div>
                    </score-board>
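As a rough, non-authoritative analogue of nesting a Select inside the scoreboard selection, the sketch below uses Python's html.parser to look only inside the element with class "scoreboard" and collect the text of its children by class. It runs on the HTML above with the sponsorship/ad markup removed; the names PAGE_HTML, WANTED, ScoreboardChildren and extract_scoreboard are hypothetical, and it assumes the wanted children contain only text.

```python
from html.parser import HTMLParser

# The scoreboard HTML quoted above, minus the sponsorship/ad <div> and its <script>.
PAGE_HTML = """
<score-board audiencestate="upright" audiencescore="93" class="scoreboard" rating="" tomatometerstate="certified-fresh" tomatometerscore="99" data-qa="score-panel">
  <h1 slot="title" class="scoreboard__title" data-qa="score-panel-movie-title">It Happened One Night</h1>
  <p slot="info" class="scoreboard__info">1934, Romance, 1h 45m</p>
  <a slot="critics-count" href="/m/it_happened_one_night/reviews?intcmp=rt-scorecard_tomatometer-reviews" class="scoreboard__link scoreboard__link--tomatometer" data-qa="tomatometer-review-count">97 Reviews</a>
  <a slot="audience-count" href="/m/it_happened_one_night/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews" class="scoreboard__link scoreboard__link--audience" data-qa="audience-rating-count">25,000+ Ratings</a>
</score-board>
"""

# Child classes to extract from inside the scoreboard (hypothetical list for this sketch).
WANTED = ("scoreboard__title", "scoreboard__info",
          "scoreboard__link--tomatometer", "scoreboard__link--audience")


class ScoreboardChildren(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scoreboard_tag = None  # tag name of the scoreboard element while we are inside it
        self.current = None         # WANTED class whose text is being collected
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.scoreboard_tag is None and "scoreboard" in classes:
            self.scoreboard_tag = tag  # entered the scoreboard element
        elif self.scoreboard_tag is not None:
            for name in WANTED:
                if name in classes:
                    self.current = name
                    self.fields[name] = ""

    def handle_endtag(self, tag):
        if tag == self.scoreboard_tag:
            self.scoreboard_tag = None  # left the scoreboard (assumes no nested tag of the same name)
        self.current = None             # wanted children hold text only, so any end tag closes them

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data


def extract_scoreboard(html):
    """Return {class name: stripped text} for WANTED children of the scoreboard element."""
    parser = ScoreboardChildren()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}


for name, text in extract_scoreboard(PAGE_HTML).items():
    print(f"{name}: {text}")
```

Elements with these classes that sit outside the scoreboard are ignored, which is the point of scoping the search to the scoreboard first.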

geejo fake

Sep 8, 2021, 12:03:52 AM
to Web Scraping
A tutorial on this would be highly appreciated, since it was pretty easy to follow the Metacritic tutorial and I was able to do it in half an hour. Grabbing from Rotten Tomatoes looks kind of different.

Andrew11

Sep 8, 2021, 12:52:55 AM
to Web Scraping
They probably won't want to show off that the scraper just displays a blank gray box where the data should be! In the future, if you turn on Browse mode with CTRL-B, you can right-click elements on the page and inspect the HTML to see whether it contains data that isn't being displayed.

sh...@parsehub.com

Sep 8, 2021, 11:20:57 AM
to Web Scraping
Hi,

Unfortunately, it appears that this website is no longer compatible with the ParseHub client. Those grey boxes are supposed to be filled with content by code on the page that no longer works with the version of Firefox ParseHub runs on. This will not be resolved until we upgrade the client's browser, which is still a few months away. I am sorry for the inconvenience.

For now, the data does still appear to be present in the HTML, as Andrew pointed out, and you can target it using CSS selections in ParseHub. Here are some examples:

This image shows the HTML Andrew has mentioned:
one.png

The highlighted HTML elements contain the data you are looking for. You can target and extract the title using its 'class' attribute: .scoreboard__title

In ParseHub, you can switch a selection to CSS mode and enter the class mentioned above. See image:
two.png

The info line can be selected with .scoreboard__info, and the Tomatometer review count with .scoreboard__link--tomatometer.
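A CSS class selector such as .scoreboard__link--tomatometer matches any element whose class attribute contains that name as one of its whitespace-separated tokens. The hypothetical helper matches_class_selector below illustrates only that rule, for simple compound class selectors (no tag names, ids, attributes or combinators), using the class attributes from the scoreboard HTML quoted earlier in this thread. It is a sketch of the matching rule, not ParseHub's selector engine.

```python
def matches_class_selector(class_attr: str, selector: str) -> bool:
    """Check a simple class selector such as '.a' or '.a.b' against an element's class attribute.

    Only class selectors are handled (no tag names, ids, attributes or combinators).
    """
    if not selector.startswith("."):
        raise ValueError("only class selectors like '.name' are supported")
    wanted = [name for name in selector.split(".") if name]
    tokens = class_attr.split()  # the class attribute is a whitespace-separated token list
    return all(name in tokens for name in wanted)


# class attributes copied from the scoreboard HTML in this thread
ELEMENTS = {
    "title": "scoreboard__title",
    "info": "scoreboard__info",
    "critics link": "scoreboard__link scoreboard__link--tomatometer",
    "audience link": "scoreboard__link scoreboard__link--audience",
}

for selector in (".scoreboard__title", ".scoreboard__info",
                 ".scoreboard__link--tomatometer", ".scoreboard__link"):
    hits = [label for label, cls in ELEMENTS.items() if matches_class_selector(cls, selector)]
    print(selector, "->", hits)
```

Note that .scoreboard__link on its own matches both the critics and the audience links, so the more specific .scoreboard__link--tomatometer class is what isolates the critics' review count.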

I hope this helps!
Cheers,
Shan