
Python 3 HTML scraper that supports JavaScript


zlju...@gmail.com

May 1, 2016, 10:19:42 AM
Hi,

Can you please recommend a Python 3 library that I can use for scraping JS, and that works on Windows as well as Linux?

Regards.

Bob Gailer

May 1, 2016, 1:01:30 PM
On May 1, 2016 10:20 AM, <zlju...@gmail.com> wrote:
>
> Hi,
>
> can you please recommend to me a python3 library that I can use for
> scraping JS
I'm not sure what you mean by that. The tool I use is Splinter. Install it
using pip.
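
For readers following along, here is a minimal sketch of what fetching a JavaScript-rendered page with Splinter might look like. It assumes `pip install splinter` plus a working Firefox and geckodriver setup, since Splinter's `Browser()` starts Firefox when no driver name is given. The function name `fetch_rendered_html` and the `browser_factory` parameter are made up for this illustration and are not part of Splinter.

```python
def fetch_rendered_html(url, browser_factory=None):
    """Visit url in a real browser and return the HTML of the rendered page."""
    if browser_factory is None:
        # Imported lazily so this module still loads where Splinter is absent.
        from splinter import Browser
        browser_factory = Browser  # Browser() with no arguments uses Firefox
    browser = browser_factory()
    try:
        browser.visit(url)
        # .html is the page source as the browser currently sees it
        return browser.html
    finally:
        # Always close the browser, even if the visit fails
        browser.quit()

# Usage (needs Splinter and a browser driver, so it is not run here):
#   html = fetch_rendered_html("https://example.invalid/")
```

Content that the page loads late may still be missing from the returned HTML, so a wait for a specific element may be needed before reading it.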

zlju...@gmail.com

May 2, 2016, 11:34:05 AM

I tried to use the following code:

from bs4 import BeautifulSoup
from selenium import webdriver

# Path to the local PhantomJS executable
PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'

url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dokumentarni-film'

# Load the page in headless PhantomJS so that its JavaScript runs
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get(url)

# Parse whatever the DOM contains right after the page load
soup = BeautifulSoup(browser.page_source, "html.parser")
browser.quit()

x = soup.prettify()

print(x)


When I print x, I expect to see something like this:
<video src="mediasource:https://hrti.hrt.hr/2e9e9c45-aa23-4d08-9055-cd2d7f2c4d58" id="vjs_video_3_html5_api" class="vjs-tech" preload="none"><source type="application/x-mpegURL" src="https://prd-hrt.spectar.tv/player/get_smil/id/2203605/video_id/2203605/token/Cny6ga5VEQSJ2uZaD2G8pg/token_expiration/1462043309/asset_type/Movie/playlist_template/nginx/channel_name/trebiat__pria_o_jednoj_vodi_i_jednom_narodu_dokumentarni_film/playlist.m3u8?foo=bar">
</video>

but I can't get to that point.
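
One way to check whether the rendered HTML actually contains the expected markup is to pull the `src` attributes out of any `<video>` element and its `<source>` children. The sketch below uses only the standard library's `html.parser` so it runs without extra installs; the names `VideoSourceParser` and `extract_video_sources` are invented for this example.

```python
from html.parser import HTMLParser


class VideoSourceParser(HTMLParser):
    """Collect src attributes of <video> tags and of <source> tags inside them."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._video_depth = 0  # > 0 while we are inside a <video> element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "video":
            self._video_depth += 1
            if attributes.get("src"):
                self.sources.append(attributes["src"])
        elif tag == "source" and self._video_depth > 0:
            if attributes.get("src"):
                self.sources.append(attributes["src"])

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth > 0:
            self._video_depth -= 1


def extract_video_sources(html):
    """Return every video-related src found in html, in document order."""
    parser = VideoSourceParser()
    parser.feed(html)
    return parser.sources
```

An empty list from `extract_video_sources(browser.page_source)` would suggest that the player markup had not been added to the DOM when the source was read.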

Regards.

DFS

May 2, 2016, 12:39:49 PM
I was doing something similar recently. Try this:

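from bs4 import BeautifulSoup

# somefilename is a placeholder for the path to a saved copy of the page
with open(somefilename) as f:
    soup = BeautifulSoup(f, "html.parser")
print(soup.prettify())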


Stephen Hansen

May 2, 2016, 2:00:42 PM
Why? As important as it is to show code, you need to show what actually
happens and what error message is produced.

--
Stephen Hansen
m e @ i x o k a i . i o

zlju...@gmail.com

May 2, 2016, 4:11:49 PM

> Why? As important as it is to show code, you need to show what actually
> happens and what error message is produced.

If you run the code, you will see that the HTML I get doesn't contain a link to the flash video. I probably need to do something first (maybe press the play button) in order to get HTML that references the video file on this page.
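
A rough sketch of that idea: optionally click a play button, then poll the page until a `<video>` or `<source>` element exposes a `src`, instead of reading the page source immediately after loading. The helper name `wait_for_video_sources` and the `.play-button` selector are hypothetical; the real selector would have to be found by inspecting the page. The driver is only assumed to offer `find_elements(by, value)` as Selenium drivers do, and `"css selector"` is the string value behind Selenium's `By.CSS_SELECTOR`.

```python
import time


def wait_for_video_sources(driver, timeout=15.0, poll=0.5, play_selector=None):
    """Poll until video elements with a src appear; return their URLs ([] on timeout)."""
    clicked = play_selector is None  # nothing to click if no selector was given
    deadline = time.monotonic() + timeout
    while True:
        if not clicked:
            # The button may itself be rendered late, so retry on each pass
            buttons = driver.find_elements("css selector", play_selector)
            if buttons:
                buttons[0].click()
                clicked = True
        elements = driver.find_elements("css selector", "video, video source")
        urls = [element.get_attribute("src") for element in elements]
        urls = [url for url in urls if url]
        if urls or time.monotonic() >= deadline:
            return urls
        time.sleep(poll)

# Usage with a real Selenium driver (not run here; the selector is a guess):
#   browser.get(url)
#   print(wait_for_video_sources(browser, play_selector=".play-button"))
```

Selenium also ships its own `WebDriverWait` helper for this kind of waiting; the hand-written loop is used here only to keep the sketch free of imports beyond the standard library.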

Regards