Trouble getting wget to mirror some sites like weworkmeteor.com, etc.?


Gurjit Singh

Apr 15, 2017, 3:45:32 AM
to Belfast Linux User Group
Hello all, 

I am trying to use wget to mirror websites. I want behaviour similar to pressing Ctrl+S in a browser to save a complete web page. Some sites still cause problems, such as weworkmeteor.com and www.danielmayor.com.

The options I am using are as follows:
wget --no-check-certificate -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0' -E -H -k -KN -p -np -l 1 -e robots=off www.danielmayor.com
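
For readers decoding the short flags, here is a sketch of the same invocation using wget's long option names. The function name mirror_page is hypothetical and only wraps the command for reuse. Note that wget does not execute JavaScript, so if the affected sites build their content client-side (an assumption, not confirmed in this thread), no combination of these options will capture the rendered page.

```shell
# Hypothetical wrapper: the same options as the command above, long form.
#   -U  = --user-agent        -E  = --adjust-extension
#   -H  = --span-hosts        -k  = --convert-links
#   -K  = --backup-converted  -N  = --timestamping
#   -p  = --page-requisites   -np = --no-parent
#   -l 1 = --level=1          -e robots=off = --execute robots=off
mirror_page() {
    wget \
        --no-check-certificate \
        --user-agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0' \
        --adjust-extension \
        --span-hosts \
        --convert-links \
        --backup-converted \
        --timestamping \
        --page-requisites \
        --no-parent \
        --level=1 \
        --execute robots=off \
        "$1"
}
```

Usage would be along the lines of: mirror_page http://www.danielmayor.com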

Any help is really appreciated

Jonny McCullagh

Apr 15, 2017, 5:25:44 PM
to Belfast Linux User Group

Matt

Apr 15, 2017, 6:28:14 PM
to belfas...@googlegroups.com
Hi,

You may also want to check out PhantomJS or headless Chrome / Chromium;
they are all browsers that run without a GUI, so you can visit the page
and then "click" on save after the page finishes loading. This uses far
more resources than wget, but on the other hand you do get standard
browser behaviour (JavaScript, DOM, etc.).
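
As a rough illustration of the headless Chrome route, the sketch below dumps the DOM after the page's scripts have run. It assumes Chrome/Chromium 59 or later and that the binary is called chromium-browser (it may be google-chrome or chromium on other systems); the function name save_rendered_page and the CHROME_BIN variable are hypothetical. This saves only the rendered HTML, not images or stylesheets, so it is not a full equivalent of Ctrl+S.

```shell
# Assumed binary name; override with CHROME_BIN=google-chrome if needed.
CHROME_BIN="${CHROME_BIN:-chromium-browser}"

# Hypothetical helper: write the JavaScript-rendered DOM of a URL to a file.
#   $1 = URL to load, $2 = output HTML file
save_rendered_page() {
    page_url=$1
    page_file=$2
    # --dump-dom prints the serialized DOM to stdout once the page has loaded.
    "$CHROME_BIN" --headless --disable-gpu --dump-dom "$page_url" > "$page_file"
}
```

Usage would be along the lines of: save_rendered_page https://weworkmeteor.com page.html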

Matt

Jonny McCullagh

Apr 16, 2017, 2:28:47 PM
to Belfast Linux User Group
Slightly off-topic (since mirroring is the requirement), but I once also created a screenshot service using a headless Konqueror and khtml2png.
That allowed me to add screenshots of web pages to a URL shortening service in QUB. That was 6 years ago; as Matt suggests, PhantomJS would probably be better now!
jonny

Matt

Apr 16, 2017, 4:50:17 PM
to belfas...@googlegroups.com
Except the maintainer of PhantomJS resigned his maintainership just a
couple of hours after headless Chrome was confirmed for Chrome 59, 4 days
ago.

Matt

Matt

Apr 16, 2017, 5:06:15 PM
to belfas...@googlegroups.com

I should have posted the link too, sorry.

https://www.chromestatus.com/features/5678767817097216

Matt

Desmond Devlin

Apr 16, 2017, 5:13:16 PM
to belfas...@googlegroups.com
Matt

Thanks. Perhaps I need a wee visit to Farset Labs to be shown the ropes of headless browsers. I looked up a news article and found out that they are good for testing purposes. I'm curious already.

I still browse with Firefox, though. :)

 
--
 
Desmond Devlin


--
You received this message because you are subscribed to the Google Groups "Belfast Linux User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to belfastlinux...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matt

Apr 16, 2017, 5:34:04 PM
to belfas...@googlegroups.com
Hi Desmond,

I run Firefox too! :)

Current nightly builds of Firefox support headless mode (which means headless will be available in stable in about 6 months):

$ MOZ_HEADLESS=1 firefox

In the meantime, if you want to run a stable version, you can always use the Xvfb display server (X virtual framebuffer) and drive the browser with either xdotool or Selenium.
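
A minimal sketch of the virtual-framebuffer approach, assuming the Xvfb binary is installed and display number 99 is free. The helper names start_xvfb and stop_xvfb, and the screen geometry, are illustrative choices rather than anything from this thread.

```shell
# Hypothetical helper: start an in-memory X server and point DISPLAY at it.
#   $1 = display number (defaults to 99, assumed to be unused)
start_xvfb() {
    display_num=${1:-99}
    # One virtual screen, 1280x1024 at 24-bit colour, nothing shown on a monitor.
    Xvfb ":$display_num" -screen 0 1280x1024x24 &
    XVFB_PID=$!
    # Programs started from this shell afterwards will draw into the virtual display.
    DISPLAY=":$display_num"
    export DISPLAY
}

# Hypothetical helper: shut the virtual X server down again.
stop_xvfb() {
    kill "$XVFB_PID"
}
```

After start_xvfb, a stable Firefox launched from the same shell opens its window inside the virtual display, where xdotool or Selenium can drive it; a short pause before launching the browser may be needed while the server starts.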

Matt

Martin Naughton

Apr 16, 2017, 6:01:48 PM
to belfas...@googlegroups.com

For headless browsing you can also set up a virtual display for Xorg.

The program to install is xvfb.

It works with any browser. No need to put Chrome in headless mode.

I use it for my automated testing. I tried out PhantomJS, but it wasn't able to click buttons as part of my testing.
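
On Debian and Ubuntu the xvfb package also ships an xvfb-run wrapper, which starts a temporary virtual display, runs one command inside it, and cleans up afterwards. The sketch below assumes that wrapper is available; the function name run_in_xvfb and the screen geometry are hypothetical.

```shell
# Hypothetical helper: run any GUI command inside a throwaway Xvfb display.
run_in_xvfb() {
    # --auto-servernum picks a free display number instead of a fixed one;
    # --server-args passes the screen geometry through to Xvfb itself.
    xvfb-run --auto-servernum --server-args='-screen 0 1280x1024x24' "$@"
}
```

Usage would be along the lines of: run_in_xvfb firefox http://www.danielmayor.com, or wrapping whatever command launches the automated test suite.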
