Newsgroups: comp.unix.shell
From: Kaz Kylheku <k...@kylheku.com>
Date: Mon, 17 Sep 2012 18:05:12 +0000 (UTC)
Local: Mon, Sep 17 2012 2:05 pm
Subject: Re: Getting the text from a webpage (not the source)
On 2012-09-17, Bill Marcum <b...@nowhere.invalid> wrote:
> On 09/17/2012 05:25 AM, Guillaume Dargaud wrote:
>> Hello all,
>> I would like to script the equivalent of doing Ctrl-C on a webpage in a
>> browser, and then Ctrl-V in a text editor.
>> In other words I would like the text from a webpage, after all the html+css
>> and possibly javascript rendering. The idea is to get the text like a person
>> sees it, no "display:none" shenanigans.
>> I don't think it's a job for wget which only gets the source.
>
> If you wget the source of a web page and then view that file in a

What if the document that is rendered on the screen has some contents which are
computed by Javascript? Wget doesn't contain a Javascript interpreter.
Guillaume is right.
Though I'm not sure how you can solve this easily with Unix shell tools.
You need a web scraping engine that processes Javascript.
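
To make the limitation concrete, here is a minimal sketch in Python (standard library only) of what you get when you extract text from the fetched source without running any scripts. The page in `PAGE` is a made-up example, and the names `VisibleText` and `static_text` are hypothetical helpers for illustration, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes from static HTML, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def static_text(html):
    """Return the text found in the markup itself; no script is executed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the price is filled in by a script at load time.
PAGE = """<html><body>
<p>Price: <span id="price"></span></p>
<script>document.getElementById("price").textContent = "42 EUR";</script>
</body></html>"""

print(static_text(PAGE))
```

A browser would render that page as "Price: 42 EUR", but the extractor only ever sees the empty span, so the computed value never appears in its output. Any approach that works from the downloaded source alone has the same blind spot.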