Re: [OpenRefine] how to fetch all pages of urls such as "www.xxxxx.com/xxxxx.htm?pg=#" where # is a number

Thad Guidry

Aug 12, 2014, 8:38:00 PM
to openr...@googlegroups.com
You could create what you need by adding a new column based on a GREL expression that builds the URL for each page, and then fetching on that new column (Add column by fetching URLs). Variables in GREL are documented here: https://github.com/OpenRefine/OpenRefine/wiki/Variables
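
If you would rather start from a ready-made list of URLs, one option is to generate them outside OpenRefine and paste them in as a one-column project, then fetch on that column. Below is a minimal Python sketch of that idea. The function name `page_urls` is made up for illustration, the base URL is the placeholder from the question, and the last page number (300) is an assumption, since the question only says "300+".

```python
def page_urls(base_url, first_page, last_page):
    """Return the unnumbered first page followed by base_url?pg=N
    for N from first_page through last_page (inclusive)."""
    urls = [base_url]  # the initial page has no pg parameter
    urls.extend(f"{base_url}?pg={n}" for n in range(first_page, last_page + 1))
    return urls


if __name__ == "__main__":
    # Placeholder URL from the question; 300 is an assumed upper bound.
    for url in page_urls("http://www.xxxx.com/xxxx.htm", 2, 300):
        print(url)
```

Inside OpenRefine itself, a roughly equivalent GREL expression for the new column might be `"http://www.xxxx.com/xxxx.htm?pg=" + (row.index + 2)`, assuming the project already has one row per page. `row.index` is zero-based, so the first row would get pg=2.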

Myself, I do most of my web scraping with OutWit Hub (http://www.outwit.com); it takes two mouse clicks to start a scrape that iterates over "Next" pages like the ones you are after.




On Tue, Aug 12, 2014 at 1:42 PM, wade <wa...@wandkcampbell.com> wrote:
I'd like to fetch the html for a URL such as "http://www.xxxx.com/xxxx.htm", but it has many pages after the initial one that have the form "http://www.xxxx.com/xxxx.htm?pg=#" where # is some number (in this case, from 2 up to 300+).  Is there a way to fetch the html code from each of these?
