Capturing Request & Response


PRACHI VAKHARIA

Apr 24, 2014, 3:37:24 AM
to web...@googlegroups.com

 
Hello Everyone,

I am using "from gluon.tools import fetch" and then fetch(URL) to fetch pages from an external website.
The external site's page contains links which, on the displayed page, end up rewritten as "127.0.0.1:8000/LINK".
 
Questions:
— When such a link is clicked, how can that link be extracted from the request?
— How can those links then be used to fetch the actual pages from the external site?

 
Some sites load with absolute URLs in the page, such as "en.wikipedia.org/wiki/web2py".
 
Questions:
— When such a link is clicked, how can that link be extracted from the request?
— How can those links be captured so that the actual pages are not fetched directly from the external site?


Thank you!

PRACHI



Massimo Di Pierro

Apr 24, 2014, 9:02:19 AM
to web...@googlegroups.com
I am not completely sure I understand what you are asking. The only reason to use fetch is portability: urllib.urlopen does not run on the Google App Engine. If you are not running, and not planning to run, on the Google App Engine, I would use urllib.urlopen instead.
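[Editor's sketch, not part of the original reply: a relative link captured from the proxied page (e.g. "/wiki/web2py", which the browser would otherwise send to 127.0.0.1:8000) can be resolved back against the external site's base URL with the standard library's urljoin before being fetched server-side. resolve_external is an illustrative helper name, and the modern urllib.request import stands in for Python 2's urllib.urlopen used at the time of this thread.]

```python
# Resolve a captured relative link back to the external site before
# fetching it. urljoin/urlopen are from the Python standard library;
# resolve_external() is an illustrative helper, not part of web2py.
from urllib.parse import urljoin
from urllib.request import urlopen  # gluon.tools.fetch() could be used instead on GAE

def resolve_external(base_url, link):
    """Turn a link captured from the proxied page (often relative,
    e.g. '/wiki/web2py') into an absolute URL on the external site."""
    return urljoin(base_url, link)

url = resolve_external("https://en.wikipedia.org/", "/wiki/web2py")
# html = urlopen(url).read()  # the actual server-side fetch
```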

If I understand correctly, your problem is getting the page content and extracting the links from it. You need an external library for that, for example BeautifulSoup.
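[Editor's sketch: BeautifulSoup is the usual choice, but as a dependency-free alternative the standard library's html.parser can also pull the href values out of a fetched page. The class below is illustrative, not from the original reply.]

```python
# Minimal link extractor using only the standard library, as an
# alternative to BeautifulSoup for collecting <a href="..."> values.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href attribute of every anchor tag we encounter.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<p><a href="/wiki/web2py">web2py</a> <a href="https://example.com/x">x</a></p>'
parser = LinkExtractor()
parser.feed(page)
# parser.links now holds both href values, relative and absolute
```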


Notice that having a web2py action call an external URL can be slow. You should probably queue your requests and have a background process perform the external fetches. The details depend on the purpose of your code.
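[Editor's sketch of the queue-and-worker pattern described above. In a real web2py app you would more likely use web2py's built-in scheduler; this stand-alone version just shows the shape, and fetch_page() is a stub standing in for urlopen(url).read() or gluon.tools.fetch(url).]

```python
# A web action only enqueues the URL; a background worker thread
# performs the slow external fetch and stores the result.
import queue
import threading

jobs = queue.Queue()
results = {}

def fetch_page(url):
    # Stub for the real fetch, e.g. urlopen(url).read()
    return "<html>content of %s</html>" % url

def worker():
    while True:
        url = jobs.get()
        if url is None:          # sentinel: shut the worker down
            break
        results[url] = fetch_page(url)
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

jobs.put("https://en.wikipedia.org/wiki/web2py")  # the action returns here
jobs.join()                      # wait only so this example is deterministic
jobs.put(None)
t.join()
```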

Massimo