facebook.com login

vitsin

Jan 11, 2011, 9:49:40 AM
to scrapy-users
Hi guys,
Has anyone tried to create a spider that logs into an existing facebook.com
account?
I tried the example from:
http://doc.scrapy.org/topics/request-response.html#topics-request-response-ref-request-userlogin
with no luck.

2011-01-11 09:34:59-0500 [facebook_dob] INFO: Spider opened
2011-01-11 09:34:59-0500 [facebook_dob] DEBUG: Redirecting (meta refresh) to <GET http://www.facebook.com/?_fb_noscript=1> from <GET http://www.facebook.com/>
2011-01-11 09:34:59-0500 [facebook_dob] DEBUG: Crawled (200) <GET http://www.facebook.com/?_fb_noscript=1> (referer: None)
2011-01-11 09:34:59-0500 [facebook_dob] DEBUG: Arrived Response from URL: http://www.facebook.com/?_fb_noscript=1
2011-01-11 09:34:59-0500 [facebook_dob] ERROR: Spider error processing <http://www.facebook.com/> (referer: <None>)
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/Twisted-10.1.0-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 1174, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib64/python2.6/site-packages/Twisted-10.1.0-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 796, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib64/python2.6/site-packages/Twisted-10.1.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 318, in callback
    self._startRunCallbacks(result)
  File "/usr/lib64/python2.6/site-packages/Twisted-10.1.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 424, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib64/python2.6/site-packages/Twisted-10.1.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/vitaly/Documents/SynapticVision/ISS/crawl/crawl/spiders/facebook_dob.py", line 72, in parse
    callback=self.after_login)]
  File "/usr/lib/python2.6/site-packages/Scrapy-0.12.0.2524-py2.6.egg/scrapy/http/request/form.py", line 46, in from_response
    raise ValueError("No <form> element found in %s" % response)
exceptions.ValueError: No <form> element found in <200 http://www.facebook.com/?_fb_noscript=1>

2011-01-11 09:34:59-0500 [facebook_dob] INFO: Closing spider (finished)


regards,
--vs

Pablo Hoffman

Jan 11, 2011, 1:22:50 PM
to scrapy...@googlegroups.com
Hi vitsin,

You seem to be getting redirected to /?_fb_noscript=1, which is probably not
what you want. Try different user agents and download delays.
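
For reference, both knobs are ordinary Scrapy settings (USER_AGENT and DOWNLOAD_DELAY). A minimal sketch of what that could look like in the project's settings.py follows; the user agent string and the delay value are illustrative assumptions, not values known to work against facebook.com.

```python
# settings.py (sketch): the values below are illustrative assumptions.

# Send a browser-like user agent instead of Scrapy's default one.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 2
```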

Pablo.


vitsin

Jan 11, 2011, 1:52:36 PM
to scrapy-users
Thanks, Pablo.

This is what worked for me:
# Inside the spider class (needs: from scrapy.http import FormRequest; from scrapy import log)
start_urls = ['https://login.facebook.com/login.php']

def parse(self, response):
    # Select the login form by name; from_response otherwise uses the first form.
    return [FormRequest.from_response(
        response,
        formname='login_form',
        formdata={'email': 'existing@email', 'pass': 'existing_password'},
        callback=self.after_login)]

def after_login(self, response):
    # Check that login succeeded before going on.
    if "authentication failed" in response.body:
        self.log("Login failed", level=log.ERROR)
        return

    # Continue scraping with the authenticated session.

I had to specify formname='login_form'.
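
To illustrate why the form name can matter: FormRequest.from_response pre-fills the request with the fields of one form on the page (the first one unless told otherwise) and then applies formdata on top. Below is a rough stdlib-only sketch of that idea, not Scrapy's actual implementation. FormCollector, build_form_data, SAMPLE_HTML and the "token" field are hypothetical names made up for the illustration, and the sample page is an assumption, not Facebook's real markup.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect every <form> with its name and its <input> name/value pairs."""

    def __init__(self):
        super().__init__()
        self.forms = []       # list of {"name": ..., "fields": {...}}
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"name": attrs.get("name"), "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_form_data(html, formname, formdata):
    """Return the named form's fields, overridden by the caller's formdata."""
    collector = FormCollector()
    collector.feed(html)
    for form in collector.forms:
        if form["name"] == formname:
            merged = dict(form["fields"])
            merged.update(formdata)
            return merged
    raise ValueError("No <form> named %r found" % formname)


# Hypothetical page with two forms; only the second one is the login form.
SAMPLE_HTML = """
<form name="search"><input name="q" value=""></form>
<form name="login_form">
  <input type="hidden" name="token" value="abc123">
  <input name="email" value="">
  <input name="pass" value="">
</form>
"""

data = build_form_data(
    SAMPLE_HTML,
    "login_form",
    {"email": "existing@email", "pass": "existing_password"},
)
print(sorted(data.items()))
```

In this sketch, picking the first form would have submitted the search box rather than the login fields, and the hidden field is carried over from the page without being listed in formdata.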

--vs
