raise ValueError(No <form> element found in %s % response) when trying to login with scrapy

1,331 views
Skip to first unread message

Ana Carolina Assis Jesus

unread,
Sep 11, 2013, 3:03:56 AM9/11/13
to scrapy...@googlegroups.com
Hello!

So I guess I am doing some progress if I get some new warning/error messages. (even if is not to the direction I was hoping for)

Anyway, I now get this error message AFTER it looks like I had logged in. 
So I gather I DIDN'T logged in.

I went then to check what happens:
ERROR:
File "/Users/ajesus/Documents/Scraper/ogoneProject/ogoneProject/spiders/ogone2Test.py", line 36, in parse
   return [FormRequest.from_response(response,formdata={'userid': 'BLABLBALBLA', 'PSWD': 'tatatata'},callback=self.after_login)]
 File "/Library/Python/2.7/site-packages/Scrapy-0.18.2-py2.7.egg/scrapy/http/request/form.py", line 36, in from_response
   form = _get_form(response, formname, formnumber, formxpath)
 File "/Library/Python/2.7/site-packages/Scrapy-0.18.2-py2.7.egg/scrapy/http/request/form.py", line 55, in _get_form
   raise ValueError("No <form> element found in %s" % response)


But I see in the source page:

<form method="POST" action="login.asp?time=9%2F11%2F2013+8%3A55%3A31+AM" name="MainForm" id="MainForm" onsubmit="return my_submit();">
<input type="hidden" name="CSRFKEY" value="66F69A310E1E216A52CB89CA9BABEDBD1239FE62" />
<input type="hidden" name="CSRFTS" value="20130911085531" />
<input type="hidden" name="CSRFSP" value="/ncol/prod/backoffice/homeindex.asp" />
<input type="hidden" name="BRANDING" value="Ogone">
<input type="hidden" name="lang" value="1">
<!--Added to maintain integration of ASP and ASP.NET-->
<input type="hidden" name="MigrationMode" value="DOTNET">
<input type="hidden" name="NoTopBanner" value="">
<div align="center">
<table border="0" width="80%" cellpadding="10" bordercolor="red">
<tr>
<td align="left" valign="top" width="33%" id="TD_LG_exp"><p><strong>Please enter your PSPID and your password in order to administer your account.</strong></p>
<p>The PSPID is the Merchant ID chosen by the merchant administrator when opening the Ogoneaccount. If you have forgotten your PSPID, please retrieve it from the contract you received upon activation.</p>
</td>
<td align="center" valign="top" width="33%" class=cadre >
<table border="0" cellspacing="0" cellpadding="5" bordercolor="yellow">
<tr>
<td width="100%" align="center">
<h2 class=normal>USERID:<br>
<input type="text" name="userid" size="20" value="" MAXLENGTH="30" AUTOCOMPLETE="Off" tabindex="1">
</h2>
<h2 class=normal>PSPID:<small>&nbsp;Optional</small><br>
<input type="text" name="refid" tabindex="5" size="20" value="" MAXLENGTH="35" AUTOCOMPLETE="Off">
</h2>
</td></tr>
</table>
<br>
<table border="0" cellspacing="0" cellpadding="0" bordercolor="magenta">
<tr>
<td width="100%" align="center">
<h2 class=normal>Password:<br>
<input type="password" name="PSWD" size="20" MAXLENGTH="20" AUTOCOMPLETE="Off" tabindex="3"><br>
<a class=normal href="login.asp?lost=1&adv=1&l_refid=&CSRFSP=%2FNcol%2FProd%2Flogin%2Easp&CSRFKEY=ACB96AF7B1B4961325B8F76CA47DDD5F39C5A1AF&CSRFTS=20130911085531" id="LGL">Lost your password&#63;</a></h2>

So, the form doesn't really ask for password and username, even though I have to enter this, this is in the table... But the form ask exactely for the keys I need to grab...

Anyone seem this before?
Anyone knows how to get the CSRFSP and CSRFKEY from it and how to add the login?

THANKS A LOT!
Cheers!
Ana

Paul Tremberth

unread,
Sep 11, 2013, 3:57:02 AM9/11/13
to scrapy...@googlegroups.com
Weird, when I run scrapy shell with the URL you are using, I can get a FormRequest() from the response
(I'm not running the same Scrapy as you though)

2013-09-11 09:54:00+0200 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Optional features available: ssl, django, http11, boto, libxml2
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6025
2013-09-11 09:54:00+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6082
2013-09-11 09:54:00+0200 [default] INFO: Spider opened
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html xmlns:ns><head><meta http-equiv="e'>
[s]   item       {}
[s]   settings   <CrawlerSettings module=None>
[s]   spider     <BaseSpider 'default' at 0x35f5210>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from scrapy.http.request.form import FormRequest

In [2]: frq = FormRequest.from_response(response)

In [3]: frq.body
Out[3]: 'CSRFKEY=49A306B81356198B1DAFF952058D17AAF3477015&CSRFTS=20130911095403&CSRFSP=%2Fncol%2Fprod%2Fbackoffice%2Fhomeindex.asp&BRANDING=OGONE&lang=1&MigrationMode=DOTNET&NoTopBanner=&refid=&PSWD=&adv=0&relogin=2&parurl=%2Fncol%2Fprod%2Fbackoffice%2Fhome&insert=Submit&button1=Cancel&btnSubmit=Submit'

In [4]: frq

Ana Carolina Assis Jesus

unread,
Sep 11, 2013, 5:51:49 AM9/11/13
to scrapy...@googlegroups.com
Hi Paul.

I tried to run on the shell like you did, but then I get an error message and it kind of crashes.
The error message is right after the Web service debug message and reads:

Below you have the whole log up to the crash.

How come you can make it work and I can't? Do I miss something?
Thanks!
Ana

2013-09-11 11:47:49+0200 [scrapy] INFO: Scrapy 0.18.2 started (bot: ogoneProject)
2013-09-11 11:47:49+0200 [scrapy] DEBUG: Optional features available: ssl, http11
2013-09-11 11:47:49+0200 [scrapy] DEBUG: Overridden settings: {'NEWSPIDER_MODULE': 'ogoneProject.spiders', 'SPIDER_MODULES': ['ogoneProject.spiders'], 'LOGSTATS_INTERVAL': 0, 'BOT_NAME': 'ogoneProject'}
2013-09-11 11:47:49+0200 [scrapy] DEBUG: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-11 11:47:50+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-11 11:47:50+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-11 11:47:50+0200 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-11 11:47:50+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-09-11 11:47:50+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-09-11 11:47:50+0200 [default] INFO: Spider opened
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html xmlns:ns><head><meta http-equiv="e'>
[s]   item       {}
[s]   settings   <CrawlerSettings module=<module 'ogoneProject.settings' from '/Users/ajesus/Documents/Scraper/ogoneProject/ogoneProject/settings.pyc'>>
[s]   spider     <BaseSpider 'default' at 0x10200d250>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

>>> 

On Wednesday, September 11, 2013 9:03:56 AM UTC+2, Ana Carolina Assis Jesus wrote:

Ana Carolina Assis Jesus

unread,
Sep 11, 2013, 6:02:19 AM9/11/13
to scrapy...@googlegroups.com
Hi Paul.
Never mind my previous question... I mean, I got the why I got the error message, I have a few other codes now inside.
And the crash is in fact the python opened shell (I didn't get any message in here)

Oops. Sorry about that!
I am trying it again and I will get back at you!

Thanks a lot! 


On Wednesday, September 11, 2013 9:03:56 AM UTC+2, Ana Carolina Assis Jesus wrote:

Ana Carolina Assis Jesus

unread,
Sep 11, 2013, 9:56:51 AM9/11/13
to scrapy...@googlegroups.com, paul.tremberth
Hi Paul.

So, I am trying to write on a script what we tried earlier this morning but I keep getting the same error.

I am sending the code and the error. I am sure is something stupid I am missing.

Thanks a lot again with all the help!
Cheers!
Ana

from scrapy.http import FormRequest
from scrapy.item import Item, Field
from scrapy.selector import HtmlXPathSelector

from scrapy.spider import BaseSpider


class Og2TestSpider(BaseSpider):
    name="og2test"
    allowed_domains=["secure.ogone.com"]
    
    def parse(self, response):
        frq = FormRequest.from_response(response)
        print frq.body
        print frq

        fetch(FormRequest.from_response(response,formdata={'userid': 'blablablab', 'PSWD': 'tititititi'}))
        og_body = hxs.select('string(.)').extract()
        print ''
        print og_body
        print ''


Message!
scrapy crawl og2test
2013-09-11 15:46:38+0200 [scrapy] INFO: Scrapy 0.18.2 started (bot: ogoneProject)
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Optional features available: ssl, http11
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Overridden settings: {'NEWSPIDER_MODULE': 'ogoneProject.spiders', 'SPIDER_MODULES': ['ogoneProject.spiders'], 'BOT_NAME': 'ogoneProject'}
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-11 15:46:38+0200 [og2test] INFO: Spider opened
2013-09-11 15:46:38+0200 [og2test] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6024
2013-09-11 15:46:38+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6081
CSRFKEY=AF1DC530D39ED0AE89D7CE2B997AB3B9A532970A&CSRFTS=20130911154638&CSRFSP=%2Fncol%2Fprod%2Fbackoffice%2Fhomeindex.asp&BRANDING=OGONE&lang=1&MigrationMode=DOTNET&NoTopBanner=&refid=&PSWD=&adv=0&relogin=2&parurl=%2Fncol%2Fprod%2Fbackoffice%2Fhome&insert=Submit&button1=Cancel&btnSubmit=Submit
Traceback (most recent call last):
 File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 1178, in mainLoop
   self.runUntilCurrent()
 File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/base.py", line 800, in runUntilCurrent
   call.func(*call.args, **call.kw)
 File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 368, in callback
   self._startRunCallbacks(result)
 File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 464, in _startRunCallbacks
   self._runCallbacks()
--- <exception caught here> ---
 File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 551, in _runCallbacks
   current.result = callback(current.result, *args, **kw)
 File "/Users/ajesus/Documents/Scraper/ogoneProject/ogoneProject/spiders/ogone2Test.py", line 18, in parse
   fetch(FormRequest.from_response(response,formdata={'userid': 'FabriceAT2', 'PSWD': 'qjm7w4Vb0s'}))
exceptions.NameError: global name 'fetch' is not defined
2013-09-11 15:46:40+0200 [og2test] INFO: Closing spider (finished)
2013-09-11 15:46:40+0200 [og2test] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 997,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 9904,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2013, 9, 11, 13, 46, 40, 673624),
'log_count/DEBUG': 8,
'log_count/ERROR': 1,
'log_count/INFO': 3,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'spider_exceptions/NameError': 1,
'start_time': datetime.datetime(2013, 9, 11, 13, 46, 38, 284080)}
2013-09-11 15:46:40+0200 [og2test] INFO: Spider closed (finished)



--
You received this message because you are subscribed to a topic in the Google Groups "scrapy-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scrapy-users/4fpZN_R32Uc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.

Paul Tremberth

unread,
Sep 11, 2013, 10:07:01 AM9/11/13
to scrapy...@googlegroups.com, paul.tremberth
The lines you are using in your spider are lines that work in scrapy shell (that we used to test the fetching of pages and sending the FormRequest() to login)

In a "real" spider you can do something like:


from scrapy.http import FormRequest
from scrapy.item import Item, Field
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

class Og2TestSpider(BaseSpider):
    name="og2test"
    allowed_domains=["secure.ogone.com"]
    
    def parse(self, response):
        return FormRequest.from_response(response,formdata={'userid': 'blablablab', 'PSWD': 'tititititi'}, callback=self.after_login)

    def after_login(self, response):
        # debug: what do we actually get as logged-in page?
        print response.body

        hxs = HtmlXPathSelector(response)
        # select elements you need
        # do something awesome

Ana Carolina Assis Jesus

unread,
Sep 12, 2013, 2:20:21 AM9/12/13
to scrapy...@googlegroups.com
Hi Paul!

Thanks a lot!
It definitely work! I got a bit confused with the shell and the "real" thing.
I can definitely login and see the whole body page printed and in html.

Thanks a lot! :-)

Now, let me bother you a bit more...
Yesterday I told you I didn't see the whole thing, and you told me it should be there. Well, I don't see it on the html view, but I definetly see it on be body print.
What is missing is exactely the link I want to go to, which is the top banner after you are logged in, but it is inside a javascript.

Do you know how can I HtmlXpath a script link?
And after that how can I change my url to be this new one instead of the started one which I logged in?

How it looks like:
<script type="text/javascript" language="javascript">
    function GetMonthlyActivations(data) {
        if ($(data).html() == null || $(data).html() == "") {
            $("#div-monthly-activations-container").html('No data found.');
        } else {
            $("#div-monthly-activations").html(data).hide();
        }
    }

    $(document).ready(function () {
        $(".toggleContent").each(function () { $(this).addClass('cursauto'); if ($(this).parent() != undefined && $(this).parent() != null) { $(this).parent().addClass('curs'); } });

Thanks again, Ana
Reply all
Reply to author
Forward
0 new messages