I've been scratching my head for days now, trying to figure out what
I need to do to retrieve the form on this page:
https://ramps.uspto.gov/eram/getMaintFeesInfo.do
The obvious approach, capturing the hidden field contents and
loading a fields hash accordingly, only returns a page with an
error they probably serve whenever they guess that you're a program
rather than a Web browser; yet if you submit the same form in
Opera or Firefox, everything goes as expected.
Since the transaction is over HTTPS, I can't just sniff it and
write a simulation of every byte the server and client are throwing at
one another.
This should be an hour's coding, not days and days of fruitless head scratching.
Does anybody know what the trouble is? Is there a special step I have
to take to make the Mechanize object impersonate a browser, or to hurl back
whatever idiotic cookies, markers and other such tin cans the
server is tossing at visitors?
Code that doesn't do what we want follows:
==========================
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Patent and application numbers; the defaults apply when none are given on the command line.
my ($patno, $serno) = @ARGV;
$patno ||= '7107206';
$serno ||= '10130357';

my $url = 'https://ramps.uspto.gov/eram/patentMaintFees.do';
my $m   = WWW::Mechanize->new();
$m->get($url);

# Scrape the session parameter and the hidden-field values out of the page.
# [^"]+ stops at the closing quote instead of running greedily to the end of the line.
my $html = $m->content;
my ($parameter) = $html =~ m{getMaintFeesInfo\.do;([^'"]*)};
my ($signature) = $html =~ m{name="signature"\s+value="([^"]+)"};
my ($loadtime)  = $html =~ m{name="loadTime"\s+value="([^"]+)"};
my ($sessionId) = $html =~ m{"sessionId"\s+value="([^"]+)"};
print defined $_ ? "$_\n" : "(not found)\n" for $parameter, $signature, $sessionId, $loadtime;

my $fields = {
    'patentNum'      => $patno,
    'applicationNum' => $serno,
    'signature'      => $signature,
    'loadTime'       => $loadtime,
    'sessionId'      => $sessionId,
    'maintFeeAction' => 'Retrieve Fees to Pay',
    'maintFeeYear'   => '04',
};

# Fill in and submit the first form on the page.
my $r = $m->submit_form( form_number => 1, fields => $fields );
print $r->content;
it cannot answer because I'm a machine, rather
than a human using a Web client program on a GUI interface.
...
>
> I believe I tried various aliases already, and got the same results.
> Does the Mechanize agent hurl back cookies set by the server?
>
> --
Yes, but only if you «use HTTP::Cookies;». See http://search.cpan.org/~gaas/HTTP-Cookies-6.01/lib/HTTP/Cookies.pm
On many sites, the cookies are the only link between successive queries.
An indispensable tool for monitoring the traffic in both directions is the Wireshark application (http://www.wireshark.org/), which is very easy to work with.
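A minimal offline sketch of what a cookie jar does between requests, using HTTP::Cookies directly: it stores the session cookie from one response and adds it to the next request for the same site. As far as I know, WWW::Mechanize->new already creates an in-memory jar like this by default; passing your own with `WWW::Mechanize->new( cookie_jar => $jar )` mainly helps when you want to inspect or save the cookies. The host `example.com` and the `JSESSIONID=abc123` cookie are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# An in-memory jar; WWW::Mechanize->new( cookie_jar => $jar ) would use this one.
my $jar = HTTP::Cookies->new;

# Simulate the server's first response setting a (hypothetical) session cookie.
# extract_cookies needs to know which request produced the response,
# so the response is linked to the first GET.
my $first = HTTP::Request->new(GET => 'https://example.com/eram/patentMaintFees.do');
my $res   = HTTP::Response->new(200, 'OK', [ 'Set-Cookie' => 'JSESSIONID=abc123; Path=/' ]);
$res->request($first);
$jar->extract_cookies($res);

# The next request to the same site gets the stored cookie added to it.
my $second = HTTP::Request->new(POST => 'https://example.com/eram/getMaintFeesInfo.do');
$jar->add_cookie_header($second);

print "Cookie sent: ", $second->header('Cookie') // '(none)', "\n";
print $jar->as_string;
```

If the server's error page persists even with cookies flowing, the cookies are probably not the missing piece.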
Meir
Warren
> --
> You received this message because you are subscribed to the Google Groups "WWW::Mechanize users" group.
> To post to this group, send email to www-mecha...@googlegroups.com.
> To unsubscribe from this group, send email to www-mechanize-u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/www-mechanize-users?hl=en.
>
i do a lot of screen/app scraping, though i don't use perl for it.
here's a high-level summary of what i've found works.
use one of the firefox plugins to examine the network traffic between
the browser and the target site. this gives you a good idea of what the
expected request/response sequence should be. it also tells you whether
a straight curl kind of process will do, or whether you need to get more complex.
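The same comparison can be made from the Perl side. A sketch using LWP's handler hooks (`add_handler` with the `request_send` and `response_done` phases), which WWW::Mechanize inherits from LWP::UserAgent: the handlers run inside the client, so they see plain HTTP even on an HTTPS connection where a packet sniffer only sees encrypted bytes. The demo fetches a local temporary file so it runs offline; against a real site you would add the same handlers to the Mechanize object and compare the log with what the browser plugin shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::UserAgent;
use URI::file;

my @log;                        # captured traffic, one entry per event
my $ua = LWP::UserAgent->new;   # WWW::Mechanize is a subclass, so the same calls work on $m

# request_send runs just before the request is handed to the protocol
# handler, after LWP has prepared it (e.g. added any Cookie header).
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    push @log, ">>> " . $request->as_string;
    return;                     # return nothing so the request is still sent
});

# response_done runs once the complete response has been received.
$ua->add_handler(response_done => sub {
    my ($response) = @_;
    push @log, "<<< " . $response->status_line . "\n" . $response->headers->as_string;
    return;
});

# Offline demo: fetch a local file instead of a real site.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "<html><body>hello</body></html>\n";
close $fh;

$ua->get(URI::file->new_abs($path));
print "$_\n" for @log;
```

Diffing the logged request headers and body against the browser's is usually the quickest way to spot a missing field or header.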
simple cookie/login/form kinds of sites can be handled using
python/libxml2dom functions, with xpath to extract the required data
from the dom.
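For comparison, the Perl counterpart of the libxml2/XPath approach is XML::LibXML (a different binding from python's libxml2dom, swapped in here). A minimal sketch that pulls every hidden input out of a form with one XPath query, instead of one regular expression per field as in the original script; the HTML snippet and its values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical page fragment with the kind of hidden fields the original script scrapes.
my $html = <<'HTML';
<html><body>
<form name="mf" method="post" action="getMaintFeesInfo.do">
  <input type="hidden" name="signature" value="sig123">
  <input type="hidden" name="loadTime" value="1332339240000">
  <input type="hidden" name="sessionId" value="ABC123">
</form>
</body></html>
HTML

# recover => 2 lets libxml2's HTML parser accept tag soup silently.
my $dom = XML::LibXML->load_html(
    string          => $html,
    recover         => 2,
    suppress_errors => 1,
);

# One XPath query collects all hidden inputs, whatever order their attributes are in.
my %hidden;
for my $input ($dom->findnodes('//form//input[@type="hidden"]')) {
    $hidden{ $input->getAttribute('name') } = $input->getAttribute('value');
}

printf "%s = %s\n", $_, $hidden{$_} for sort keys %hidden;
```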
for complex sites that use javascript/dynamic content
generation, you aren't going to get the content easily unless you
replicate the browser session. this can be done with a headless
java browser like htmlunit, or with selenium.
bottom line: it takes some effort, but most of it is doable.
peace
>
It made my life a *lot* easier when I was writing some scripts to automatically grab my bank statement. I can't remember the exact details now, but I was having similar problems with values in fields which Charles helped me spot very quickly!
Warren
What I may need to do is make up a POST request instead of using the
"submit_form" method, unless one of you knows a way to force the
Mechanize agent to include the name and value attributes of an input
of type SUBMIT when it sends a form.
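A likely explanation, sketched offline with HTML::Form (the form parser WWW::Mechanize uses internally): a submit input's name=value pair is only encoded when that button is clicked, so putting `maintFeeAction` in the `fields` hash does not by itself get it sent. The markup below is a hypothetical stand-in for the USPTO form, reusing the field names from the script above; the values and the base URL `https://example.com/eram/` are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the real form; values are made up.
my $html = <<'HTML';
<form method="POST" action="getMaintFeesInfo.do">
  <input type="hidden" name="signature" value="sig123">
  <input type="hidden" name="loadTime" value="1332339240000">
  <input type="text" name="patentNum" value="">
  <input type="text" name="applicationNum" value="">
  <input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay">
</form>
HTML

# In scalar context parse() returns the first form on the page.
my $form = HTML::Form->parse($html, 'https://example.com/eram/');
$form->value(patentNum      => '7107206');
$form->value(applicationNum => '10130357');

# make_request() builds the request as if no button had been pressed:
# hidden and text fields are encoded, the submit input is left out.
my $without_button = $form->make_request->content;

# click() marks the named submit input as pressed, so its name=value
# pair is encoded along with the other fields.
my $with_button = $form->click('maintFeeAction')->content;

print "without click: $without_button\n";
print "with click:    $with_button\n";
```

With WWW::Mechanize the same effect should come from naming the button when submitting, e.g. adding `button => 'maintFeeAction'` to the `submit_form` call in the script above, or calling `$m->click_button( value => 'Retrieve Fees to Pay' )` once the fields are set. Neither has been tried against the USPTO site, so building the POST by hand may still turn out to be necessary.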