The web page I'm automating with mech is not returning the same data
that it returns when accessed with a browser. The JavaScript on the
form page, as far as I can see, does no more than sanitize what the
user enters; I can't see how it could be telling the server that the
request doesn't come from a real person. I've set
$mech->agent_alias('Mac Safari'), so the server should think the
request is from a real browser. I'm submitting values for all form
fields (including the default values for the hidden fields). I don't
know where to go from here.

Any suggestions will be gratefully received. What follows is only for
anyone who would like to know, in detail, what's going on. My code is at
the bottom of this message.

Here's an overview of why I'm doing this. The site I'm trying to
automate is an EU governmental one. My employer has recently been hit
with a VAT bill because one of our customers de-registered for VAT early
last year, so we shouldn't have been sending them goods VAT-free. The
customer didn't tell us, and we're liable for the unpaid VAT. Our
government's site suggests regularly checking customer VAT numbers
against the EU database to avoid this. There is a module on CPAN that
can give a valid/invalid result for VAT details, but I want to capture
the certificate the site issues because it carries a consultation
number as proof of checking the VAT status. Without that number the
certificate is useless, since it could easily have been forged. The
only part of the data that's missing when I use mech is the
consultation number, the vital part that proves you did carry out the
check.

The CPAN module that gets halfway there is
Business::Tax::VAT::Validation. I didn't discover it until I'd almost
completed my program, and in any case it doesn't provide the
consultation number, which can be used as a defence if challenged by
the VAT authorities.

Here's the code I'm using, with VAT numbers omitted for privacy. If
anyone wants to run this and can't find any valid data to use, please
let me know by email and I'll send some data that works. I just don't
want to put other people's data 'out there':

    use strict;
    use warnings;
    use WWW::Mechanize;

    # $site_root, the URL of the form page, is defined elsewhere (not shown).

    my $cust = shift;
    # An array ref:
    #   $cust->[0] = our customer identifier (not submitted, it just
    #                happens to be in the array)
    #   $cust->[1] = 2-letter EU country code
    #   $cust->[2] = customer VAT number

    my %requester = (
        state  => 'GB',  # 2-letter EU country code
        vat_no => '',    # a valid VAT number for that country,
                         # removed for privacy reasons
    );

    my $mech = WWW::Mechanize->new();

    # Set a sensible UA - don't want to be thought of as a bot
    $mech->agent_alias('Mac Safari');

    # load the page containing the form
    $mech->get($site_root);

    # fill in and submit the form
    $mech->submit_form(
        form_name => 'frmVat',
        fields    => {
            ms           => $cust->[1],
            iso          => $cust->[1],
            vat          => $cust->[2],
            reqeusterMs  => $requester{state},
            requesterIso => $requester{state},
            requesterVat => $requester{vat_no},
            BtnSubmitVat => 'Verify',
            name         => '',
            companyType  => '',
            street1      => '',
            postcode     => '',
            city         => '',
        }
    );

> The web-page I'm automating with mech is not returning the same data
> that it returns when accessed with a browser. The java-script on the
> form page, as far as I can see, does no more than sanitize what the user
> enters, I can't see how it could possibly be telling the server that the
> request is not a real person. I've got $mech->agent_alias('Mac Safari')
> so it should think the request is from a real browser. I'm returning
> input for all form fields (including the default values for the hidden
> fields). I don't know where to go from here.
[...]
Firefox with the Firebug plug-in might help you find out whether you're
sending values differently. You can easily see what's sent, and the
response, in the Console tab.
If that matches what you're sending, then possibly what's returned
is processed by JavaScript before being displayed in the browser, and
that would be the next thing to examine.
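
Once the browser's parameters have been copied out of a traffic capture, the comparison itself can be scripted. The following is only a minimal sketch: diff_params is a hypothetical helper, and the field names and values are made-up examples, not taken from the real site. It assumes every value is a defined string.

```perl
use strict;
use warnings;

# Compare two sets of form parameters: what the browser sent (copied by
# hand from a traffic capture) and what the script is about to send.
# Returns a list of human-readable differences; empty means they match.
sub diff_params {
    my ($browser, $script) = @_;
    my @diffs;
    for my $name (sort keys %$browser) {
        if (!exists $script->{$name}) {
            push @diffs, "missing from script: $name";
        }
        elsif ($browser->{$name} ne $script->{$name}) {
            push @diffs, "value differs for $name: "
                       . "'$browser->{$name}' vs '$script->{$name}'";
        }
    }
    for my $name (sort keys %$script) {
        push @diffs, "not sent by browser: $name"
            unless exists $browser->{$name};
    }
    return @diffs;
}

# Hypothetical example data, not taken from the real site.
my %from_browser = (country => 'GB', number => '123', lang => 'en');
my %from_script  = (country => 'GB', nubmer => '123', lang => 'EN');

print "$_\n" for diff_params(\%from_browser, \%from_script);
```

A name that shows up once as "missing from script" and once, slightly different, as "not sent by browser" usually points at a misspelled key rather than a genuinely missing field.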
J. Gleixner <glex_no-s...@qwest-spam-no.invalid> wrote:
> On 02/10/12 04:12, Justin C wrote:
>> The web-page I'm automating with mech is not returning the same data
>> that it returns when accessed with a browser. The java-script on the
>> form page, as far as I can see, does no more than sanitize what the user
>> enters, I can't see how it could possibly be telling the server that the
>> request is not a real person. I've got $mech->agent_alias('Mac Safari')
>> so it should think the request is from a real browser. I'm returning
>> input for all form fields (including the default values for the hidden
>> fields). I don't know where to go from here.
> [...]
> Firefox with the Firebug Plug-in might help you find if you're
> sending values differently. You easily can see what's sent and the
> response in the Console tab.
> If that matches what you're sending, then possibly what's returned
> is processed by Javascript before being displayed to the browser, and
> that would be the next thing to examine.
WSP logs HTTP requests/responses in the form of Perl code (UserAgent).
Then we don't need to know what all the JS does, we just need to know
how to construct the request we want...
-- Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
> Firefox with the Firebug Plug-in might help you find if you're
> sending values differently. You easily can see what's sent and the
> response in the Console tab.
> If that matches what you're sending, then possibly what's returned
> is processed by Javascript before being displayed to the browser, and
> that would be the next thing to examine.
Thank you for the suggestion, J, but I've been bitten by Firefox plugins
before and avoid them now. The WSP suggestion is working for me.
>> It logs HTTP requests/responses in the form of Perl code (UserAgent).
>> Then we don't need to know what all the JS does, we just need to know
>> how to construct the request we want...
> Thank you, Tad, that's very useful. I think I've found a cookie. I'm
> testing new code now.
Update: After much, *much* time (too much time) trying to debug this, I
finally found the problem. I had a typo in the name of a field in the
$mech->submit_form call: two letters transposed. The site still
responded as I expected, but my error caused it not to give me a
consultation number.
I hate how debugging takes 3 times (or more) the time it takes to code!
Anyway, thanks to Tad and J for their suggestions.
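
For anyone who hits the same kind of problem: a misspelled field name can be caught before the form is submitted by checking the keys against the names the form really has. This is only a sketch. unknown_fields is a hypothetical helper, and the list of known names is an assumption (requesterMs is inferred from its siblings requesterIso and requesterVat, not confirmed against the site).

```perl
use strict;
use warnings;

# Return the names in %$fields that are not among the form's known
# input names. An empty list means every name matches a real input.
sub unknown_fields {
    my ($known_names, $fields) = @_;
    my %known = map { $_ => 1 } @$known_names;
    return grep { !$known{$_} } sort keys %$fields;
}

# Assumed input names; on the real page they would come from the form.
my @known = qw(ms iso vat requesterMs requesterIso requesterVat);

my %fields = (
    ms          => 'GB',
    reqeusterMs => 'GB',   # transposed letters, as in the code posted earlier
);

my @bad = unknown_fields(\@known, \%fields);
print "Unknown field name(s): @bad\n" if @bad;
```

With WWW::Mechanize, the known names could probably be collected from the parsed form before calling submit_form, along the lines of grep { defined } map { $_->name } $mech->form_name('frmVat')->inputs. My understanding is that HTML::Form, unless it is parsed in strict mode, quietly adds an unknown field instead of raising an error, which would explain why the typo went unnoticed. Treat that as an assumption and check it against the HTML::Form documentation.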