
How to interact with a PHP driven server?


Helmut Giese

Feb 3, 2003, 8:52:55 AM
Hello out there,
I need to retrieve data from an archive which is accessible via
internet. Doing it interactively I select a year from a listbox and -
whoosh - a table on the page gets updated.

Looking at the HTML code I notice the following
<option value="quotes.php?ew_id=3563">2001</option>
<option value="quotes.php?ew_id=3447">2000</option>
<option value="quotes.php?ew_id=3348">1999</option>
... and so on ...

I am positive, that the browser sends out something when I - manually
- select another year: On a slow connection I get the 'hour glass
cursor' for a while before the page is updated.

There is no POST or GET action, like there would be (IIRC) if a cgi
script were activated.

Now, is there any way to find out what the browser sends out
when I select a year? Or does anybody have any clues from this -
admittedly short - description, to tell me how to start?

BTW, I'm on Windows (98).
Any hints or ideas will be greatly appreciated.
Helmut Giese

Rohan Pall

Feb 3, 2003, 9:22:07 AM
hgi...@ratiosoft.com (Helmut Giese) wrote in news:3e3e71d5...@News.CIS.DFN.DE:

> Now, is there any way to find out what the browser sends out
> when I select a year? Or does anybody have any clues from this -
> admittedly short - description, to tell me how to start?

You need a proxy server with full logging, something that sits between your browser and the
web server. I use Proxomitron, a simple yet powerful tool for Windows.
http://www.proxomitron.org
Once the program is installed, set your browser to use localhost port 8080 as its proxy. Turn off
the active filters so that requests are not changed at all. Then open the log window (you have to
open it manually) to see exactly what is being sent and received.
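
As a rough idea of what such a log would show for one of the option values in the question, here is a minimal Python sketch that builds the text of a bare GET request. The host example.com and the /archive/ path are placeholders (the real site is not named in the thread), and a real browser adds further headers such as User-Agent. When the request goes through a proxy, the request line carries the full URL rather than just the path.

```python
from urllib.parse import urlsplit


def raw_get_request(url: str) -> str:
    """Return the text of a bare HTTP/1.0 GET request for `url`."""
    parts = urlsplit(url)
    # The request line carries the path plus the query string, if any.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"


# Hypothetical host and directory; only the quotes.php part comes from the page.
print(raw_get_request("http://example.com/archive/quotes.php?ew_id=3563"))
```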

Helmut Giese

Feb 3, 2003, 10:52:21 AM
On Mon, 03 Feb 2003 14:22:07 GMT, Rohan Pall <ro...@rohanpall.com>
wrote:

Thanks for the link. I'll give it a try.
Best regards
Helmut Giese

Cameron Laird

Feb 3, 2003, 11:48:31 AM
In article <3e3e9003...@News.CIS.DFN.DE>,
.
.
.
I'm going to argue again that we're making this too complicated.

Here are the first things you should do: I take it that you have
a page of information--let's call it P--that is valuable to you.
You are able to reach P with conventional browser navigation.
1. Can you reach P through the use of lynx or a
   comparable character-mode browser?
2. Can you reach P with its URL alone? Do this:
   go to P. Note its apparent URL. Use that
   URL in an independent browser--*not* a different
   window of the same browser. Do you reach
   P that way, or a diagnostic screen?
3. If its URL alone doesn't give you P, what is
   the minimum sequence you know to reach P? Is
   it to "log in", then select a link that takes
   you directly to where you want to go?

I predict we solve this without resort to proxying "sniffers".

An elaboration and a correction: I'm all in favor of Proxomitron
use, when necessary. Also, Helmut, at one point in this thread
you seemed to believe that CGI and GET or POST have some logical
connection, that is, that one requires the other. It's not so.
CGI can occur with or without GET, or POST, or vice-versa.
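
To illustrate with the option values from the original question: the parameters travel in the query string of the URL, and the client sees the same thing whether the server answers with CGI, PHP, or anything else. This is only a small Python sketch; the helper name query_parameters is made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url: str) -> dict:
    """Return the query-string parameters of `url` as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


# An option value copied from the page source in the question.
print(query_parameters("quotes.php?ew_id=3563"))  # {'ew_id': ['3563']}
```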
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html

Helmut Giese

Feb 3, 2003, 2:06:37 PM
On Mon, 03 Feb 2003 16:48:31 -0000, cla...@lairds.com (Cameron Laird)
wrote:
[snip useful instruction on using Proxomitron]

Hi Cameron,
I was off for a few hours and thought I'd continue tomorrow. Instead
it seems like I'll have to continue today ... :)


.
>I'm going to argue again that we're making this too complicated.

I tend to agree.

>Here are the first things you should do: I take it that you have

This is what I have got so far:
- With a one-liner using the http package I am able to get the page
(rather the frame) I'm interested in.
- This page contains the information I want *plus* the afore mentioned
listbox containing years to select from. The data shown on the page is
from the current (default) selection.
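
A fetch of this kind might look roughly like the following Python sketch (Python stands in here for the http package mentioned above). The latin-1 encoding is an assumption about the site, and the opener parameter exists only so the function can be tried without a network connection.

```python
from urllib.request import urlopen


def fetch_page(url: str, opener=urlopen, encoding: str = "latin-1") -> str:
    """Download `url` and return the response body as text."""
    with opener(url) as response:
        # Undecodable bytes are replaced rather than raising an error.
        return response.read().decode(encoding, errors="replace")
```

Calling fetch_page with the URL of the frame returns its HTML as one string, which can then be searched for the option values.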

Then comes some HTML which I don't understand. It looks like this:

<form>
<select name="jahr" onChange="window.location.href =
this.options[this.selectedIndex].value" size="1" class="dropdownJahr">
<option value="quotes.php?ew_id=3784" selected>2003</option>
<option value="quotes.php?ew_id=3671">2002</option>


<option value="quotes.php?ew_id=3563">2001</option>

... and so on ...

</select>
</form>



>I predict we solve this without resort to proxying "sniffers".

Ok, let's go :)
Hm, from staring closely at the above one might deduce
- that there is a variable of name 'jahr' (which sounds plausible) and
- that there are lots of possible values like "quotes.php?ew_id=3784"
for 2003, "quotes.php?ew_id=3671" for 2002, etc.
And now, ladies and gentlemen, the magic syntax to resolve the mystery
................

>Also, Helmut, at one point in this thread
>you seemed to believe that CGI and GET or POST have some logical
>connection, that is, that one requires the other. It's not so.
>CGI can occur with or without GET, or POST, or vice-versa.

Thanks for the correction - all the better. The only example I ever
saw where GET and/or POST was used was interacting with a cgi script -
*plus* in the <form> (??) tag it said
method="POST" (IIRC)
Not seeing anything familiar here I assumed that a totally different
route needed to be taken.

I hope the above makes any sense to you.
Best regards
Helmut Giese

David Bigelow

Feb 3, 2003, 2:36:01 PM
hgi...@ratiosoft.com (Helmut Giese) wrote in message news:<3e3e71d5...@News.CIS.DFN.DE>...

> Looking at the HTML code I notice the following
> <option value="quotes.php?ew_id=3563">2001</option>
> <option value="quotes.php?ew_id=3447">2000</option>
> <option value="quotes.php?ew_id=3348">1999</option>
> ... and so on ...

These ew_id values look funny -- they do not correlate with the year very
well - that may be a longer-term problem.


> I am positive, that the browser sends out something when I - manually
> - select another year: On a slow connection I get the 'hour glass
> cursor' for a while before the page is updated.

> There is no POST or GET action, like there would be (IIRC) if a cgi
> script were activated.

Assuming this is not a Java Applet, it sounds like you are on MS-IE
and that the page is using an ADO Connection to the server (one of
MS's refreshless page technologies). Not sure how exactly they do
that, but I will bet there is something along the lines of a socket
built into the browser you have -- there should be an alternative
method to access this info on their server if this is the case
(because it would not make sense to have just MS-IE browsers visit the
site). Try the site with Mozilla - see what it does differently (I
will bet there is a refresh)...

After the Mozilla test, you should try to use the "http" package to
look at the server interactions and responses more programmatically --
I have often found that to be revealing for the details.

Dave

Rohan Pall

Feb 3, 2003, 2:42:18 PM
cla...@lairds.com (Cameron Laird) wrote in
news:v3t7av8...@corp.supernews.com:

> I predict we solve this without resort to proxying
> "sniffers".
>
> An elaboration and a correction: I'm all in favor of
> Proxomitron use, when necessary. Also, Helmut, at one point

Learning how to use a good http proxy will help you with any but
the most basic kind of web scraping.

Used in conjunction with an alert mind that is focused on the
html source of the page, you're on the road to success.

Rohan Pall

Feb 3, 2003, 2:47:13 PM
hgi...@ratiosoft.com (Helmut Giese) wrote in
news:3e3eb787...@News.CIS.DFN.DE:

> <form>
> <select name="jahr" onChange="window.location.href =
> this.options[this.selectedIndex].value" size="1"
> class="dropdownJahr"> <option value="quotes.php?ew_id=3784"
> selected>2003</option> <option
> value="quotes.php?ew_id=3671">2002</option> <option
> value="quotes.php?ew_id=3563">2001</option>
> ... and so on ...
> </select>
> </form>

Look at the current page location.
Resolve the relative URL quotes.php?ew_id=3671 (for 2002) against it.
Scrape that page.

For example, if you are on the page:
http://stocks.com/content/hello.html
Then your url for 2002 will become
http://stocks.com/content/quotes.php?ew_id=3671
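
The same resolution can be done in a script. A minimal Python sketch using the standard urljoin function, with the placeholder stocks.com address from the example above:

```python
from urllib.parse import urljoin

# The page you are on (placeholder address) and the value of the 2002 option.
current_page = "http://stocks.com/content/hello.html"
option_value = "quotes.php?ew_id=3671"

# urljoin replaces the last path segment of the base with the relative part.
print(urljoin(current_page, option_value))
# http://stocks.com/content/quotes.php?ew_id=3671
```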


Rohan Pall

Feb 3, 2003, 2:51:42 PM
davidh...@compuserve.com (David Bigelow) wrote in
news:712f19a1.03020...@posting.google.com:

> Assuming this is not a Java Applet, it sounds like you are
> on MS-IE and that the page is using an ADO Connection to the
> server (one of MS's refreshless page technologies). Not

No, it uses the javascript onChange handler. It then goes to a
new page, by setting the document location.

For more info:
http://hotwired.lycos.com/webmonkey/98/04/index3a_page10.html?tw=programming
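
Since the handler only assigns the selected option's value to window.location.href, a script does not need to run any JavaScript: it can read the option values straight from the HTML and build the URLs itself. The following Python sketch does that with the standard html.parser module. The class and function names are invented for the example, and the stocks.com address is only a placeholder for the real page location.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OptionCollector(HTMLParser):
    """Collect {option label: option value} for every <option> in a page."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._current_value = None

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # Remember the value until the label text arrives.
            self._current_value = dict(attrs).get("value")

    def handle_data(self, data):
        label = data.strip()
        if self._current_value is not None and label:
            self.options[label] = self._current_value
            self._current_value = None


def year_urls(html: str, page_url: str) -> dict:
    """Map each year label to the absolute URL the handler would load."""
    collector = OptionCollector()
    collector.feed(html)
    return {year: urljoin(page_url, value)
            for year, value in collector.options.items()}


sample = """<form>
<select name="jahr" onChange="window.location.href = this.options[this.selectedIndex].value" size="1" class="dropdownJahr">
<option value="quotes.php?ew_id=3784" selected>2003</option>
<option value="quotes.php?ew_id=3671">2002</option>
<option value="quotes.php?ew_id=3563">2001</option>
</select>
</form>"""

for year, url in year_urls(sample, "http://stocks.com/content/hello.html").items():
    print(year, url)
```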

Rohan Pall

Feb 3, 2003, 2:53:51 PM
hgi...@ratiosoft.com (Helmut Giese) wrote in
news:3e3eb787...@News.CIS.DFN.DE:

> <form>
> <select name="jahr" onChange="window.location.href =
> this.options[this.selectedIndex].value" size="1"
> class="dropdownJahr"> <option value="quotes.php?ew_id=3784"
> selected>2003</option> <option
> value="quotes.php?ew_id=3671">2002</option> <option
> value="quotes.php?ew_id=3563">2001</option>
> ... and so on ...
> </select>
> </form>

If you want to know how this works, read this:

Theory and Drop Down List URL Jump Box
Basic JavaScript Theory

http://www.davesite.com/webstation/js/theory1jump.shtml

Helmut Giese

Feb 3, 2003, 3:46:26 PM
On Mon, 03 Feb 2003 19:47:13 GMT, Rohan Pall <ro...@rohanpall.com>
wrote:

>hgi...@ratiosoft.com (Helmut Giese) wrote in

Hey, you're a genius, it works.
I did have the feeling that installing something like Proxomitron was
a bit of overkill for this situation. None the less, I will go and get
it - ain't no such thing as too many tools in one's toolbox.

Thanks again
Helmut (off for web scraping) Giese

Helmut Giese

Feb 3, 2003, 3:58:55 PM
On 3 Feb 2003 11:36:01 -0800, davidh...@compuserve.com (David
Bigelow) wrote:

>hgi...@ratiosoft.com (Helmut Giese) wrote in message news:<3e3e71d5...@News.CIS.DFN.DE>...
>
>> Looking at the HTML code I notice the following
>> <option value="quotes.php?ew_id=3563">2001</option>
>> <option value="quotes.php?ew_id=3447">2000</option>
>> <option value="quotes.php?ew_id=3348">1999</option>
>> ... and so on ...
>
>These ew_id values look funny -- they do not correlate with the year very
>well - that may be a longer-term problem.

But not mine :) - I'll just get the content once and that's all.


>
>> I am positive, that the browser sends out something when I - manually
>> - select another year: On a slow connection I get the 'hour glass
>> cursor' for a while before the page is updated.
>
>> There is no POST or GET action, like there would be (IIRC) if a cgi
>> script were activated.
>
>Assuming this is not a Java Applet, it sounds like you are on MS-IE
>and that the page is using an ADO Connection to the server (one of
>MS's refreshless page technologies). Not sure how exactly they do
>that, but I will bet there is something along the lines of a socket
>built into the browser you have -- there should be an alternative
>method to access this info on their server if this is the case
>(because it would not make sense to have just MS-IE browsers visit the
>site). Try the site with Mozilla - see what it does differently (I
>will bet there is a refresh)...

Gee, can you think of complicated things :). But - luckily - these
don't come into play here. I am using Netscape all the time - and as
for understanding the rest: let's leave this for another day, ok?

>After the Mozilla test, you should try to use the "http" package to
>look at the server interactions and responses more programmatically --
>I have often found that to be revealing for the details.

Yes, I did this already but got stuck on selecting different years.

But then some nice guy (or gal) comes along and helps out. clt is just
marvellous - if it didn't exist already, it needed to be invented.
Best regards
Helmut Giese

Cameron Laird

Feb 3, 2003, 4:07:52 PM
In article <3e3ed010...@News.CIS.DFN.DE>,
Helmut Giese <hgi...@ratiosoft.com> wrote:
.
.
.

>>Then your url for 2002 will become
>>http://stocks.com/content/quotes.php?ew_id=3671
>Hey, you're a genius, it works.
>I did have the feeling that installing something like Proxomitron was
>a bit of overkill for this situation. None the less, I will go and get
>it - ain't no such thing as too many tools in one's toolbox.
>
>Thanks again
>Helmut (off for web scraping) Giese

And note how this worked: while you were right to
recognize the involvement of PHP and JavaScript,
they're essentially transparent. You needed zero
knowledge of PHP, and I can make a case that even
the minimal JavaScript decoding involved was
superfluous.

The Web's a fun place to play. It takes surprisingly
little language-specific knowledge to do so.

Rohan Pall

Feb 3, 2003, 4:32:15 PM
cla...@lairds.com (Cameron Laird) wrote in
news:v3tmh8d...@corp.supernews.com:

> they're essentially transparent. You needed zero
> knowledge of PHP, and I can make a case that even
> the minimal JavaScript decoding involved was
> superfluous.

I agree. With a basic understanding of most web technologies and
some handy reference urls, you can go a long way.

Rohan Pall

Feb 3, 2003, 4:35:10 PM
hgi...@ratiosoft.com (Helmut Giese) wrote in
news:3e3ed5fb...@News.CIS.DFN.DE:

> clt is just marvellous - if it didn't exist already, it
> needed to be invented.

QOTW. groups.google.com is pretty handy too for searching
through the clt archive.

Helmut Giese

Feb 3, 2003, 4:47:02 PM
On Mon, 03 Feb 2003 21:32:15 GMT, Rohan Pall <ro...@rohanpall.com>
wrote:

>cla...@lairds.com (Cameron Laird) wrote in

Future historians will note:
... and this was the beginning of a great career in web scraping.
Helmut (just taking a short rest from scraping) Giese

Helmut Giese

Feb 4, 2003, 9:30:48 AM
Thank you for this clear explanation.
Helmut Giese

David Bigelow

Feb 4, 2003, 7:08:19 PM
Rohan,
> No, it uses the javascript onChange handler. It then goes to a
> new page, by setting the document location.

Yes, but this "onChange" handler still causes a "Refresh/New-Page
Generation" of the page, because setting the location sends a GET to
the server.

ORIGINAL POST (Trimmed):


>> I need to retrieve data from an archive which is accessible via
>> internet. Doing it interactively I select a year from a listbox and -
>> whoosh - a table on the page gets updated.

The ".... whoosh - a table on the page gets updated." without mention
of a "refresh" made me think that this was a "refreshless" page on
MS-IE. Past experiences have pointed to ADO - which can do this
without refreshing the whole web page.

Sorry for the confusion, but the question seemed to be driven around a
non-refreshing page.

Dave

Rohan Pall

Feb 4, 2003, 10:16:51 PM

> Sorry for the confusion, but the question seemed to be
> driven around a non-refreshing page.

No problem, you taught me something new - the ADO refresh.


Helmut Giese

Feb 5, 2003, 8:55:29 AM
On 4 Feb 2003 16:08:19 -0800, davidh...@compuserve.com (David
Bigelow) wrote:

>Rohan,
>> No, it uses the javascript onChange handler. It then goes to a
>> new page, by setting the document location.
>
>Yes, but this "onChange" handler still causes a "Refresh/New-Page
>Generation" of the page, because setting the location sends a GET to
>the server.
>
>ORIGINAL POST (Trimmed):
>>> I need to retrieve data from an archive which is accessible via
>>> internet. Doing it interactively I select a year from a listbox and -
>>> whoosh - a table on the page gets updated.

Funny, how little things sometime lead to quite different conclusions.

Trying to go back in time ....
Ah yes, the 'whoosh' at this moment meant to me, that no further
action was necessary (like pushing a 'submit' button or such).

Best regards
Helmut Giese

Patrick Spence

Feb 7, 2003, 5:02:21 PM

"Helmut Giese" <hgi...@ratiosoft.com> wrote in message
news:3e3e71d5...@News.CIS.DFN.DE...


It looks like it might be a JavaScript "onChange" handler attached to that
select. I do the same thing on a couple of sites: you attach a handler to the
onchange event of the select you want, then in the handler you go to the URL
specified in the selected option's value.


--
Patrick Spence, MIS
Mayor Pharmaceutical Labs/Regency Medical Research, Ltd.
2401 South 24th Street, Phoenix, AZ 85034
patr...@DELTOMAILmayorlabs.com - http://www.vitamist.com

