
web page importing


Rod Speed

Aug 12, 2002, 10:44:37 PM

Looking for a way to import web pages, into Access.

Been doing this for a long time now and that works
very well when an explicit url is available, like say
http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1

Trouble is that sometimes an explicit url isnt available, like with
http://www.foxtel.com.au/foxtelguide

How do I import that sort of page, say one for a specific channel on a specific date ?

I want it completely automated, no manual web browsing at all.

The problem is that those pages I cant currently import are basically java.
Is there some way of driving those using the web browser control etc ?


rf

Aug 12, 2002, 11:11:04 PM

"Rod Speed" <rod_...@yahoo.com> wrote in message
news:aj9rpt$19v6lp$1...@ID-69072.news.dfncis.de...

> Looking for a way to import web pages, into Access.
>
> Been doing this for a long time now and that works
> very well when an explicit url is available, like say
>
> http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1
>
> Trouble is that sometimes an explicit url isnt available, like with
> http://www.foxtel.com.au/foxtelguide

> How do I import that sort of page, say one for a specific channel on a
> specific date ?

You can't.

I suspect Foxtel do it that way (with method="post") specifically to stop
people programmatically downloading their programme guides. :-)

And no, you can't steal their form and use method="get". I tried this and
got their 404 page :-) The server side process obviously requires
method="post". You could construct your own HTTP client and reproduce what
the form does (using post) but that is more appropriate to a C++ or perhaps
C# group than an HTML one.

Why do you need to do this anyway? Foxtel politely snail mail you a
programme guide every month.

Cheers
Richard.


Rod Speed

Aug 12, 2002, 11:44:10 PM

rf <making...@the.time> wrote in message
news:cN_59.13108$Sy4....@news-server.bigpond.net.au...
> Rod Speed <rod_...@yahoo.com> wrote

>> Looking for a way to import web pages, into Access.

>> Been doing this for a long time now and that works
>> very well when an explicit url is available, like say

>> http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1

>> Trouble is that sometimes an explicit url isnt available, like with
>> http://www.foxtel.com.au/foxtelguide

>> How do I import that sort of page, say one
>> for a specific channel on a specific date ?

> You can't.

I find that hard to believe. If the java can do it, presumably at worst
I should be able to work out what the java is doing and do that myself.

> I suspect Foxtel do it that way (with method="post") specifically to
> stop people programmatically downloading their programme guides. :-)

Nar, there are other examples where that sort of form
is used, I just dont want to get the data from those sites.

In some cases like with Austar you can work out the
full url, even tho it isnt normally visible when browsing.

> And no, you can't steal their form and use method="get". I tried this and got
> their 404 page :-) The server side process obviously requires method="post".

Sure, but surely it must be feasible to do what the java does ?
How can their system even know that it aint their form at my end ?

> You could construct your own HTTP client and
> reproduce what the form does (using post)

Thats what I meant. I was hoping that it should be feasible to at least
automate the manual browsing using the web browser control or something.

> but that is more appropriate to a C++ or
> perhaps C# group than an HTML one.

I was hoping there was some way of using the web browser control or something.

> Why do you need to do this anyway?

I do all that stuff from Access, keep track of what I have seen, browse the
program guides, tick what looks worth watching, have the system decide
which of the repeats to tape, since there is currently only one satellite
decoder, so blocks can be recorded from different channels etc.

Been doing that for years now. I currently use the softcom
program guide which can be directly imported, but its got some
real blemishes in the data, which the foxtel guide doesnt have.

There are other online program guides too, but most of those arent
really very well suited to importing, because they just list the program
title and the full description is in a separate popup for each item.
I do plan to automate the import of those sometime, but they
arent as good content wise as the foxtel guide itself.

Its surprisingly easy to import most online guides, ABC, softcom etc,
not a lot more than just using a TransferText in Access and a bit of
a tidy up where say the rating data is in the description field.

> Foxtel politely snail mail you a programme guide every month.

And have a quite decent web program guide which is a tad more hi tech.

rf

Aug 13, 2002, 12:59:43 AM

"Rod Speed" <rod_...@yahoo.com> wrote in message
news:aj9v9h$15nhef$1...@ID-69072.news.dfncis.de...

>
> rf <making...@the.time> wrote in message
> news:cN_59.13108$Sy4....@news-server.bigpond.net.au...
> > Rod Speed <rod_...@yahoo.com> wrote
>
> >> Looking for a way to import web pages, into Access.
>
> >> Been doing this for a long time now and that works
> >> very well when an explicit url is available, like say
>
> > http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1
>
> >> Trouble is that sometimes an explicit url isnt available, like with
> >> http://www.foxtel.com.au/foxtelguide

OK. It is time to get some terminology nailed down.

The softcom URL above is the result of using a form with method="get". The
browser appends the fields of the form to the end of the URL specified in
the form's action and then sends it off to the server as a normal HTTP
request. The URL you see up there is what the browser sent in the request so
therefore that is what you see in the browser's address bar. You can also
manually type this URL in and retrieve the page.
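A GET request carries the form fields in the URL itself. As a rough Python illustration (not Rod's Access/VB code), the fields of the Sofcom URL quoted above can be URL-encoded and appended to the action URL. The helper name build_get_url is hypothetical.

```python
from urllib.parse import urlencode


def build_get_url(action_url: str, fields: dict) -> str:
    """Append url-encoded form fields to the action URL, as a browser does for method="get"."""
    # urlencode joins name=value pairs with '&' and escapes any special characters
    return action_url + "?" + urlencode(fields)


# Field names and values copied from the Sofcom URL in this thread
fields = {
    "date": "Tuesday_13_August",
    "chan": "2",
    "state": "NSWReg",
    "fta": "1",
    "fox": "1",
    "opt": "1",
}
url = build_get_url("http://www.sofcom.com.au/cgi-bin/TV/byChannel", fields)
print(url)
```

Because everything the server needs is in the URL, any "fetch this URL" call is enough to reproduce a GET form submission.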

The Foxtel site uses a form as well but uses method="post". This instructs
the browser to *not* append the fields to the URL in the form's action but to
embed the fields within the HTTP headers when it sends the request. Since
the fields are not sent as part of the URL you don't get to see them in the
address bar.
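For comparison, here is a sketch of the raw request a browser sends for method="post". Strictly, HTTP puts the POST form fields in the request body, after the blank line that ends the headers; the headers themselves only announce the body's type and length (Content-Type and Content-Length). The field names channel and date are hypothetical placeholders; the real names have to be read from Foxtel's form.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Build the raw bytes of an HTTP/1.1 form POST (illustrative sketch only)."""
    body = urlencode(fields).encode("ascii")  # same encoding as a GET query string
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line ends the headers; the form data follows as the body
    )
    return head.encode("ascii") + body


# Hypothetical field names and values
req = build_post_request("www.foxtel.com.au", "/foxtelguide",
                         {"channel": "Arena", "date": "2002-08-13"})
print(req.decode("ascii"))
```

Nothing in the URL changes between requests for different channels, which is why the address bar never shows the fields.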

Some server side processes will accept either get or post. So, even if a
form is using post it is possible to decode what the browser is sending and
manually type them in the address bar as if it were a get request. Foxtel's
server side process does not accept this. It will only process the request
if the fields are sent in the headers. It ignores any fields sent as a get
request.

Try it and see. Steal their page (the one you see when you click on the link
above). Edit the form action to specify the full URL above (it currently
specifies a relative one, "/foxtelguide"). Browse the page in your local
file system. Fill in the form and submit it. It all works. The browser
builds the request and sends it off to http://www.foxtel.com.au/foxtelguide
and you get your info.

Now, change the form to method="get". It breaks. Even if you decode exactly
what the browser is sending in the post (or, better yet, use get and an
invalid URL so you can see the get parameters in the 404 error page) you
will not get the page you want back. Their server side process will not send
it.
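The experiment can be reproduced locally. The sketch below (Python standard library only) runs a toy server on 127.0.0.1 that behaves the way Foxtel's server is described here: it answers 404 to a GET and only returns the page for a POST. It is a stand-in for illustration, not Foxtel's actual server; the /foxtelguide path and channel field are assumptions.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode


class PostOnlyHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a server that only honours form POSTs."""

    def do_GET(self):
        # Fields in the query string are ignored: a GET never gets the guide
        self.send_error(404)

    def do_POST(self):
        # The form fields arrive url-encoded in the request body
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("ascii"))
        page = f"Guide for {fields.get('channel', ['?'])[0]}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), PostOnlyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/foxtelguide"
form = urlencode({"channel": "Arena"})  # hypothetical field name
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any proxy

# 1. Fields sent as a GET query string: rejected
try:
    with opener.open(url + "?" + form, timeout=10) as resp:
        get_status = resp.status
except urllib.error.HTTPError as err:
    get_status = err.code
    err.close()

# 2. The same fields sent as a POST body: accepted
with opener.open(url, data=form.encode("ascii"), timeout=10) as resp:
    post_status, post_page = resp.status, resp.read().decode("ascii")

server.shutdown()
server.server_close()
print(get_status, post_status, post_page)  # 404 200 Guide for Arena
```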

> >> How do I import that sort of page, say one
> >> for a specific channel on a specific date ?
>
> > You can't.
>
> I find that hard to believe. If the java can do it, presumably at worst
> I should be able to work out what the java is doing and do that myself.

There is no java anywhere on that page. There is a bunch of JavaScript (a
totally different animal) but this has nothing to do with the submission of
the form, except for a bit of client side validation.

> > I suspect Foxtel do it that way (with method="post") specifically to
> > stop people programmatically downloading their programme guides. :-)
>
> Nar, there are other examples where that sort of form
> is used, I just dont want to get the data from those sites.

As I indicate above, those other examples must allow both post and get.

> In some cases like with Austar you can work out the
> full url, even tho it isnt normally visible when browsing.

See above. They allow method="get".

> > And no, you can't steal their form and use method="get". I tried this and got
> > their 404 page :-) The server side process obviously requires method="post".
>
> Sure, but surely it must be feasible to do what the java does ?
> How can their system even know that it aint their form at my end ?

There is no java. There is only a browser interpreting a form.

Foxtel's server side insists on method="post". You cannot figure out what
the browser does and simulate a method="get" request. The stuff after the ?
is ignored.

True, Foxtel cannot tell that it aint their form at your end. However they
most certainly can tell that you aren't simulating method="post" in your
HTTP request headers.

> > You could construct your own HTTP client and
> > reproduce what the form does (using post)
>
> Thats what I meant. I was hoping that it should be feasible to at least
> automate the manual browsing using the web browser control or something.

Constructing an HTTP client is fairly trivial if you have C++ or C# or even
Visual Basic. What is *not* trivial is simulating what the browser does when
it processes a form with method="post". You have to go to a level far below
the standard "get me this page" processing. You have to fiddle with the HTTP
headers, which is not very well documented, even in the MSDN. I know, it took
me several days to do it *and* I was using the inside bits of the web
browser control (those bits inside the VB control wrapper, the same bits
that are inside the wrapper that you perceive as "Internet Explorer"). The
difference between get and post is not even documented as well as what I
just said above. Go to http://microsoft.com and have a look.

> > but that is more appropriate to a C++ or
> > perhaps C# group than an HTML one.
>
> I was hoping there was some way of using the web browser control or
> something.

Nope. Not unless you can convince the web browser control to include those
HTTP headers, or do it yourself. However, if you do find a way to convince
the web browser control to do it *please* report back with your findings.

> > Why do you need to do this anyway?
>

<snip valid reasons>

> > Foxtel politely snail mail you a programme guide every month.
>
> And have a quite decent web program guide which is a tad more hi tech.

Yes, but it is much harder to browse when one is in the bathroom having a...
er... bath :-)

Cheers
Richard.


Rod Speed

Aug 13, 2002, 2:31:01 AM

rf <making...@the.time> wrote in message
news:3n069.13409$Sy4....@news-server.bigpond.net.au...
> Rod Speed <rod_...@yahoo.com> wrote
>> rf <making...@the.time> wrote
>>> Rod Speed <rod_...@yahoo.com> wrote

>>>> Looking for a way to import web pages, into Access.

>>>> Been doing this for a long time now and that works
>>>> very well when an explicit url is available, like say

>>>> http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1

>>>> Trouble is that sometimes an explicit url isnt available, like with
>>>> http://www.foxtel.com.au/foxtelguide

> OK. It is time to get some terminology nailed down.

> The softcom URL above is the result of using a form with method="get".

Not when I import that data using Access, I just make
up the url on the fly, and get the web page using that url.

No browser is involved at all, I just use the InternetOpenUrl function.

> The browser appends the fields of the form to the end of the URL specified
> in the form's action and then sends it off to the server as a normal HTTP
> request. The URL you see up there is what the browser sent in the request
> so therefore that is what you see in the browser's address bar.

That one doesnt even come from a form, its just a link on
http://www.sofcom.com.au/TV/static/SydneyNight.html?

> You can also manually type this URL in and retrieve the page.

> The Foxtel site uses a form as well but uses method="post".
> This instructs the browser to *not* append the fields to the
> URL in the form's action but to embed the fields within the HTTP
> headers when it sends the request. Since the fields are not sent
> as part of the URL you don't get to see them in the address bar.

Sure, but its presumably possible to work out what happens by reading the java.

> Some server side processes will accept either get or post. So, even if a
> form is using post it is possible to decode what the browser is sending and
> manually type them in the address bar as if it were a get request. Foxtel's
> server side process does not accept this. It will only process the request if
> the fields are sent in the headers. It ignores any fields sent as a get request.

> Try it and see. Steal their page (the one you see when you
> click on the link above). Edit the form action to specify the
> full URL above (it currently specifies a relative one, "/foxtelguide").
> Browse the page in your local file system. Fill in the form and
> submit it. It all works. The browser builds the request and sends
> it off to http://www.foxtel.com.au/foxtelguide and you get your info.

> Now, change the form to method="get". It breaks.

> Even if you decode exactly what the browser is sending in the
> post (or, better yet, use get and an invalid URL so you can see
> the get parameters in the 404 error page) you will not get the
> page you want back. Their server side process will not send it.

>>>> How do I import that sort of page, say one
>>>> for a specific channel on a specific date ?

>>> You can't.

>> I find that hard to believe. If the java can do it, presumably at worst
>> I should be able to work out what the java is doing and do that myself.

> There is no java anywhere on that page. There is
> a bunch of JavaScript (a totally different animal)

Yeah, bit sloppy there.

> but this has nothing to do with the submission of
> the form, except for a bit of client side validation.

Sure, but it should be relevant to working out what data is sent
by my browser to get the page selected using the form fields.
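One way to work out what the browser sends is to read the form itself rather than the JavaScript: the form's action and method attributes, plus the name of every input and select element, give the fields, and picking a Channel option just sets that field's value. A minimal sketch using Python's html.parser follows; it assumes a single form on the page, and the sample HTML is invented for illustration, not Foxtel's actual markup.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect a form's action, method, field names and select options (assumes one form)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"  # HTML default when no method attribute is given
        self.fields = []
        self.options = {}  # select name -> list of option values
        self._select = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = a.get("action", "")
            self.method = (a.get("method") or "get").lower()
        elif tag in ("input", "textarea") and a.get("name"):
            self.fields.append(a["name"])  # unnamed inputs (e.g. a plain submit) send nothing
        elif tag == "select" and a.get("name"):
            self.fields.append(a["name"])
            self._select = a["name"]
            self.options[self._select] = []
        elif tag == "option" and self._select is not None:
            # None here means the option's text would be sent instead
            self.options[self._select].append(a.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


# Hypothetical markup standing in for the real page
html = """
<form action="/foxtelguide" method="POST">
  <select name="channel"><option value="arena">Arena</option>
  <option value="fox8">FOX8</option></select>
  <input type="text" name="date" value="13/08/2002">
  <input type="submit" value="Search">
</form>
"""
p = FormFieldParser()
p.feed(html)
print(p.action, p.method, p.fields, p.options)
```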

>>> I suspect Foxtel do it that way (with method="post") specifically to
>>> stop people programmatically downloading their programme guides. :-)

>> Nar, there are other examples where that sort of form
>> is used, I just dont want to get the data from those sites.

> As I indicate above, those other examples must allow both post and get.

Sure, that was just a comment on whether Foxtel is deliberately making
it hard to get the data. The other examples which also use 'get' have
no obvious reason for making it hard to get the data using Access etc.

>> In some cases like with Austar you can work out the
>> full url, even tho it isnt normally visible when browsing.

> See above. They allow method="get".

>>> And no, you can't steal their form and use method="get". I tried this and got
>>> their 404 page :-) The server side process obviously requires method="post".

>> Sure, but surely it must be feasible to do what the java does ?
>> How can their system even know that it aint their form at my end ?

> There is no java. There is only a browser interpreting a form.

See above.

> Foxtel's server side insists on method="post". You
> cannot figure out what the browser does and simulate a
> method="get" request. The stuff after the ? is ignored.

Sure, but why cant I have some code that uses the 'post' method ?
In fact I vaguely remember that at least some of the functions like
InternetOpenUrl allow a specification of whether to use 'get' or 'post'
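In most HTTP libraries the choice between GET and POST is a property of the request, not of the URL. In Python's urllib.request, for instance, supplying a data body switches a request to POST, or the method can be stated explicitly. This is only an analogy; which WinInet call offers the equivalent choice is what the rest of the thread works out. The channel field name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.foxtel.com.au/foxtelguide"
fields = urlencode({"channel": "arena"}).encode("ascii")  # hypothetical field name

plain = Request(url)                               # no body: GET
posted = Request(url, data=fields)                 # body supplied: POST
forced = Request(url, data=fields, method="POST")  # method stated explicitly

# Nothing is sent here; the Request objects only describe the requests
print(plain.get_method(), posted.get_method(), forced.get_method())  # GET POST POST
```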

> True, Foxtel cannot tell that it aint their form at your end.
> However they most certainly can tell that you aren't
> simulating method="post" in your HTTP request headers.

Sure, but I am essentially asking how I use 'post' instead,
and what data to use when requesting a specific page.

>>> You could construct your own HTTP client and
>>> reproduce what the form does (using post)

>> Thats what I meant. I was hoping that it should be feasible to at least
>> automate the manual browsing using the web browser control or something.

> Constructing an HTTP client is fairly trivial if you have C++ or C# or even
> Visual Basic. What is *not* trivial is simulating what the browser does when
> it processes a form with method="post". You have to go to a level far below
> the standard "get me this page" processing. You have to fiddle with the HTTP
> headers, which is not very well documented, even in the MSDN. I know, it took
> me several days to do it *and* I was using the inside bits of the web
> browser control (those bits inside the VB control wrapper, the same bits
> that are inside the wrapper that you perceive as "Internet Explorer"). The
> difference between get and post is not even documented as well as what I
> just said above. Go to http://microsoft.com and have a look.

Yeah, already had a look there when trying to work out how to do that.

What you are saying tho is that it obviously can be
done and I'm asking for the fine detail on how to do it.

>>> but that is more appropriate to a C++ or
>>> perhaps C# group than an HTML one.

>> I was hoping there was some way of
>> using the web browser control or something.

> Nope. Not unless you can convince the web browser control to include those
> HTTP headers, or do it yourself. However, if you do find a way to convince
> the web browser control to do it *please* report back with your findings.

I was hoping that there was some way of using some control
to get exactly the same effect as using the browser manually.
After all, all I need to do manually is select the appropriate Channel
entry in that field in the form and hit the appropriate search button.

You can certainly use the Back and Forward toolbar buttons
programmatically when using the web browser control.

>>> Why do you need to do this anyway?

> <snip valid reasons>

>>> Foxtel politely snail mail you a programme guide every month.

>> And have a quite decent web program guide which is a tad more hi tech.

> Yes, but it is much harder to browse when
> one is in the bathroom having a...er... bath :-)

Not if you have a PC positioned appropriately. I already have one in the kitchen |-)


rf

Aug 13, 2002, 3:15:36 AM

"Rod Speed" <rod_...@yahoo.com> wrote in message
news:aja92d$19kpo7$1...@ID-69072.news.dfncis.de...

>
> What you are saying tho is that it obviously can be
> done and I'm asking for the fine detail on how to do it.

Yes. It can be done. Exactly what are you doing at the moment? You mention
InternetOpenUrl but my documentation lists that as part of the Windows CE
API.

I assume you have some VB running inside an Access database.

The C++ code I have runs to several pages and requires Visual Studio.

Cheers
Richard.


Rod Speed

Aug 13, 2002, 3:55:52 AM

rf <making...@the.time> wrote in message
news:sm269.13984$Sy4....@news-server.bigpond.net.au...
> Rod Speed <rod_...@yahoo.com> wrote

>> What you are saying tho is that it obviously can be
>> done and I'm asking for the fine detail on how to do it.

> Yes. It can be done. Exactly what are you doing at the moment? You mention
> InternetOpenUrl but my documentation lists that as part of the Windows CE API.

http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/wininet/reference/functions/internetopenurl.asp
says its a standard Win32 Internet function.

> I assume you have some VB running inside an Access database.

Yep.

> The C++ code I have runs to several pages and requires Visual Studio.

OK, thanks for that. I'll have a closer look at which function I saw
that allowed the specification of 'get' and 'post' methods tomorra.


rf

Aug 13, 2002, 4:26:27 AM

"Rod Speed" <rod_...@yahoo.com> wrote in message
news:ajae1f$1a501q$1...@ID-69072.news.dfncis.de...

>
> rf <making...@the.time> wrote in message
> news:sm269.13984$Sy4....@news-server.bigpond.net.au...
> > Rod Speed <rod_...@yahoo.com> wrote
>
> >> What you are saying tho is that it obviously can be
> >> done and I'm asking for the fine detail on how to do it.
>
> > Yes. It can be done. Exactly what are you doing at the moment? You mention
> > InternetOpenUrl but my documentation lists that as part of the Windows CE API.
>
>
> http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/wininet/reference/functions/internetopenurl.asp
> says its a standard Win32 Internet function.

Yep, my bad. They rearranged the index a little and that page is now listed
under internetopenurl function, not internetopenurl.

> > I assume you have some VB running inside an Access database.
>
> Yep.
>
> > The C++ code I have runs to several pages and requires Visual Studio.
>
> OK, thanks for that. I'll have a closer look at which function I saw
> that allowed the specification of 'get' and 'post' methods tomorra.

Hmmm. Your original post did not indicate at all that you were going this
deep. This is the level I was talking about a couple of posts ago, way below
the web browser control and way, way below anything to do with HTML.

The parameter you need to supply is lpszHeaders, along with dwHeadersLength.
These are the headers I am talking about. Unfortunately the code I used is
buried in the archives somewhere. I'll dig it out in due course and post it.
With what you have it seems you only need to add a couple of lines of code.

Cheers
Richard.


Rod Speed

Aug 13, 2002, 5:33:26 AM

rf <making...@the.time> wrote in message
news:To369.14409$Sy4....@news-server.bigpond.net.au...

> Rod Speed <rod_...@yahoo.com> wrote
>> rf <making...@the.time> wrote
>>> Rod Speed <rod_...@yahoo.com> wrote

>>>> What you are saying tho is that it obviously can be
>>>> done and I'm asking for the fine detail on how to do it.

>>> Yes. It can be done. Exactly what are you doing at the moment? You mention
>>> InternetOpenUrl but my documentation lists that as part of the Windows CE API.

>> http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/wininet/reference/functions/internetopenurl.asp
>> says its a standard Win32 Internet function.

> Yep, my bad. They rearranged the index a little and that page
> is now listed under internetopenurl function, not internetopenurl.

>>> I assume you have some VB running inside an Access database.

>> Yep.

>>> The C++ code I have runs to several pages and requires Visual Studio.

>> OK, thanks for that. I'll have a closer look at which function I saw
>> that allowed the specification of 'get' and 'post' methods tomorra.

> Hmmm. Your original post did not indicate at all that you were going this deep.

Yeah, was a little cryptic when I said that I had been doing it for a
long time now when there was an explicit url to get the page I wanted.

> This is the level I was talking about a couple of
> posts ago, way below the web browser control

Yeah, I was basically hoping that it would be feasible to just cut to the chase
and drive the form using a control, doing what I would otherwise do manually.

> and way, way below anything to do with HTML.

Yeah, didnt spell that out too carefully.

> The parameter you need to supply is lpszHeaders, along with dwHeadersLength.
> These are the headers I am talking about. Unfortunately the code I used is
> buried in the archives somewhere. I'll dig it out in due course and post it.
> With what you have it seems you only need to add a couple of lines of code.

Yeah, looks like it. Thanks for that.


Starlandz

Aug 14, 2002, 6:07:21 AM

Rod Speed

Aug 14, 2002, 7:17:02 PM

rf <making...@the.time> wrote in message
news:To369.14409$Sy4....@news-server.bigpond.net.au...

> Rod Speed <rod_...@yahoo.com> wrote
>> rf <making...@the.time> wrote
>>> Rod Speed <rod_...@yahoo.com> wrote

>>>> What you are saying tho is that it obviously can be
>>>> done and I'm asking for the fine detail on how to do it.

>>> Yes. It can be done. Exactly what are you doing at the moment?

>>> I assume you have some VB running inside an Access database.

> The parameter you need to supply is lpszHeaders, along with dwHeadersLength.


> These are the headers I am talking about. Unfortunately the code I used is
> buried in the archives somewhere. I'll dig it out in due course and post it.
> With what you have it seems you only need to add a couple of lines of code.

Isnt this basically what I need ?
http://support.microsoft.com/default.aspx?scid=KB;EN-US;q175474&FR=0

It basically uses HttpOpenRequest to allow the specification of the POST
method instead of the default GET and just pumps the data in using lpszPostData
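The KB approach maps fairly directly onto any HTTP library. As a hedged sketch, here is the same sequence in Python's http.client, with comments naming the WinInet call that plays a roughly similar role (InternetConnect for the server connection, HttpOpenRequest with the "POST" verb, HttpSendRequest with the Content-Type header and the form data, then HttpQueryInfo and InternetReadFile for the reply). A toy local server stands in for the real site so the sketch runs offline; the /foxtelguide path and channel field are assumptions.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode


class EchoPostHandler(BaseHTTPRequestHandler):
    """Toy server: echoes the POST body back so the client can be checked offline."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def post_form(host, port, path, fields):
    """POST url-encoded fields and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)  # ~ InternetConnect
    try:
        body = urlencode(fields)
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        # ~ HttpOpenRequest(..., "POST", path, ...) then HttpSendRequest(headers, body)
        conn.request("POST", path, body=body, headers=headers)
        resp = conn.getresponse()
        # ~ HttpQueryInfo (status code) and InternetReadFile (page contents)
        return resp.status, resp.read().decode("ascii")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), EchoPostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, page = post_form("127.0.0.1", server.server_port, "/foxtelguide", {"channel": "arena"})
server.shutdown()
server.server_close()
print(status, page)  # 200 channel=arena
```

Returning the status alongside the body is deliberate: a request that was sent successfully can still come back as an error page.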


Rod Speed

Aug 16, 2002, 12:52:07 AM

Rod Speed <rod_...@yahoo.com> wrote in message
news:ajeod9$1adr9e$1...@ID-69072.news.dfncis.de...

> rf <making...@the.time> wrote
>> Rod Speed <rod_...@yahoo.com> wrote
>>> rf <making...@the.time> wrote
>>>> Rod Speed <rod_...@yahoo.com> wrote

>>>>> Looking for a way to import web pages, into Access.

>>>>> Been doing this for a long time now and that works
>>>>> very well when an explicit url is available, like say
>>>>> http://www.sofcom.com.au/cgi-bin/TV/byChannel?date=Tuesday_13_August&chan=2&state=NSWReg&fta=1&fox=1&opt=1

>>>>> Trouble is that sometimes an explicit url isnt available, like with
>>>>> http://www.foxtel.com.au/foxtelguide

>>>>> How do I import that sort of page, say one for a specific channel on a specific date ?

> Isnt this basically what I need ?
> http://support.microsoft.com/default.aspx?scid=KB;EN-US;q175474&FR=0

> It basically uses HttpOpenRequest to allow the specification of the POST
> method instead of the default GET and just pumps the data in using lpszPostData

Cant get it to work currently. Someone wanna give me the specific data
to replace "yourserver" and for lpszPostData to get say today's program
guide for say the arena channel from that specific foxtel site/form ?

Just returns OK, even with crap in those two parameters.
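For what it's worth, the pieces a form POST needs can be read off the form's action URL: the host name is the server (what "yourserver" stands for in the KB sample), the path is the object requested with the POST verb, and the url-encoded fields are the data, sent with a Content-Type: application/x-www-form-urlencoded header. A hypothetical Python helper that splits them out is sketched below; the channel and date field names are placeholders, not Foxtel's real ones. It may also explain the "returns OK" symptom: the send call succeeding only means the request went out, so the HTTP status code and the returned page still need checking.

```python
from urllib.parse import urlencode, urlsplit


def post_parts(action_url, fields):
    """Split a form action URL and fields into the pieces a raw POST needs (hypothetical helper)."""
    u = urlsplit(action_url)
    server = u.hostname                  # e.g. what "yourserver" stands for
    port = u.port or (443 if u.scheme == "https" else 80)
    obj = u.path or "/"                  # object name requested with the POST verb
    if u.query:
        obj += "?" + u.query
    headers = "Content-Type: application/x-www-form-urlencoded\r\n"
    data = urlencode(fields)             # the form data, sent as the request body
    return server, port, obj, headers, data


# Placeholder field names and values
parts = post_parts("http://www.foxtel.com.au/foxtelguide",
                   {"channel": "arena", "date": "16/08/2002"})
print(parts)
```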


Eric Booth

Aug 16, 2002, 1:54:51 AM

"Rod Speed" <rod_...@yahoo.com> wrote in message
news:aji0dr$1b0g51$1...@ID-69072.news.dfncis.de...
Yes Rod, I can give you the data if you ask nicely

Rod Speed

Aug 16, 2002, 9:19:59 PM

Dont worry, fixed it. Works fine now.

"Rod Speed" <rod_...@yahoo.com> wrote in message news:aji0dr$1b0g51$1...@ID-69072.news.dfncis.de...
