
MSXML2.ServerXMLHTTP60 and Microsoft HTML Object Library


The Frog

May 13, 2013, 8:40:02 AM
Hi Everyone,

I am working on retrieving some data from web pages to use in my application, and this is done by stripping the required contents from the web page in question. To retrieve the web page I am using the MSXML2.ServerXMLHTTP60 object (pretty easy) and getting a successful response. The responseText property seems to be spot on and contains the required data.

The next step is to pass the responseText contents to the Microsoft HTML Object Library. This is done by setting the body.innerHTML property to the contents of responseText. In theory this works, but in practice the HTML document object is not giving me any result at all!

My code is as follows:

Sub FetchPage2(url As String)

    Dim oSvr As MSXML2.ServerXMLHTTP60
    Dim oDoc As HTMLDocument
    Dim cycle As Date 'start time of the cycle

    On Error GoTo FetchPage2_Error

    Set oSvr = New MSXML2.ServerXMLHTTP60

    oSvr.Open "GET", url, True 'establish the parameters (True = asynchronous)
    oSvr.send 'send the request off

    cycle = Now()

    While oSvr.ReadyState <> READYSTATE_COMPLETE 'value = 4
        DoEvents

        If cycle + TimeValue("00:00:30") < Now() Then 'try again after 30 seconds
            oSvr.send 'by resending the request
            cycle = Now()
        End If
    Wend

    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = oSvr.responseText 'this should work!

    'do stuff here

ExitFetch:
    Set oSvr = Nothing
    Set oDoc = Nothing

    Exit Sub

FetchPage2_Error:

    Resume ExitFetch 'leave the error handler and clean up

End Sub

Here is the problem:
When I grab the HTML document from IE the parsing I need to do works. The document is well formed and you can use the elements in it. When I pass the contents as shown above the HTMLDocument shows nothing at all.

For example, if I use IE to return a web page and feed it to the HTMLDocument object like so:

Set oDoc = IE.Document

everything works nicely (e.g. I can get the document title from the oDoc object directly). If I do the same via oSvr -> oDoc instead of from IE, there is no title element, or any other for that matter. When I write the responseText from the oSvr object above to a file and then check it, it is good. If I load that file into IE and then parse it, it works. Something is not right here, and I am probably missing something incredibly simple, but I am stuck. Can anyone point out the error of my ways so that I can push the responseText directly to the HTMLDocument and parse it?
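
For readers who want to reproduce the IE route described above, here is a minimal late-bound sketch. It assumes Internet Explorer is installed and automatable on the machine; the function name TitleViaIE is hypothetical and not part of my actual code.

```vba
' Sketch only: load a page in a hidden IE instance and read the title
' from the document IE has already parsed.
' TitleViaIE is a hypothetical name used for illustration.
Function TitleViaIE(ByVal url As String) As String
    Dim IE As Object    ' InternetExplorer.Application, late bound

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False
    IE.Navigate url

    ' Wait until the browser reports the page as fully loaded (4 = complete)
    Do While IE.Busy Or IE.ReadyState <> 4
        DoEvents
    Loop

    TitleViaIE = IE.Document.Title

    IE.Quit
    Set IE = Nothing
End Function
```

Calling Debug.Print TitleViaIE(url) from the Immediate window should print the page title, since IE itself has built the document.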

Cheers

The Frog

Douglas J Steele

May 13, 2013, 9:05:27 AM
Don't create a new HTMLDocument (the line Set oDoc = New HTMLDocument). That
instantiation knows nothing about the document you just received from
sending the GET. Unfortunately, I don't have a sample I can check, but I'd
expect the oSvr object to have a property representing the returned
document, so you'd have Set oDoc = oSvr.HTMLDocument (or something like
that).

"The Frog" wrote in message
news:6d508dde-634b-4785...@googlegroups.com...

The Frog

May 13, 2013, 9:59:11 AM
Hi Douglas,

I've had a crack at pushing the doc directly but none of the types
match, or at least none that I can find. Further testing with the
innerHTML approach indicates that the document itself has come across,
but none of the properties of the HTMLDocument object are set when
the push is done. For example, you can use the getElementsByClassName
method and obtain correct results, but check the title property of
oDoc and you get nada!

I had a thought regarding this, and perhaps I'm right off base here,
but here goes: the reason the other properties aren't being set is
that everything is being stuffed into the space that would represent
the <body></body> tags. The oDoc object isn't actually parsing what
it receives with the approach I have taken.

If there is a way to push the HTML directly to oDoc and populate the
object properly I would love to see it. I've beaten this thoroughly
for the last 12 hours and not found it either through research or
experimentation. It would save me refactoring a large chunk of
code...

I'll keep hammering at it...

Cheers

--
Cheers

The Frog

The Frog

May 14, 2013, 4:30:51 AM
OK, found how it's done, but it's a little weird. First of all, it only works for me if the MSHTML.HTMLDocument is late bound. The method calls will not work with early binding at all. Second, to late bind the HTMLDocument you actually have to create a differently named object:

Set myVar = CreateObject("HTMLFile")

But wait.....there's more!

After you have retrieved your HTML with the ServerXMLHTTP code detailed above and can access the responseText property, you can call a method on the HTMLDocument to correctly populate the object, like so:

myVar.write oSvr.responseText

This does correctly populate the HTMLDocument object (now that it's late bound), BUT the HTMLDocument object then wants to go and run all the scripts, download any images and so on..... It is ACTIVE. So the end result is that you are downloading twice. What an utter waste.

So, does anyone know how to put the HTMLDocument object into an 'offline' mode, with no scripts running, etc.? Or is there an alternative HTML parser that doesn't do all that extra running around?

Cheers

The Frog

The Frog

May 14, 2013, 10:11:00 AM
Think I found the answer again. It is not just a single step with the
write method. Here is what seems to work:

oDoc.Open
oDoc.Write oSvr.responseText
oDoc.Close

It appears you don't need to reset the object between uses either as
what is 'written' seems to replace anything that was there before.
Hope this helps anyone else having to deal with html parsing.
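
Putting the pieces from this thread together, the whole thing might look roughly like the sketch below. It is only an illustration under a few assumptions: both objects are late bound (via the "MSXML2.ServerXMLHTTP.6.0" and "htmlfile" ProgIDs), the request is made synchronously to keep it short, and the function name GetPageTitle is hypothetical.

```vba
' Sketch only: fetch a page with ServerXMLHTTP and parse it with a
' late-bound "htmlfile" document using Open / Write / Close.
' GetPageTitle is a hypothetical name used for illustration.
Function GetPageTitle(ByVal url As String) As String
    Dim oSvr As Object    ' MSXML2.ServerXMLHTTP.6.0, late bound
    Dim oDoc As Object    ' MSHTML document, late bound via "htmlfile"

    Set oSvr = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    oSvr.Open "GET", url, False    ' False = synchronous (assumption, for brevity)
    oSvr.send

    If oSvr.Status = 200 Then
        Set oDoc = CreateObject("htmlfile")

        oDoc.Open                       ' start a fresh document
        oDoc.Write oSvr.responseText    ' feed the raw HTML to the parser
        oDoc.Close                      ' finish, so the DOM is populated

        GetPageTitle = oDoc.Title
    End If

    Set oDoc = Nothing
    Set oSvr = Nothing
End Function
```

Something like Debug.Print GetPageTitle(url) in the Immediate window is a quick way to check that the title is populated. The same oDoc could be reused for the next page by repeating the Open / Write / Close calls.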

--
Cheers

The Frog

Douglas J Steele

May 14, 2013, 5:06:41 PM
Can't say I'm following what you're saying.

Can you show the code you're using now, not just 2 or 3 snippets from it?

"The Frog" wrote in message
news:almarsoft.6727...@news.aioe.org...

The Frog

May 15, 2013, 8:16:27 AM
On Tue, 14 May 2013 17:06:41 -0400, "Douglas J Steele"
<NOSPAM_djsteele@NOSPAM_gmail.com> wrote:
> Can you show the code you're using now, not just 2 or 3 snippets from it?

Hi Douglas,

Sure I can but it will have to wait till tomorrow. I'm out and about
right now and responding via my phone.

--
Cheers

The Frog