heavy problem with HTMLDocument


pierre

Sep 11, 2003, 10:22:03 AM
Hi, I have a problem that may be easy to resolve, but I can't
find a solution:

I want to parse HTML files, so I first want to fetch one from a
URL, and I do it like this:

Dim objMSHTML As New mshtml.HTMLDocument()
Dim objDocument As mshtml.HTMLDocument
objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

Normally this should work and I could parse the HTML
code, but in fact I get this error:

"An unhandled exception of type
'System.NullReferenceException' occurred in
mscorlib.dll
Additional information: Object reference not set to an
instance of an object."

(translated; my VB version is French)

Any idea?
PS: I think this code works with VB6...

Patrick Steele [MVP]

Sep 11, 2003, 11:13:59 AM
In article <002701c37870$128ebb00$a101...@phx.gbl>, prou...@ina.fr
says...

> Hi, I have a problem that may be easy to resolve, but I can't
> find a solution:
>
> I want to parse HTML files, so I first want to fetch one from a
> URL, and I do it like this:
>
> Dim objMSHTML As New mshtml.HTMLDocument()
> Dim objDocument As mshtml.HTMLDocument
> objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

Use the built-in .NET networking objects. See:

http://tinyurl.com/98ey
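
For readers who cannot follow the link, here is a minimal sketch of fetching a page with the built-in networking classes. It assumes the article is about System.Net.WebRequest, the class Patrick names later in the thread; the module name HtmlFetcher and the function FetchHtml are hypothetical, not taken from the article.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module HtmlFetcher
    ' Hypothetical helper (not from the thread): downloads the resource
    ' at the given URL and returns its body as a string.
    Public Function FetchHtml(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            ' StreamReader defaults to UTF-8; this sketch ignores any
            ' charset declared by the server or by the page itself.
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

Calling HtmlFetcher.FetchHtml("http://www.google.fr") would return the raw markup only; it does not parse it. On recent .NET versions HttpClient is generally preferred over WebRequest.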

--
Patrick Steele
Microsoft .NET MVP
http://weblogs.asp.net/psteele

pierre

Sep 11, 2003, 11:22:52 AM
Thank you, Patrick.
I've just read the article,
but it doesn't seem that it can help me parse the
HTML. Using mshtml.HTMLDocument, I thought I could use the
"links" property, which is supposed to give access to the
links in the HTML...


Cor

Sep 11, 2003, 11:52:09 AM
Pierre,
I've never seen this method, so I'm curious whether it works, but that
isn't something I can check in one go.

I'd advise you to take a look at the "webbrowser" control; with it you can
"navigate" to a URL.
(It uses Internet Explorer 6, don't ask me how.)

Then, with the "DocumentComplete" event of the "webbrowser", you can get
the documents as DOM objects.

When there are frames, there is a document for every frame.
There is also a navigate-complete event, but with that you only get the last
page downloaded.

That's why I find the method you use strange, but I did see it in the
documentation too.

I hope this points you in the right direction.
It is too much to give a quick example.

The webbrowser is only one of the methods I think you can use, but it's
the one I use for these things at the moment.

I hope it helps you a little bit.
Cor

Patrick Steele [MVP]

Sep 11, 2003, 12:07:07 PM
In article <18d201c37878$91749090$a601...@phx.gbl>, prou...@ina.fr
says...

> Thank you, Patrick.
> I've just read the article,
> but it doesn't seem that it can help me parse the
> HTML. Using mshtml.HTMLDocument, I thought I could use the
> "links" property, which is supposed to give access to the
> links in the HTML...

Sorry -- forgot about your parsing issue.

Perhaps you could get the raw HTML using the .NET WebRequest and then
feed that into the mshtml.HTMLDocument object. I've never used that
object before so I'm not sure if you can load it with your own HTML.
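
One way this idea is commonly sketched is to write the HTML string into a fresh document through IHTMLDocument2.write. The following is a minimal, untested-in-this-thread sketch that assumes a COM reference to the Microsoft HTML Object Library (mshtml) and a Windows machine with MSHTML installed; the module name HtmlLoader and the function LoadFromString are hypothetical.

```vbnet
Imports System
Imports mshtml   ' assumes a reference to the Microsoft HTML Object Library

Module HtmlLoader
    ' Hypothetical helper (not from the thread): parses an HTML string
    ' that is already in memory, e.g. one downloaded with WebRequest.
    Public Function LoadFromString(ByVal html As String) As IHTMLDocument2
        Dim doc As IHTMLDocument2 = _
            DirectCast(New HTMLDocument(), IHTMLDocument2)
        ' write is declared with a ParamArray of Object, so a single
        ' string argument is accepted.
        doc.write(html)
        ' close ends the write stream so the parser can finish.
        doc.close()
        Return doc
    End Function
End Module
```

With this approach no URL is loaded by MSHTML itself, so relative links in the markup have no base address to resolve against; that is a limitation of the sketch, not a statement about what the original poster needed.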

Charles Law

Sep 11, 2003, 12:22:47 PM
Hi Pierre

The problem is that although you create a new mshtml.HTMLDocument, it is not
being initialised.

Try the following:

<code>
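' Create the document object; on its own it is not yet initialised.
Dim objMSHTML As New mshtml.HTMLDocument
Dim objDocument As mshtml.IHTMLDocument2
Dim ips As IPersistStreamInit

' Initialise the document through IPersistStreamInit before using it.
ips = DirectCast(objMSHTML, IPersistStreamInit)
ips.InitNew()

objDocument = objMSHTML.createDocumentFromUrl("http://www.google.fr", vbNullString)

' Wait for loading to finish, pumping window messages meanwhile.
Do Until objDocument.readyState = "complete"
    Application.DoEvents()
Loop

Debug.WriteLine(objDocument.body.outerHTML)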
</code>

At the end of this you can access the DOM. Note that you need to define the
IPersistStreamInit interface.
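
Since the original question was about the "links" property, here is a minimal sketch of reading it once a document has finished loading. It assumes a reference to the Microsoft HTML Object Library; the module name LinkLister and the function GetLinkUrls are hypothetical and are not part of the code above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports mshtml   ' assumes a reference to the Microsoft HTML Object Library

Module LinkLister
    ' Hypothetical helper (not from the thread): returns the href of
    ' every element in the document's links collection.
    Public Function GetLinkUrls(ByVal doc As IHTMLDocument2) As List(Of String)
        Dim urls As New List(Of String)()
        For Each element As IHTMLElement In doc.links
            ' getAttribute returns an Object; guard against missing values.
            Dim href As Object = element.getAttribute("href", 0)
            If href IsNot Nothing AndAlso Not TypeOf href Is DBNull Then
                urls.Add(href.ToString())
            End If
        Next
        Return urls
    End Function
End Module
```

After the readyState loop completes, GetLinkUrls(objDocument) could be used to list the page's links instead of dumping body.outerHTML.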

HTH

Charles


"pierre" <prou...@ina.fr> wrote in message
news:002701c37870$128ebb00$a101...@phx.gbl...

Charles Law

Sep 11, 2003, 12:30:35 PM
Pierre

In case you don't have it, here is the IPersistStreamInit interface
definition

<code>
Imports System.Runtime.InteropServices

' Note: UCOMIStream is the .NET 1.x type; later framework versions
' provide System.Runtime.InteropServices.ComTypes.IStream instead.
<ComVisible(True), ComImport(), _
 Guid("7FD52380-4E07-101B-AE2D-08002B2EC713"), _
 InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IPersistStreamInit
    ' IPersist interface
    Sub GetClassID(ByRef pClassID As Guid)

    ' IPersistStreamInit members; the order must match the COM vtable
    <PreserveSig()> Function IsDirty() As Integer
    <PreserveSig()> Function Load(ByVal pstm As UCOMIStream) As Integer
    <PreserveSig()> Function Save(ByVal pstm As UCOMIStream, _
        ByVal fClearDirty As Boolean) As Integer
    <PreserveSig()> Function GetSizeMax(<InAttribute(), Out(), _
        MarshalAs(UnmanagedType.U8)> ByRef pcbSize As Long) As Integer
    <PreserveSig()> Function InitNew() As Integer
End Interface
</code>

HTH

Charles


"Charles Law" <bl...@thingummy.com> wrote in message
news:%23l0B5DI...@TK2MSFTNGP11.phx.gbl...

Cor

Sep 11, 2003, 12:50:38 PM
Charles,
Thanks, saves me a lot of time looking this up.
Cor


pierre

Sep 12, 2003, 5:05:01 AM
Thanks a lot, it works perfectly :)
P.