
WebBrowser Control (or CHtmlView): how to get the content? PLEASE HELP!


Bernhard Baeumle

Oct 4, 1999, 3:00:00 AM
Hi,

I have a WebBrowser control and I call Navigate(), but now I need to get the
CONTENT of the page into a string or structure!

I thought the function get_Document() would do that, but I really have no
idea (and can't find any help page) about what I should do with the
"dispatch-ptr" it returns...

Could anyone help me, PLEASE!

Bernhard.

B.Ba...@gmx.net


saln...@my-deja.com

Oct 4, 1999, 3:00:00 AM

>I need to get the CONTENT of the page into a string or structure!

There are two properties:
WebBrowser1.Document.body.innerHTML
WebBrowser1.Document.body.innerText

They return the content of the page as tagged HTML and as plain text,
respectively.



Jeff Partch

Oct 4, 1999, 3:00:00 AM
Hi, Bernhard!

>what I should do with the "dispatch-ptr" returned....

QueryInterface on it for an IHTMLDocument2 (maybe even IHTMLDocument3 these
days) interface, then see if one of the many methods it provides will help
you accomplish your goal. Remember too, that the LPDISPATCH returned by
get_Document() is AddRef()'d before you get it and you'll need to Release it
when you're done with it.
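
To illustrate the ownership rule above, here is a minimal sketch in portable C++. FakeDispatch, getDocument(), ReleaseGuard and borrowOnce() are all hypothetical names made up for this illustration; FakeDispatch only stands in for the real IDispatch so the sketch compiles without the Windows headers. It is not the MSHTML API, just the "you were handed an AddRef()'d pointer, so you Release() it exactly once" pattern. In real ATL code, CComPtr with Attach() plays the role of the guard.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object; real code would use IDispatch.
struct FakeDispatch {
    long refs = 1;                       // the owner (the "browser") holds one reference
    long AddRef() { return ++refs; }
    long Release() {
        assert(refs > 0);                // catches an over-release in this sketch
        return --refs;                   // a real COM object deletes itself at zero
    }
};

// Mimics get_Document(): hands out a pointer that is already AddRef()'d.
FakeDispatch* getDocument(FakeDispatch& owned) {
    owned.AddRef();
    return &owned;
}

// Releases the wrapped pointer exactly once when it goes out of scope.
template <typename T>
class ReleaseGuard {
public:
    explicit ReleaseGuard(T* p) : p_(p) {}
    ~ReleaseGuard() { if (p_) p_->Release(); }
    ReleaseGuard(const ReleaseGuard&) = delete;
    ReleaseGuard& operator=(const ReleaseGuard&) = delete;
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

// Borrows the document once and returns the reference count left afterwards.
long borrowOnce(FakeDispatch& browserDoc) {
    {
        ReleaseGuard<FakeDispatch> doc(getDocument(browserDoc));
        // ... QueryInterface / use doc-> here; the count is 2 inside this scope ...
    }   // the guard calls Release() here, on every exit path
    return browserDoc.refs;
}
```

The point of the guard is that early returns between obtaining the pointer and the end of the function cannot leak the reference.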

HTH,

Jeff...


Dave Johnston

Dec 6, 1999, 3:00:00 AM
Hi all,

"Jeff Partch" <airb...@airmail.net> wrote in message
news:OFVPkJmD$GA....@cppssbbsa02.microsoft.com...

I had the same problem, and it seems to be a common one, so I thought I'd
butt in with a working example.

The best solution would be to call IHTMLDocument3::get_documentElement (the
documentElement property) to get the IHTMLElement of the root element, then
call the get_outerHTML method of that element... except I can't find the
matching mshtml.h & .lib to go with the latest MSDN documentation of
IHTMLDocument3. I therefore had to resort to going via a slower Invoke on
the IDispatch interface.

Note: documentElement is the very root node of the document (the HTML
element), and not just the body. When fetched through Invoke it comes back
as a VT_DISPATCH variant, so you still have to QueryInterface it for
IHTMLElement.

My MFC solution is shown below. I welcome suggestions for cleaning it up as
I am new to COM & OLE. I had to use the OLEview utility to find the magic
dispatch ID (0x433) for the documentElement property. The code takes the
resulting BSTR (Basic string) and passes it to my document class for storage
via its SetSourceHTML() method. Instead you can do whatever you like with
the BSTR full of HTML that is returned.

Cheers,
Dave

void CHtmlViewer::OnDocumentComplete(LPCTSTR lpszUrl)
{
    // GetHtmlDocument() returns an AddRef()'d pointer that we must Release().
    LPDISPATCH doc = GetHtmlDocument();
    if (doc == NULL)
        return;

    // Fetch the documentElement property (DISPID 0x433) directly via the
    // IDispatch of the document.
    DISPPARAMS dispparams;
    memset(&dispparams, 0, sizeof dispparams);
    VARIANT vaResult;
    VariantInit(&vaResult);
    HRESULT hr = doc->Invoke(
        0x433,
        IID_NULL,
        0,
        DISPATCH_PROPERTYGET,
        &dispparams,
        &vaResult,
        NULL,
        NULL);

    if (SUCCEEDED(hr) && vaResult.vt == VT_DISPATCH && vaResult.pdispVal != NULL)
    {
        // Get the element interface
        IHTMLElement* pElem = NULL;
        hr = vaResult.pdispVal->QueryInterface(IID_IHTMLElement, (LPVOID*)&pElem);
        if (SUCCEEDED(hr))
        {
            BSTR str = NULL;
            if (SUCCEEDED(pElem->get_outerHTML(&str)) && str != NULL)
            {
                // Attach without copying, so _bstr_t frees the BSTR for us.
                _bstr_t strHtml(str, false);
                ((CHtmlDoc *)GetDocument())->SetSourceHTML(this, strHtml);
            }
            pElem->Release();
        } // QI(IHTMLElement)
    }
    VariantClear(&vaResult); // releases the IDispatch held in the variant
    doc->Release();
}
