
How do you get the Document HTML Source?


Andre Fontaine

Jul 6, 1998, 3:00:00 AM

Ok, I know you can get the OuterHTML from the Document.Body
object, but that does not always return the complete HTML source
as downloaded to the browser. I have been unable to find a way
to get similar functionality as the <right click> view source IE popup
menu. Does anyone know a way to get the view source functionality
programmatically? Thanks,

-Andre


David Roe

Jul 7, 1998, 3:00:00 AM

On Mon, 6 Jul 1998 14:33:04 -0400, "Andre Fontaine"
<afon...@highground.com> wrote:

>Does anyone know a way to get the view source functionality
>programmatically? Thanks,

http://www.codeguru.com/internet/view_source.shtml

this will launch it into Notepad a la IE.
if you want access to the HTML from within your code, you may want to
check out get_body() from the IHTMLDocument2 interface.

rgds,
/dave

--------------
David Roe
Focus Consulting
da...@focus-consulting.co.uk
d...@sydney.net

David Roe

Jul 7, 1998, 3:00:00 AM

On Mon, 6 Jul 1998 14:33:04 -0400, "Andre Fontaine"
<afon...@highground.com> wrote:

> Does anyone know a way to get the view source functionality
>programmatically? Thanks,

actually, just looked at the code at that web page and it's not the
best way to do it (IMHO).

[fx: rummages around hard disk]

try:

ExecCmdTarget(pWebWnd, &CGID_IWebBrowser, CWBCmdGroup::HTMLID_VIEWSOURCE);

// Asks the currently loaded document to execute a command through its
// IOleCommandTarget interface (here: the View Source command).
HRESULT ExecCmdTarget(CWnd *pWebWnd, const GUID *pguidCmdGroup, DWORD nCmdID)
{
    LPDISPATCH lpDispatch = NULL;
    LPOLECOMMANDTARGET lpOleCommandTarget = NULL;
    HRESULT hResult = E_FAIL;

    lpDispatch = pWebWnd->m_pBrowser->GetDocument();
    if (lpDispatch) {
        hResult = lpDispatch->QueryInterface(IID_IOleCommandTarget,
                                             (void **) &lpOleCommandTarget);
        if (SUCCEEDED(hResult)) {
            hResult = lpOleCommandTarget->Exec(pguidCmdGroup, nCmdID, 0, NULL, NULL);
            lpOleCommandTarget->Release();
        }
        lpDispatch->Release();
    }
    return hResult;
}

Andre Fontaine

Jul 7, 1998, 3:00:00 AM
As I stated in my original message, the Body does not always
contain the complete HTML source, only what's in between
the body tags. I wonder how IE gets it when you do a view
source?

David Roe wrote in message <35a1a46b...@msnews.microsoft.com>...


>
>http://www.codeguru.com/internet/view_source.shtml
>
>this will launch it into Notepad a la IE.
>if you want access to the HTML from within your code, you may want to
>check out get_body() from the IHTMLDocument2 interface.
>

Jim Thatcher

Jul 7, 1998, 3:00:00 AM
Ahhh, but there is the rub, I don't wanna do it in C++, I wanna do it in VB.
I have been working on this for a while and have come up with a few
workarounds, one being saving the file and opening it back up in a textbox or
something like that. If anyone is interested in how to do that, lemme know.
If you wanna figure it out on your own, it is the webbrowser1.execWB sub,
and there are some OLECMDID constants that you can figure out with the object
browser, but if you have trouble, lemme know.

So my question still remains, if anyone knows how to find the damn source in
HTML... etc. etc.
David Roe wrote in message <35a1a4c7...@msnews.microsoft.com>...

Henri Fournier

Jul 8, 1998, 3:00:00 AM
Here's some Delphi code to do the IE View Source menu option:

const
  HTMLID_FIND       = 1;
  HTMLID_VIEWSOURCE = 2;
  HTMLID_OPTIONS    = 3;


procedure TMainForm.DocumentSource1Click(Sender: TObject);
begin
  InvokeIE(HTMLID_VIEWSOURCE);
end;


procedure TMainForm.InvokeIE(Value: Integer);
const
  CGID_WebBrowser: TGUID = '{ED016940-BD5B-11cf-BA4E-00C04FD70816}';
var
  CmdTarget : IOleCommandTarget;
  vaIn, vaOut: OleVariant;
  PtrGUID: PGUID;
begin
  New(PtrGUID);
  PtrGUID^ := CGID_WebBrowser;
  if WebBrowser1.Document <> nil then
  try
    WebBrowser1.Document.QueryInterface(IOleCommandTarget, CmdTarget);
    // CmdTarget is a reference-counted interface variable: Delphi releases
    // it automatically when it goes out of scope, so an explicit _Release
    // here would release it twice.
    if CmdTarget <> nil then
      CmdTarget.Exec(PtrGUID, Value, 0, vaIn, vaOut);
  except
    // Nothing
  end;
  Dispose(PtrGUID);
end;


--

Henri Fournier
http://www.globalserve.net/~hfournier

Andre Fontaine wrote in message
<#G0JByQq...@uppssnewspub05.moswest.msn.net>...


>Ok, I know you can get the OuterHTML from the Document.Body
>object, but that does not always return the complete HTML source
>as downloaded to the browser. I have been unable to find a way
>to get similar functionality as the <right click> view source IE popup

>menu. Does anyone know a way to get the view source functionality
>programmatically? Thanks,
>
>-Andre
>
>
>

AJ Stiles

Jul 8, 1998, 3:00:00 AM
Check out this link: http://www.sea-glass.com/browserhelper.htm
It makes it easy to call the Find, View Source, and Internet Options dialogs
from VB. It's free for non-commercial use.

AJ

In article <u55q7Pc...@uppssnewspub05.moswest.msn.net>,


"Jim Thatcher" <Jim_Th...@vantive.com> wrote:
> Ahhh, but there is the rub, I don't wanna do it in C++, I wanna do it in VB.
> I have been working on this for a while and have come up with a few
> workarounds, one being saving the file and opening it back up in a textbox or
> something like that. If anyone is interested in how to do that, lemme know.
> If you wanna figure it out on your own, it is the webbrowser1.execWB sub,
> and there are some OLECMDID constants that you can figure out with the object
> browser, but if you have trouble, lemme know.
>
> So my question still remains, if anyone knows how to find the damn source in
> HTML... etc. etc.
> David Roe wrote in message <35a1a4c7...@msnews.microsoft.com>...
> >On Mon, 6 Jul 1998 14:33:04 -0400, "Andre Fontaine"
> ><afon...@highground.com> wrote:
> >

> >> Does anyone know a way to get the view source functionality
> >>programmatically? Thanks,
> >


Andre Fontaine

Jul 8, 1998, 3:00:00 AM
Another explanation:
The HTMLDocument object does not have a method or
property for getting the complete HTML source. You can
get the HTML for the body of the document, but it is not
complete. I want the functionality similar to the View Source
menu, BUT I DON'T WANT TO EXECUTE THE VIEW SOURCE
MENU. I want to obtain the HTML programmatically. All of the
solutions that have been submitted deal with executing the View
source menu, which pops up the source in notepad. It can be
done this way (Get the handle to notepad, copy the text out, kill
the window etc.), but it's not very clean. There must be a better way.

AJ Stiles wrote in message <6o02c2$33a$1...@nnrp1.dejanews.com>...

Jack Bell

Jul 9, 1998, 3:00:00 AM
In article <u7szQup...@uppssnewspub05.moswest.msn.net>,
afon...@highground.com says...

> Another explanation:
> The HTMLDocument object does not have a method or
> property for getting the complete HTML source. You can
> get the HTML for the body of the document, but it is not
> complete. I want the functionality similar to the View Source
> menu, BUT I DON'T WANT TO EXECUTE THE VIEW SOURCE
> MENU. I want to obtain the HTML programmatically. All of the
> solutions that have been submitted deal with executing the View
> source menu, which pops up the source in notepad. It can be
> done this way (Get the handle to notepad, copy the text out, kill
> the window etc.), but it's not very clean. There must be a better way.

<SNIP>

Have you considered simply doing an HTTP Get for the current URL? It
would be very fast because the document would already be in cache...

Jack

Patrice Godard

Jul 23, 1998, 3:00:00 AM

Jack Bell wrote in message ...

>In article <u7szQup...@uppssnewspub05.moswest.msn.net>,
>afon...@highground.com says...
>> Another explanation:
>> The HTMLDocument object does not have a method or
>> property for getting the complete HTML source. You can
>> get the HTML for the body of the document, but it is not
>> complete. I want the functionality similar to the View Source
>> menu, BUT I DON'T WANT TO EXECUTE THE VIEW SOURCE
>> MENU. I want to obtain the HTML programmatically. All of the
>> solutions that have been submitted deal with executing the View
>> source menu, which pops up the source in notepad. It can be
>> done this way (Get the handle to notepad, copy the text out, kill
>> the window etc.), but it's not very clean. There must be a better way.
>

I did it by querying the web browser for the current URL.
Then I used the WinInet API to retrieve the corresponding file from the cache.

Here it is:

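BOOL getLocalFileNameFromURL(LPCSTR URL, CString* fileName)
{
    DWORD size = MAX_CACHE_ENTRY_INFO_SIZE;
    LPINTERNET_CACHE_ENTRY_INFO pCacheInfo =
        (LPINTERNET_CACHE_ENTRY_INFO) malloc(size);
    if (pCacheInfo == NULL)
        return FALSE;

    // If the buffer is too small, size receives the required size,
    // so grow the buffer and retry once.
    BOOL ok = RetrieveUrlCacheEntryFile(URL, pCacheInfo, &size, 0);
    if (!ok && GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // Keep the pointer realloc returns; the block may have moved.
        void* pNew = realloc(pCacheInfo, size);
        if (pNew == NULL) {
            free(pCacheInfo);
            return FALSE;
        }
        pCacheInfo = (LPINTERNET_CACHE_ENTRY_INFO) pNew;
        ok = RetrieveUrlCacheEntryFile(URL, pCacheInfo, &size, 0);
    }
    if (!ok) {
        // GetLastError() is only meaningful after a call has failed.
        DWORD err = GetLastError();
        if (err == ERROR_FILE_NOT_FOUND)
            TRACE("## ERROR_FILE_NOT_FOUND\n");
        else
            ErrorOut(err, "RetrieveUrlCacheEntryFile");
        free(pCacheInfo);
        return FALSE;
    }

    TRACE("### URL: %s\nLocal file: %s\n###\n",
          pCacheInfo->lpszSourceUrlName, pCacheInfo->lpszLocalFileName);
    *fileName = pCacheInfo->lpszLocalFileName;
    // Note: RetrieveUrlCacheEntryFile locks the cache entry. Call
    // UnlockUrlCacheEntryFile(URL, 0) once the file has been read.
    free(pCacheInfo);
    return TRUE;
}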
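
Once the local file name is known, the page source still has to be read from disk. A minimal sketch of that last step in standard C++ follows; readFileToString is a hypothetical helper name, not something from the code above, and it assumes the cached file is small enough to hold in memory.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original post): loads the whole
// file at `path` into `out`. Returns false if the file cannot be opened.
// Binary mode keeps the bytes exactly as they are stored on disk.
bool readFileToString(const std::string& path, std::string& out)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    std::ostringstream buffer;
    buffer << in.rdbuf();   // copy the entire stream into the buffer
    out = buffer.str();
    return true;
}
```

In an ANSI MFC build, the CString filled in by getLocalFileNameFromURL could presumably be passed as (LPCSTR) fileName, which would give the complete HTML as it was downloaded, head section included.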


/***************************************************************************
*
*
* FUNCTION: ErrorOut
*
* PURPOSE: This function is used to get extended Internet error.
*
* COMMENTS: Function returns TRUE on success and FALSE on failure.
*
***************************************************************************/

BOOL CEventSink::ErrorOut(DWORD dError, TCHAR* szCallFunc)
{
    TCHAR szTemp[256] = TEXT(""), szCode[16] = TEXT("");
    TCHAR *szBuffer = NULL, *szBufferFinal = NULL;
    DWORD dwIntError = 0, dwLength = 0;

    wsprintf(szTemp, TEXT("%s error %lu\n "), szCallFunc, dError);
    if (dError == ERROR_INTERNET_EXTENDED_ERROR)
    {
        // The first call only reports the required length in dwLength.
        InternetGetLastResponseInfo(&dwIntError, NULL, &dwLength);
        if (dwLength)
        {
            dwLength++;  // leave room for the terminating null
            if (!(szBuffer = (TCHAR*) LocalAlloc(LPTR, dwLength * sizeof(TCHAR))))
            {
                // szBuffer is NULL here, so format the code into szCode.
                wsprintf(szCode, TEXT("%lu"), GetLastError());
                lstrcat(szTemp, TEXT("Unable to allocate memory to display "
                                     "Internet error code. Error code: "));
                lstrcat(szTemp, szCode);
                lstrcat(szTemp, TEXT("\n"));
                AfxMessageBox(szTemp);
                return FALSE;
            }
            if (!InternetGetLastResponseInfo(&dwIntError, szBuffer, &dwLength))
            {
                wsprintf(szCode, TEXT("%lu"), GetLastError());
                lstrcat(szTemp, TEXT("Unable to get Internet error. Error code: "));
                lstrcat(szTemp, szCode);
                lstrcat(szTemp, TEXT("\n"));
                LocalFree(szBuffer);
                AfxMessageBox(szTemp);
                return FALSE;
            }
            if (!(szBufferFinal = (TCHAR*) LocalAlloc(LPTR,
                    (lstrlen(szBuffer) + lstrlen(szTemp) + 1) * sizeof(TCHAR))))
            {
                wsprintf(szCode, TEXT("%lu"), GetLastError());
                lstrcat(szTemp, TEXT("Unable to allocate memory. Error code: "));
                lstrcat(szTemp, szCode);
                lstrcat(szTemp, TEXT("\n"));
                LocalFree(szBuffer);
                AfxMessageBox(szTemp);
                return FALSE;
            }
            lstrcpy(szBufferFinal, szTemp);
            lstrcat(szBufferFinal, szBuffer);
            LocalFree(szBuffer);
            AfxMessageBox(szBufferFinal);
            LocalFree(szBufferFinal);
        }
    }
    /* else
        AfxMessageBox(szTemp);
    */
    return TRUE;
}
