TWebBrowser: How can I read the HTML page (source)?

Ruggero Cross

Oct 25, 2005, 5:26:34 AM
Hi

I have a TWebBrowser component open on an HTML page.

How can I read the HTML page (source)?


tnx.

Mike Meyer

Oct 25, 2005, 8:15:28 AM
Copied from the Delphi Knowledge Base:

http://www.baltsoft.com/product_dkb.htm

----------------------

Problem/Question/Abstract:


Using the TWebBrowser component (Delphi 5), I am looking for a way to save
the HTML code loaded in a TWebBrowser. I can save it through the right-click
context menu, but I would like to do this programmatically.


Answer:


uses
  Windows, Classes, ActiveX;

{Saves the HTML document - referenced through 'Document' - to a stream}
procedure SaveDocumentSourceToStream(Document: IDispatch; Stream: TStream);
var
  PersistStreamInit: IPersistStreamInit;
  StreamAdapter: IStream;
begin
  {Delete content of stream}
  Stream.Size := 0;
  Stream.Position := 0;
  {IPersistStreamInit - get document interface
   (Document can be nil before any page has been loaded)}
  if Assigned(Document) and
    (Document.QueryInterface(IPersistStreamInit, PersistStreamInit) = S_OK) then
  begin
    {Use StreamAdapter to get the IStream interface for our stream}
    StreamAdapter := TStreamAdapter.Create(Stream, soReference);
    {Save data from document into stream}
    PersistStreamInit.Save(StreamAdapter, False);
    {Release StreamAdapter. Optional, as it is released when it goes out of scope}
    StreamAdapter := nil;
  end;
end;

-------------------

Problem/Question/Abstract:


How to save raw HTML source from TWebBrowser.Document to disk


Answer:


Solve 1:


The document object returned by TWebBrowser.Document implements
IPersistStreamInit, which exposes a Save() method. All you need is to pass
Save() an object that implements IStream; TStreamAdapter (declared in the
Classes unit) wraps any TStream for exactly this purpose.


Note that the IPersistStreamInit and IStream interfaces are declared in the
ActiveX unit.


Here's how to do it.


uses ActiveX...

{...}

procedure TForm1.SaveHTMLSourceToFile(const FileName: string;
  WB: TWebBrowser);
var
  PersistStream: IPersistStreamInit;
  FileStream: TFileStream;
  Stream: IStream;
  SaveResult: HRESULT;
begin
  PersistStream := WB.Document as IPersistStreamInit;
  FileStream := TFileStream.Create(FileName, fmCreate);
  try
    Stream := TStreamAdapter.Create(FileStream, soReference) as IStream;
    SaveResult := PersistStream.Save(Stream, True);
    if FAILED(SaveResult) then
      MessageBox(Handle, 'Failed to save HTML source', 'Error', MB_ICONERROR);
  finally
    { We pass soReference to the TStreamAdapter constructor, so the adapter
      does not own the TFileStream and it is our responsibility to destroy it.
      Release the adapter first so it never refers to a freed stream. }
    Stream := nil;
    FileStream.Free;
  end;
end;


procedure TForm1.Button1Click(Sender: TObject);
begin
  if SaveDialog1.Execute then
    SaveHTMLSourceToFile(SaveDialog1.FileName, WebBrowser1);
end;


Here's a snippet that navigates to a password-protected URL using Basic
authorization. It relies on a Base64Encode routine that is not shown here;
plenty of implementations are available on the internet, and the topic is
beyond the scope of this article.


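Alternatively, you can embed the authorization info in the URL string in the
form http://<user>:<password>@<hostname>/, and IE will put it in the request
header for you. Be aware that a 2004 Internet Explorer security update
disabled this URL syntax by default, so it may not work on updated systems.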


procedure TForm1.Button1Click(Sender: TObject);
var
  URL, Flags, TargetFrameName, PostData, Headers: OleVariant;
begin
  // EdURL, EdPassword, and EdUserName are TEdit controls
  URL := EdURL.Text;
  Flags := EmptyParam;
  TargetFrameName := EmptyParam;
  PostData := EmptyParam;
  if (EdUserName.Text <> '') and (EdPassword.Text <> '') then
    // Base64Encode is not defined in this snippet (see the note above).
    // Each extra header line is terminated with CR/LF.
    Headers := 'Authorization: Basic ' +
      Base64Encode(EdUserName.Text + ':' + EdPassword.Text) + #13#10
  else
    Headers := EmptyParam;
  WebBrowser1.Navigate2(URL, Flags, TargetFrameName, PostData, Headers);
end;
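The snippet above calls a Base64Encode routine that the article leaves to the reader. As a minimal sketch, assuming single-byte (ANSI) user name and password strings, the function below implements standard Base64 encoding with the RFC 4648 alphabet and '=' padding. The name Base64Encode only matches the call above; it is a hypothetical helper, not a VCL routine.

```pascal
// Hypothetical helper matching the Base64Encode call above; not a VCL routine.
// Encodes each group of 3 input bytes as 4 characters, padding with '='.
function Base64Encode(const S: AnsiString): AnsiString;
const
  Alphabet: AnsiString =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var
  I, Len, OutPos: Integer;
  B0, B1, B2: Byte;
begin
  Len := Length(S);
  SetLength(Result, ((Len + 2) div 3) * 4);
  OutPos := 1;
  I := 1;
  while I <= Len do
  begin
    // Missing bytes at the end of the input are treated as zero
    B0 := Ord(S[I]);
    if I + 1 <= Len then B1 := Ord(S[I + 1]) else B1 := 0;
    if I + 2 <= Len then B2 := Ord(S[I + 2]) else B2 := 0;
    // Split the 24 bits into four 6-bit indexes (+1: strings are 1-based)
    Result[OutPos] := Alphabet[(B0 shr 2) + 1];
    Result[OutPos + 1] := Alphabet[(((B0 and $03) shl 4) or (B1 shr 4)) + 1];
    if I + 1 <= Len then
      Result[OutPos + 2] := Alphabet[(((B1 and $0F) shl 2) or (B2 shr 6)) + 1]
    else
      Result[OutPos + 2] := '=';
    if I + 2 <= Len then
      Result[OutPos + 3] := Alphabet[(B2 and $3F) + 1]
    else
      Result[OutPos + 3] := '=';
    Inc(I, 3);
    Inc(OutPos, 4);
  end;
end;
```

For example, Base64Encode('user:pass') yields 'dXNlcjpwYXNz', which is the value sent after 'Authorization: Basic '.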

--------------------------

Problem/Question/Abstract:


How can I access the HTML content of a TWebBrowser object? I tried using
OLECMDID_SAVEAS to save it to a file first and then read the file, but it
always prompts for a directory and file name.


Answer:


Use the SaveDocumentSourceToStream procedure from the first answer above
(this article contains the same code). To get the HTML source as a string,
pass it a TStringStream and read the stream's DataString property afterwards.
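If you only need the source as a string, the stream handling can be wrapped in a single function. This is a sketch built on the same IPersistStreamInit/TStreamAdapter technique; the name GetDocumentSource is hypothetical, and it assumes the page has finished loading (for example, call it from the OnDocumentComplete event).

```pascal
uses
  Windows, SysUtils, Classes, ActiveX;

// Hypothetical helper: returns the document source as a string, or ''
// when Document is nil, does not support IPersistStreamInit, or Save fails.
function GetDocumentSource(Document: IDispatch): string;
var
  PersistStreamInit: IPersistStreamInit;
  StringStream: TStringStream;
  StreamAdapter: IStream;
begin
  Result := '';
  // Supports returns False for a nil Document, so no separate check is needed
  if not Supports(Document, IPersistStreamInit, PersistStreamInit) then
    Exit;
  StringStream := TStringStream.Create('');
  try
    // soReference: the adapter does not own StringStream; we free it below
    StreamAdapter := TStreamAdapter.Create(StringStream, soReference);
    if Succeeded(PersistStreamInit.Save(StreamAdapter, False)) then
      Result := StringStream.DataString;
  finally
    StreamAdapter := nil; // release the adapter before freeing its stream
    StringStream.Free;
  end;
end;
```

Usage: Memo1.Lines.Text := GetDocumentSource(WebBrowser1.Document);. On Delphi 2009 and later, DataString decodes the saved bytes with the stream's encoding, which defaults to the system ANSI code page, so UTF-8 pages may need TStringStream.Create('', TEncoding.UTF8) instead.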

"Ruggero Cross" <Rugge...@yahoo.com> wrote in message
news:435d...@newsgroups.borland.com...

Eddie Shipman

Oct 25, 2005, 10:46:07 AM
In article <435d...@newsgroups.borland.com>, Rugge...@yahoo.com
says...

> Hi
>
> I have a TWebBrowser component open on html page.
>
> How i can read the HTML page ( source ) ?
>

Even easier:

uses ..., MSHTML; // Before Delphi 6, import the Microsoft HTML type library
                  // and use the unit it generates.

procedure TForm1.WebBrowser1DocumentComplete(...);
var
  doc: IHTMLDocument2;
begin
  doc := WebBrowser1.Document as IHTMLDocument2;
  // Body.OuterHTML covers only the <body> element, not <head>
  Memo1.Lines.Text := doc.Body.OuterHTML;
end;
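To include <head> as well, the root <html> element can be read through IHTMLDocument3.documentElement instead of the body. This is a sketch assuming your MSHTML unit (or the imported type library) declares IHTMLDocument3; the name GetDocumentElementHTML is hypothetical. Keep in mind that outerHTML is MSHTML's serialization of the parsed DOM, so it can differ from the raw source the server sent: it omits the DOCTYPE and reflects any changes made by scripts.

```pascal
uses
  SysUtils, MSHTML;

// Hypothetical helper: returns the outer HTML of the root <html> element,
// or '' when Document is nil or does not support IHTMLDocument3.
function GetDocumentElementHTML(Document: IDispatch): string;
var
  Doc3: IHTMLDocument3;
  Root: IHTMLElement;
begin
  Result := '';
  // Supports returns False for a nil Document
  if not Supports(Document, IHTMLDocument3, Doc3) then
    Exit;
  Root := Doc3.documentElement;
  if Root <> nil then
    Result := Root.outerHTML;
end;
```

Usage: Memo1.Lines.Text := GetDocumentElementHTML(WebBrowser1.Document);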
