
READYSTATE_COMPLETE Not Reached / OnDocumentComplete() with Frames

michael....@gmail.com

Jul 28, 2007, 9:46:22 PM
Greetings! I'm doing some development with a TWebBrowser/TEmbeddedWB
control and trying to detect the completion of a document. The problem
is that READYSTATE_COMPLETE doesn't always happen. After researching
this for several days, I'm not finding any solutions that work.

The component is visible. Its ReadyState just doesn't always reach
READYSTATE_COMPLETE. Thus, I cannot do a loop of:

while (WebBrowser.ReadyState <> READYSTATE_COMPLETE) do
  Application.ProcessMessages();

I've tried a couple of tricks. In a nutshell:

procedure TDocumentParser.OnNavigateComplete2(ASender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  //if we don't already have the original dispatch set up, then add
  //it so we can keep track of the top level
  if (FOriginalDisp = nil) then
    FOriginalDisp := pDisp;
end;

procedure TDocumentParser.OnDocumentComplete(ASender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  if ((Assigned(FOriginalDisp)) and (pDisp = FOriginalDisp)) then
  begin
    //document is fully downloaded, received top level disp
    ShowMessage('document fully loaded');
    FOriginalDisp := nil;
  end;
end;

This method does not work. Another suggested method also compares
pDisp values, like so:

procedure TDocumentParser.OnDocumentComplete(ASender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  CurWebrowser: IWebBrowser;
  TopWebBrowser: IWebBrowser;
  Document: OleVariant;
  WindowName: string;
begin
  CurWebrowser := pDisp as IWebBrowser;
  TopWebBrowser := (ASender as TEmbeddedWB).DefaultInterface;
  if CurWebrowser = TopWebBrowser then
    ShowMessage('Complete document was loaded')
  else
  begin
    Document := CurWebrowser.Document;
    WindowName := Document.ParentWindow.Name;
    ShowMessage(Format('Frame "%s" was loaded', [WindowName]));
  end;
end;

After researching this for quite a while, I'm seeing a lot of people
saying that the OnDocumentComplete() method posted above works.

I also tried a suggestion from M$: in the OnDownloadComplete() event,
add a message pump:

procedure TDocumentParser.OnDownloadComplete(Sender: TObject);
var
  msg: TMsg;
begin
  while (PeekMessage(msg, 0, 0, 0, PM_REMOVE)) do
  begin
    TranslateMessage(msg);
    DispatchMessage(msg);
  end;
end;

This had no effect. According to M$, it could be used in some cases
where ReadyState would not be READYSTATE_COMPLETE until -after- the
OnDocumentComplete() event was called.


Does anyone know of any actual method to detect when a document is
fully loaded? All frames, images, etc. Once the document is fully
loaded, only then can I begin doing my parsing and such on it.
Incomplete documents are worthless.


Running Vista Ultimate 64bit, MSIE 7.0.0600.16473 64bit. Suggestions?
I'll try pretty much anything at this point.

michael....@gmail.com

Jul 28, 2007, 11:26:20 PM
Another interesting test result: the "top level" frame is completing
before all the other frames do. I'm using http://www.cnn.com as a
test, as they have many IFRAMEs going.

Event log:
Memo1
OnBeforeNavigate
finished navigate call
OnBeforeNavigate
OnDocumentComplete
OnBeforeNavigate
OnBeforeNavigate
OnBeforeNavigate
OnBeforeNavigate
OnBeforeNavigate
OnDocumentComplete
OnBeforeNavigate
OnDocumentComplete
OnDocumentComplete
OnDocumentComplete
OnDocumentComplete
OnDocumentComplete
OnDocumentComplete
OnBeforeNavigate
document not busy anymore
OnDocumentComplete
OnBeforeNavigate
OnBeforeNavigate
OnDocumentComplete
OnDocumentComplete
OnBeforeNavigate
OnDocumentComplete
OnBeforeNavigate
OnDocumentComplete

The message 'document not busy anymore' is in this procedure:

procedure TForm2.Button1Click(Sender: TObject);
begin
  FDispList := TList.Create;
  EmbeddedWB1.Navigate('www.cnn.com');
  Memo1.Lines.Add('finished navigate call');
  while (EmbeddedWB1.ReadyState <> READYSTATE_COMPLETE) do
  begin
    Application.ProcessMessages;
  end;
  Memo1.Lines.Add('document not busy anymore');
end;


The other messages are in their associated events. Man, this entire
component and wrapper is a mess.

michael....@gmail.com

Jul 29, 2007, 4:47:22 AM
Alright. I've been working on this with little to no sleep for days
now. After researching every article more than once and converting
dozens of code snippets from other languages, I've come up with a
class that will help in detecting completion of a document.

The problem:
ReadyState does not reliably reach READYSTATE_COMPLETE on pages with
multiple frames, for example http://www.cnn.com. In addition, the top
level frame can complete downloading and raise an OnDocumentComplete()
event before the entire page is actually downloaded and rendered. This
means that code relying on an entire document being present doesn't
work, because the true document loaded state cannot be detected. In a
lot of cases, even without frames, READYSTATE_COMPLETE never occurs.

The solution:
I created a class that tracks the events passed to it and returns after
a timeout. After calling browser.Navigate(), you can call WaitForLoad(),
which waits a specified time before returning. The last event raised
before it returns gives a good indication as to whether or not the
document completed loading. Each time you update a state
(OnNavigateComplete, OnDownloadBegin, OnDocumentComplete, etc.) the
timeout resets, so it only returns after the timeout has passed in
complete silence. I've also added an enumeration value for "protocol
event", which can be used when you are using a custom namespace/MIME
filter. In that case, you'd call UpdateEvent() with deProtocolEvent
from the protocol's Start(), Read(), or Terminate(). This way, if it's
just downloading a big file, you don't return after a couple of seconds
and assume completion.
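
As a rough illustration of the quiet-period idea, here is a minimal
standalone sketch that uses a simulated clock instead of GetTickCount().
TQuietTimer, Touch() and IsQuiet() are hypothetical names used only for
this example; they are not part of the unit posted below.

```pascal
program QuietPeriodSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical helper, not part of the posted unit: it remembers the
  // time of the most recent event against a caller-supplied clock.
  TQuietTimer = record
    TimeoutMS: Cardinal;
    LastEventMS: Cardinal;
  end;

// Record that an event arrived at NowMS; this restarts the quiet period.
procedure Touch(var T: TQuietTimer; NowMS: Cardinal);
begin
  T.LastEventMS := NowMS;
end;

// True once at least TimeoutMS milliseconds have passed since the last event.
function IsQuiet(const T: TQuietTimer; NowMS: Cardinal): Boolean;
begin
  Result := (NowMS - T.LastEventMS) >= T.TimeoutMS;
end;

var
  Timer: TQuietTimer;
begin
  Timer.TimeoutMS := 1000;
  Timer.LastEventMS := 0;

  // Simulated browser events at 0, 300 and 600 ms.
  Touch(Timer, 0);
  Touch(Timer, 300);
  Touch(Timer, 600);

  WriteLn('quiet at 1500 ms: ', IsQuiet(Timer, 1500));
  WriteLn('quiet at 1600 ms: ', IsQuiet(Timer, 1600));
end.
```

With a 1000 ms timeout and the last event at 600 ms, the first check
reports not quiet (only 900 ms of silence) and the second reports quiet.
The class in this post applies the same idea using GetTickCount() and
Application.ProcessMessages().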

To use it, create an instance of the class:

[CODE]FBrowserDelay := TBrowserLoadDelay.Create(1000);[/CODE]

I've found that it tends to work with delays as little as 100 ms. The
more events you pass to it, the more efficient it is. This means a
custom namespace/MIME filter passing Start(), Read(), and Terminate()
events to it would be ideal.

For each event you want to update the timeout with, call UpdateEvent()
from the corresponding TEmbeddedWB event handler. For example, each
time TEmbeddedWB gets an OnDocumentComplete() event:

[CODE]procedure TForm2.EmbeddedWB1DocumentComplete(ASender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
begin
  FBrowserDelay.UpdateEvent(deDocumentComplete);
end;[/CODE]


The values that you can pass to UpdateEvent() are:
deNoEvent - Default/Filler, more of a reserved value
deOtherEvent - Used with events not covered by the other enumeration values
deBeforeNavigate - Used with an OnBeforeNavigate() event
deDocumentComplete - Used with an OnDocumentComplete() event
deDownloadBegin - Used with an OnDownloadBegin() event
deDownloadComplete - Used with an OnDownloadComplete() event
deProgressChange - Used with an OnProgressChange() event
deNewWindow2 - Used with an OnNewWindow2() event
deNewWindow3 - Used with an OnNewWindow3() event
deProtocolEvent - Used with IInternetProtocol's Start(), Read(), and
Terminate() events.


If anyone uses this, I'd appreciate it if you keep the credits in the
unit as well. It really did take a lot of time to research the
problem and seek solutions. After finding nothing, I went ahead and
wrote this class and tested it. I'll look at coding a class that
registers it with a TEmbeddedWB, so you only need to register your own
callbacks with the class.

As for using the class, whenever you need to wait for a document to
finish loading, do something like this:

[CODE]EmbeddedWB1.Navigate('www.cnn.com');

if (FBrowserDelay.WaitForLoad() = drDocumentComplete) then
  Memo1.Lines.Add('Detected drDocumentComplete')
else
  Memo1.Lines.Add('finished on timeout');[/CODE]

Either way, the same amount of time has passed when WaitForLoad()
returns. The result is either drDocumentComplete or drTimeout. When the
result is drDocumentComplete, it means that the last event raised was
an OnDocumentComplete() event. When the result is drTimeout, it means
that the last event was something other than an OnDocumentComplete()
event and progress appears to have stopped, effectively timing out.
You could optionally do something like:
[CODE]
var
  myEvent: TDelayResult;
  x: Integer;
begin
  EmbeddedWB1.Navigate('www.cnn.com');
  for x := 0 to 4 do
  begin
    myEvent := FBrowserDelay.WaitForLoad();
    if (myEvent = drDocumentComplete) then
      Break;
  end;
end;
[/CODE]
This lets you use a short timeout (500 ms or so) and, if the document
isn't reported as loaded in that time, keep waiting for a response. If
you wait 4000 ms right off the bat, it's going to wait that long every
time, even if the document only took 20 ms to load. The for() example
is most useful when you're calling Navigate() in a loop.


The unit:

[CODE]{
  8balltechnology.com
  Michael Martinek, mmar...@8balltechnology.com
  1:42 AM 7/29/2007

  This class provides a more accurate method of detecting document load
  completion than monitoring raw events. Problems arise when simply trying
  to compare pDisp values in OnDocumentComplete() events to those reserved
  and assigned to variables via OnNavigateComplete(), or any other events.
  This class provides a method of monitoring events and returning when the
  browser interface appears to have stopped doing any sort of work. It
  functions primarily off event updates, and should be updated with as many
  events as possible. The more events it receives, the more efficient it is.

  8balltechnology.com and myself do not warrant the effectiveness or
  usefulness of this code in any way. We also do not warrant the success you
  may have in implementing it, nor do we warrant the stability or security.
  Usage of this code is at your own risk; we accept no responsibility.
}

unit uBrowserDelay;

interface

uses
  Windows, Forms;

type
  TDelayResult = (drNone, drTimeout, drDocumentComplete);
  TBrowserDelayEvent = (deNoEvent, deOtherEvent, deBeforeNavigate,
    deDocumentComplete, deDownloadBegin, deDownloadComplete,
    deProgressChange, deNewWindow2, deNewWindow3, deProtocolEvent);

  TBrowserLoadDelay = class(TObject)
  private
    FTickStart: Cardinal;
    FLastEvent: TBrowserDelayEvent;
    FTimeout, FWaitedMS: Cardinal;
  public
    procedure UpdateEvent(const AEvent: TBrowserDelayEvent);
    function WaitForLoad(): TDelayResult;

    constructor Create(const ATimeoutInMS: Cardinal);
    //override so that Free() actually reaches this destructor
    destructor Destroy; override;

    property Timeout: Cardinal read FTimeout write FTimeout;
    property WaitedMS: Cardinal read FWaitedMS;
  end;

implementation

constructor TBrowserLoadDelay.Create(const ATimeoutInMS: Cardinal);
begin
  inherited Create();

  //set default timeout to passed value
  FTimeout := ATimeoutInMS;

  //configured to a "No event" value
  FLastEvent := deNoEvent;

  //tick start will get updated when WaitForLoad() is called, but it just
  //feels better to initialize everything. ;)
  FTickStart := 0;
end;

destructor TBrowserLoadDelay.Destroy();
begin
  inherited Destroy();
end;

procedure TBrowserLoadDelay.UpdateEvent(const AEvent: TBrowserDelayEvent);
begin
  //when updating event, update the "start" timer so we can restart the
  //entry point of the delay
  if (AEvent <> FLastEvent) then
  begin
    FTickStart := GetTickCount();
    FWaitedMS := 0;
  end;

  //update the event now
  FLastEvent := AEvent;
end;

function TBrowserLoadDelay.WaitForLoad(): TDelayResult;
begin
  //start the TickStart timer to when we started waiting.
  FWaitedMS := 0;
  FTickStart := GetTickCount();

  //repeat until we've hit a time-out, then from that point figure out what
  //to do from there
  while (FWaitedMS < FTimeout) do
  begin
    //prevent application hangs
    Application.ProcessMessages();

    //waited time in milliseconds is current tick minus the tick start.
    //this tick start can be updated real-time via UpdateEvent().
    FWaitedMS := GetTickCount() - FTickStart;
  end;

  //if we timed out, then set the result accordingly. If we timed out and the
  //last event is a DocumentComplete, then we can assume that the navigation
  //finished properly
  if (FLastEvent = deDocumentComplete) then
    Result := drDocumentComplete
  else
    Result := drTimeout;

  //update to 0 for property-sake.. when we return from this function, we have
  //always waited FTimeout milliseconds, so we do not need to use FWaitedMS to
  //find out how long we've been waiting.
  FWaitedMS := 0;
end;

end.

[/CODE]
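
One detail to keep in mind when wiring up events: as posted,
UpdateEvent() only restarts the timer when the event differs from the
previous one, so a run of identical events (several OnDocumentComplete()
calls in a row, as in the CNN log) does not push the deadline out.
Whether that matters depends on which events you forward. The following
is a standalone sketch comparing the two policies on simulated
timestamps; WaitEndsAt(), TSimEvent and the trimmed-down enumeration are
made up for this illustration and are not part of the unit.

```pascal
program ResetPolicySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Trimmed-down copy of the enumeration, for illustration only.
  TBrowserDelayEvent = (deNoEvent, deBeforeNavigate, deDocumentComplete);

  // One simulated event: what fired and when (ms since the wait began).
  TSimEvent = record
    Kind: TBrowserDelayEvent;
    AtMS: Cardinal;
  end;

// Returns the simulated time at which a wait with the given timeout ends.
// ResetOnRepeat = False mirrors the posted UpdateEvent(), which restarts
// the timer only when the event kind changes; True restarts it on every
// event. Events must be sorted by AtMS.
function WaitEndsAt(const Events: array of TSimEvent; TimeoutMS: Cardinal;
  ResetOnRepeat: Boolean): Cardinal;
var
  I: Integer;
  TickStart: Cardinal;
  LastEvent: TBrowserDelayEvent;
begin
  TickStart := 0;
  LastEvent := deNoEvent;
  for I := Low(Events) to High(Events) do
  begin
    // An event arriving after the deadline is too late to extend the wait.
    if Events[I].AtMS - TickStart >= TimeoutMS then
      Break;
    if ResetOnRepeat or (Events[I].Kind <> LastEvent) then
      TickStart := Events[I].AtMS;
    LastEvent := Events[I].Kind;
  end;
  Result := TickStart + TimeoutMS;
end;

var
  Events: array[0..2] of TSimEvent;
begin
  // Three OnDocumentComplete() events in a row, 400 ms apart.
  Events[0].Kind := deDocumentComplete; Events[0].AtMS := 100;
  Events[1].Kind := deDocumentComplete; Events[1].AtMS := 500;
  Events[2].Kind := deDocumentComplete; Events[2].AtMS := 900;

  WriteLn('reset on change only: ', WaitEndsAt(Events, 500, False), ' ms');
  WriteLn('reset on every event: ', WaitEndsAt(Events, 500, True), ' ms');
end.
```

With a 500 ms timeout, the change-only policy ends the wait at 600 ms
even though another OnDocumentComplete() arrives at 900 ms, while
resetting on every event ends it at 1400 ms. If that is the behaviour
you want, dropping the AEvent <> FLastEvent check in UpdateEvent()
should be enough.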


I posted this elsewhere; just sharing it in case any other developers
out there need something like this as well. If you have any input on
this problem, I'm very interested to hear it.
