I'm writing an IE BHO which mimics the AutoCopy Firefox add-in. What I thought
would be a one-day job (and initially was) has now turned into a cat-and-mouse
game with the IE object model :)
The idea was very simple. I followed the instructions in Q180366
(http://support.microsoft.com/kb/q180366/) so that I could catch the last
document complete event fired for a page. At that point I enumerate all the
frames, starting with the browser's document (IHTMLDocument2) and recursing
through the frames collection of each document object, and register
connection points for document events such as ONSELECTIONCHANGE and
ONMOUSEDOWN. I store all the document objects (e.g. CComPtr<IHTMLDocument2>)
in a vector. (I don't know if this is a good idea, but it is certainly faster
than recursing through all the frames each time an event occurs.) Whenever I
get an ONSELECTIONCHANGE event from a document, I scan all the document
objects in the vector, looking for a valid selection, and copy it to the
clipboard. This works fine.
So far so good: almost every page I tested works fine, with or without
frames (even Gmail pages, which use frames and dynamic pages extensively).
And then I came across the GeoCities and Yahoo Mail pages :)
This basically broke the suggestion in Q180366, because I get document
complete from the page(s) even after getting the document complete with the
IDispatch of the webbrowser control! Hence I can't really determine whether
the page has actually loaded (using the ready state seems to be a bit
foolish, since I may have frames that are auto refreshing or broken or may
not load).
So what happens is that, since I clear my vector of documents and unadvise
the connection points when I think the document has loaded, it won't work
with the GeoCities pages, since my BHO won't be aware of the new document
completes :(
Here's a test page and a dump of the Navigate Complete and Document Complete
events when I load the page:
test page: (any GeoCities or Yahoo Mail page will do!)
http://www.geocities.com/i_adryan/
Navigate complete: 0016758C <- IDispatch of the webbrowser control
Navigate complete: 015DDE90
Document complete: 015CBA50
Document complete: 015DDE90
Document complete: 015CFA60
Document complete: 015CDA70
Document complete: 0016758C <- IDispatch of the webbrowser control
Navigate complete: 015CBA50 <- Oops! new documents coming in!
Navigate complete: 015F5A50
Document complete: 015F5A50
Navigate complete: 015F7B60
Document complete: 015F7B60
Document complete: 015CBA50
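To make the dump easier to reason about, here is a minimal portable sketch that replays it. It is only an illustration: LoggedEvent, geocitiesDump, indexOfTopLevelComplete and eventsAfterTopLevelComplete are hypothetical names, and plain integers stand in for the IDispatch pointers a real BHO would compare. It applies the Q180366-style check (the first Document complete whose source is the top-level browser) and counts how many events still arrive after that point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one logged event. In a real BHO the source would
// be the IDispatch* received as pDisp; here it is just the address from the dump.
struct LoggedEvent {
    bool documentComplete;   // false = Navigate complete, true = Document complete
    std::uintptr_t source;   // object that fired the event
};

// The event sequence from the dump above, in the order it was logged.
std::vector<LoggedEvent> geocitiesDump() {
    return {
        {false, 0x0016758C}, {false, 0x015DDE90}, {true, 0x015CBA50},
        {true, 0x015DDE90},  {true, 0x015CFA60},  {true, 0x015CDA70},
        {true, 0x0016758C},  // top-level Document complete
        {false, 0x015CBA50}, {false, 0x015F5A50}, {true, 0x015F5A50},
        {false, 0x015F7B60}, {true, 0x015F7B60},  {true, 0x015CBA50},
    };
}

// Q180366-style check: index of the first Document complete fired by the
// top-level browser object, or -1 if there is none.
int indexOfTopLevelComplete(const std::vector<LoggedEvent>& log,
                            std::uintptr_t topLevel) {
    for (std::size_t i = 0; i < log.size(); ++i) {
        if (log[i].documentComplete && log[i].source == topLevel) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Number of events that arrive after the point where the page was
// assumed to be fully loaded.
std::size_t eventsAfterTopLevelComplete(const std::vector<LoggedEvent>& log,
                                        std::uintptr_t topLevel) {
    const int idx = indexOfTopLevelComplete(log, topLevel);
    if (idx < 0) {
        return 0;
    }
    return log.size() - 1 - static_cast<std::size_t>(idx);
}
```

With the dump above, the top-level Document complete is the seventh event, and six more events follow it. Those are exactly the ones a BHO misses if it unadvises at that point.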
Any ideas?
Cheers,
T
For recursing into frames, it is best to use KB article KB196340, "HOWTO:
Get the WebBrowser Object Model of an HTML Frame". This technique bypasses
cross-domain scripting restrictions.
> So far so good, for almost all pages that I tested works fine, with or
> without frames (even gmail pages, which used frames, dynamic pages
> extensively). And then I came across the GeoCities and YahooMail page
> :)
>
> This basically broke the suggestion in Q180366, because I get document
> complete from the page(s) even after getting the document complete
> with the IDispatch of the webbrowser control! Hence I can't really
> determine whether the page has actually loaded (using the ready state
> seems to be a bit foolish, since I may have frames that are auto
> refreshing or broken or may not load).
This is what I do, instead of walking the frame hierarchy. I handle
BeforeNavigate2, NavigateComplete2 and DocumentComplete at the top-level
browser object. These events are fired individually by each frame, and
the first parameter of each event indicates which frame it has come
from.
I maintain a list of every frame I have seen so far, and each frame
(except the top-level object of course) refers to its parent frame. To
get a parent of a frame, call IWebBrowser2::get_Parent (you will get a
document object of the parent frame), query for IServiceProvider and
call QueryService(SID_SWebBrowserApp).
Now, whenever I receive BeforeNavigate2, I check to see if this is from
a known frame, and if not, I add it to the list with an appropriate
parent reference (it is impossible for BeforeNavigate2 from a child
frame to arrive before BeforeNavigate2 of the parent, so the parent is
always in the list already).
Whenever I receive NavigateComplete2 from a frame, I assume that all its
children and descendant frames are destroyed (since the old document was
unloaded and a new document is just being loaded), so I remove them from
the list. Also, at this point window and document objects of the frame
that fired NavigateComplete2 are available, so you can attach event
sinks at the document level (but not yet at individual elements' level).
When I receive DocumentComplete, I know that this particular frame has
finished loading. This doesn't mean that its child frames have - they
might have already initiated a new download. But my system does not rely
on events from child frames being strictly nested. Anyway, when
DocumentComplete fires, you can safely walk the DHTML model of this frame
and attach sinks at the element level if you are so inclined.
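
The bookkeeping described above can be sketched in portable C++. This is a minimal sketch under stated assumptions, not the actual implementation: FrameId, FrameInfo and FrameRegistry are hypothetical names, a plain integer stands in for the identity of a frame's browser object, and the caller is assumed to have already resolved the parent frame before calling onBeforeNavigate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the identity of a frame's browser object.
using FrameId = std::uintptr_t;
const FrameId kNoParent = 0;   // marks the top-level browser

struct FrameInfo {
    FrameId parent;   // kNoParent for the top-level browser
    bool loaded;      // true once DocumentComplete was seen for this frame
};

class FrameRegistry {
public:
    // BeforeNavigate2: remember the frame if it has not been seen yet.
    void onBeforeNavigate(FrameId frame, FrameId parent) {
        if (frames_.count(frame) != 0) {
            return;   // already known
        }
        // Assumption taken from the text: the parent is always registered first.
        assert(parent == kNoParent || frames_.count(parent) != 0);
        frames_[frame] = FrameInfo{parent, false};
    }

    // NavigateComplete2: the old document is gone, so drop every descendant.
    // A real handler could attach document-level sinks here.
    void onNavigateComplete(FrameId frame) {
        removeDescendants(frame);
        auto it = frames_.find(frame);
        if (it != frames_.end()) {
            it->second.loaded = false;
        }
    }

    // DocumentComplete: this frame (not necessarily its children) has loaded.
    // A real handler could attach element-level sinks here.
    void onDocumentComplete(FrameId frame) {
        auto it = frames_.find(frame);
        if (it != frames_.end()) {
            it->second.loaded = true;
        }
    }

    bool known(FrameId frame) const { return frames_.count(frame) != 0; }

    bool loaded(FrameId frame) const {
        auto it = frames_.find(frame);
        return it != frames_.end() && it->second.loaded;
    }

    std::size_t size() const { return frames_.size(); }

private:
    void removeDescendants(FrameId frame) {
        // Collect children first so the map is not modified while iterating.
        std::vector<FrameId> children;
        for (const auto& entry : frames_) {
            if (entry.second.parent == frame) {
                children.push_back(entry.first);
            }
        }
        for (FrameId child : children) {
            removeDescendants(child);
            frames_.erase(child);
        }
    }

    std::map<FrameId, FrameInfo> frames_;
};
```

Because each frame is tracked on its own, nothing here depends on the top-level DocumentComplete arriving last, which is the property the scheme relies on.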
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925
Many thanks for the kind reply. Interestingly enough, I did come across
KB196340, after making this post, which after integrating into my code solved
the problem partially, e.g. now the GeoCities pages work :). However, the
Yahoo pages are still a problem.
I'll hopefully try to integrate your kind suggestions and see what happens.
Best regards,
T
Another really silly question... the DISPPARAMS* passed to ::Invoke, and its
first parameter, e.g. DISPPARAMS->rgvarg[1].pdispVal: what does it point to?
I tried assigning it to IHTMLDocument2 and IHTMLWindow2 dispatch pointers
(when DISPID_BEFORENAVIGATE2 is fired), but in both cases I get an access
violation. What is the actual type of DISPPARAMS->rgvarg[1].pdispVal in this
case? The browser, document, window or something else?
Cheers,
T
When you receive BeforeNavigate there is no HTML window or
document yet. DISPPARAMS->rgvarg[1].pdispVal is a pointer to
the IDispatch interface for the WebBrowser object that
represents the window or frame.
if (DISPID_BEFORENAVIGATE2 == dispidMember)
{
    ATLTRACE(_T("Before Navigate: %p\n"), pDispParams->rgvarg[1].pdispVal);
    LPDISPATCH pDispatch = pDispParams->rgvarg[1].pdispVal;
    // Access violation here, since QueryInterface fails
    CComQIPtr<IWebBrowser2, &IID_IWebBrowser2> pBrowser(pDispatch);
}
Any idea?
Cheers,
T
I'm not using raw IDispatch with events. I inherit my class
from IDispEventImpl, so event handlers have appropriate
types and the correct number of parameters. According to MSDN,
the first parameter is a pointer to the WebBrowser's IDispatch:
"DWebBrowserEvents2::BeforeNavigate2 Event"
http://msdn.microsoft.com/workshop/browser/webbrowser/reference/ifaces/dwebbrowserevents2/beforenavigate2.asp
Check the type of pDispParams->rgvarg[1]. It should
be VT_DISPATCH. If it's not, then you are interpreting pDispParams
incorrectly.
Thanks for the tip. I know I should have used IDispEventImpl rather than
using Invoke directly, but what can I say? For some reason, when I tried using
the new attributed ATL, my project wouldn't compile under VS.NET 2005! Hence I
went back to using raw dispatches. Anyway, I will give it a shot.
Cheers,
T
Nothing useful. For BeforeNavigate2 event, rgvarg[1] corresponds to
Headers parameter and should be either VT_EMPTY, or VT_VARIANT|VT_BYREF
pointing to a variant of type VT_BSTR.
Remember that parameters are packed into rgvarg array in reverse order.
E.g., BeforeNavigate2 event is defined as follows:
void BeforeNavigate2(
    IDispatch *pDisp,
    VARIANT *url,
    VARIANT *Flags,
    VARIANT *TargetFrameName,
    VARIANT *PostData,
    VARIANT *Headers,
    VARIANT_BOOL *Cancel
);
When you handle this event, rgvarg[0] corresponds to the Cancel parameter,
rgvarg[1] is Headers, ..., rgvarg[6] is pDisp. The last one is the one
you want: query it for the IWebBrowser2 of the browser (top-level or frame)
firing the event.
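
The reverse packing can be captured in a small helper. This is a portable sketch only: BeforeNavigate2Param, rgvargIndex and declaredArg are hypothetical names, and a plain array stands in for the real rgvarg array of VARIANTs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Declared (left-to-right) positions of the BeforeNavigate2 parameters.
enum BeforeNavigate2Param {
    kDisp = 0,
    kUrl,
    kFlags,
    kTargetFrameName,
    kPostData,
    kHeaders,
    kCancel,
    kParamCount   // 7 parameters in total
};

// Arguments are packed in reverse order, so the declared parameter at
// position declaredPos lives at rgvarg[cArgs - 1 - declaredPos].
inline std::size_t rgvargIndex(std::size_t declaredPos, std::size_t cArgs) {
    assert(declaredPos < cArgs);
    return cArgs - 1 - declaredPos;
}

// Fetches the argument for a declared position from a reverse-packed array.
template <typename Arg>
const Arg& declaredArg(const Arg* rgvarg, std::size_t cArgs,
                       std::size_t declaredPos) {
    return rgvarg[rgvargIndex(declaredPos, cArgs)];
}
```

For BeforeNavigate2, rgvargIndex(kDisp, kParamCount) is 6 and rgvargIndex(kHeaders, kParamCount) is 1, which matches the mapping described above. In a real handler it would still be worth checking that the variant's type is VT_DISPATCH before using pdispVal.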
I did notice that after looking into MSDN a bit.
Cheers,
T