how to access content in iframe

东东爸

Oct 3, 2010, 10:05:29 PM

to zotero-dev

Hi all,

I am working on updating the CNKI translator and have run into a problem: how can I access the content inside an iframe?

The iframe's HTML is shown below:

<iframe width="750" height="342" frameborder="no" scrolling="no" vspace="0" src="brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286155218000" name="iframeResult" marginheight="0" hspace="0" id="iframeResult" style="display: block; height: 1154px;">

</iframe>

I cannot reach the iframe's content through XPath (doc.evaluate), and the usual approach, window.frames["iframeResult"].document.getElementById, does not work either. The error message is shown below:

09:35:25 Translation using CNKI failed:

message => window is not defined

fileName => chrome://zotero/content/xpcom/translate.js

lineNumber => 1431

stack => doWeb([object XPCNativeWrapper],"http://acad.cnki.net/Kns55/brief/Result_CJFQ.htm")@chrome://zotero/content/xpcom/translate.js:1431

name => ReferenceError

url => http://acad.cnki.net/Kns55/brief/Result_CJFQ.htm

downloadAssociatedFiles => true

automaticSnapshots => false

Any hints are appreciated.

--
Best Regards！

Ace Strong

==================================================
Nanjing University of Aeronautics and Astronautics.
College of Civil Aviation
TAO Cheng
E-mail: aces...@gmail.com ;aces...@nuaa.edu.cn
==================================================

东东爸

Oct 3, 2010, 10:25:46 PM

to zotero-dev

I found the contentDocument property, which seems to hold the iframe's content as a document. It works for me.

东东爸

Oct 3, 2010, 11:31:02 PM

to zotero-dev

Another problem: my iframe code runs fine in Scaffold (the item-selection dialog pops up), but I get an error in regular use!

Below is my iframe-related code:

...

var xpath = '//iframe[@id="iframeResult"]';
var iframe = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext();
var subdoc = iframe.contentDocument;
xpath = '//div[@class="GridTitleDiv"]';
var tds = subdoc.evaluate(xpath, subdoc, nsResolver, XPathResult.ANY_TYPE, null);

...

and the error message:

(4)(+0002092): Translate: Parsing code for CNKI

(4)(+0000004): Translate: Enter multiple~

(2)(+0000002): Translate: Translation using CNKI failed:

message => iframe is null

fileName => chrome://zotero/content/xpcom/translate.js

lineNumber => 1441

stack => doWeb([object XPCNativeWrapper],"http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286161732053")@chrome://zotero/content/xpcom/translate.js:1441

name => TypeError

url => http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286161732053

东东爸

Oct 4, 2010, 12:06:32 AM

to zotero-dev

I figured out what was wrong with the last error! The URL in regular use is not the same as in Scaffold, so I modified my code as follows:

var xpath = '//iframe[@id="iframeResult"]';
var iframe = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext();
xpath = '//div[@class="GridTitleDiv"]';
var tds;
if (iframe) {
    // In Scaffold the translator sees the parent page, so descend into the iframe.
    var subdoc = iframe.contentDocument;
    tds = subdoc.evaluate(xpath, subdoc, nsResolver, XPathResult.ANY_TYPE, null);
} else {
    // In regular use the translator already runs on the iframe document.
    tds = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null);
}

That solves the previous problem, but I immediately ran into another one.

This time a similar error is reported in both Scaffold and regular use, so I am only posting the Scaffold error message here:

11:45:20 Translation using CNKI failed:

message => Permission denied for <http://acad.cnki.net> to get property HTMLDocument.documentElement from <http://apj1.cnki.net>

fileName => chrome://zotero/content/xpcom/translate.js

lineNumber => 884

stack => getResolver([object XPCNativeWrapper])@chrome://zotero/content/xpcom/translate.js:884

scrapeAndParse1([object XPCNativeWrapper],"http://apj1.cnki.net/kcms/detail/detail.aspx?QueryID=6&CurRec=3&DbCode=CJFQ&dbname=CJFD0608&filename=KONG200705038")@chrome://zotero/content/xpcom/translate.js:902

doWeb([object XPCNativeWrapper],"http://apj1.cnki.net/kcms/detail/detail.aspx?QueryID=6&CurRec=4&DbCode=CJFQ&dbname=CJFD0608&filename=JSJC200701088")@chrome://zotero/content/xpcom/translate.js:1480

name => Error

url => http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286163240888

downloadAssociatedFiles => true

automaticSnapshots => false

The problem is probably an incorrect URL for the selected item. I noticed that the href of a search result item looks like this:

/kns55/detail/detail.aspx?QueryID=6&CurRec=1&DbCode=CJFQ&dbname=CJFDTEMP&filename=JYGC201004018

and the URL passed to doWeb looks like this (in regular use):

http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%E4%B8%AD%E5%9B%BD%E5%AD%A6%E6%9C%AF%E6%9C%9F%E5%88%8A%E7%BD%91%E7%BB%9C%E5%87%BA%E7%89%88%E6%80%BB%E5%BA%93&ConfigFile=CJFQ.xml&research=off&t=1286163240888

and the URL of the page when I click the link by hand looks like this:

http://apj1.cnki.net/kcms/detail/detail.aspx?QueryID=6&CurRec=1&DbCode=CJFQ&dbname=CJFDTEMP&filename=JYGC201004018&uid=WEEvREdiSUtucElKVWhsVUxGMW14VlU2dld1WmRzVT0=

and the URL I modified and passed to retrieveDocument looks like this (it loads fine if I paste it into the browser):

http://apj1.cnki.net/kcms/detail/detail.aspx?QueryID=6&CurRec=1&DbCode=CJFQ&dbname=CJFDTEMP&filename=JYGC201004018&

and I am totally confused!

Any suggestions?

Frank Bennett

Oct 4, 2010, 4:14:43 PM

to zotero-dev

On Oct 4, 1:06 pm, 东东爸 <acestr...@gmail.com> wrote:

> var xpath = '//iframe[@id="iframeResult"]';
>
> var iframe = doc.evaluate(xpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext();
>
> xpath = '//div[@class="GridTitleDiv"]';
>
> if (iframe) {
>
> var subdoc = iframe.contentDocument;

I'm not that well versed in the DOM, but I think you can't access an
iframe in the same way as other HTML nodes. It's in the DOM, but it's
located in a separate document, so you have to navigate over there
first. In another (undistributed) translator, I used a construct like
this, which worked for that:

var frameDoc = doc.defaultView.parent.frames[1].document;

You may need some trial and error to find the correct index for your
target frame or iframe.
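
As an alternative to guessing the index, the frame can be selected by its URL. This is only a sketch under stated assumptions: findFrameDocument is a hypothetical helper name, not part of Zotero, and it assumes the parent window exposes an array-like frames collection whose entries each have a document with a location.href.

```javascript
// Hypothetical helper (not a Zotero API): return the document of the first
// child frame whose URL matches `pattern`, or null if no frame matches.
function findFrameDocument(parentWindow, pattern) {
  var frames = parentWindow.frames;
  for (var i = 0; i < frames.length; i++) {
    try {
      var frameDoc = frames[i].document;
      if (frameDoc && pattern.test(frameDoc.location.href)) {
        return frameDoc;
      }
    } catch (e) {
      // Reading a frame that belongs to another domain can throw; skip it.
    }
  }
  return null;
}
```

In a translator this might be called as findFrameDocument(doc.defaultView.parent, /brief\.aspx\?/), with the pattern adjusted to the frame you are after.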

If the iframe content is dynamic, and you need to obtain it in a
different form, you can get its url, modify it, and use
retrieveDocument() to obtain the other version:

var frameHref = frameDoc.location.href;
var newHref = frameHref.replace( ... );
var otherFrameDocVersion =
Zotero.Utilities.retrieveDocument(newHref);

Does that help?
Frank Bennett

Dan Stillman

Oct 4, 2010, 5:07:46 PM

to zoter...@googlegroups.com

In most cases it shouldn't be necessary to do any of this. Zotero
handles frames for you—all you should usually need is a proper target
regexp to match the frame/iframe URL, and the translator should be
completely oblivious to the parent page.

The only exception to this would be if you need to access content from
both the parent page and the frame, in which case one of Frank's methods
could work. (It might be better to use iframe.getAttribute('src') (where
iframe is in the parent doc) to get the URL, though, since that would
avoid cross-document issues.)
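
Reading the src attribute usually yields a relative URL, so it has to be resolved against the parent page before it can be fetched. The sketch below is one way to do that. getFrameUrl is a hypothetical helper name, and the standard URL constructor it relies on is an assumption: it exists in current browsers and Node.js, but may not be available inside the translator sandbox.

```javascript
// Hypothetical helper: resolve an iframe's src attribute against the URL of
// the parent document. Only the element's attribute is read, so the frame's
// own document is never touched.
function getFrameUrl(iframe, parentUrl) {
  var src = iframe.getAttribute('src');
  if (!src) {
    return null; // no src attribute, nothing to resolve
  }
  // Assumption: the standard URL constructor is available in this environment.
  return new URL(src, parentUrl).href;
}
```

Given the iframe from the first message and the parent URL http://acad.cnki.net/Kns55/brief/Result_CJFQ.htm, this resolves the relative brief.aspx?... value to a URL under http://acad.cnki.net/Kns55/brief/.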

Dan Stillman

Oct 4, 2010, 5:10:38 PM

to zoter...@googlegroups.com

On 10/4/10 5:07 PM, Dan Stillman wrote:
> In most cases it shouldn't be necessary to do any of this. Zotero
> handles frames for you—all you should usually need is a proper target
> regexp to match the frame/iframe URL, and the translator should be
> completely oblivious to the parent page.

And in case that wasn't clear, if you have a page http://example.com/foo
containing an iframe that loads http://example.com/bar, and you load the
page in Zotero, you'll see in the debug output that it binds the translator
sandbox to both http://example.com/foo and http://example.com/bar. So if
your translator target matches http://example.com/bar, the translator will
run on the iframe document rather than the parent document.
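
To make the idea concrete for the CNKI case, here is a sketch of a target pattern that matches the iframe URL but not the parent page. The pattern is an assumption built only from the URLs quoted in this thread, and matchesTarget is a hypothetical name used for illustration; the real translator target may need to be broader.

```javascript
// Assumed pattern, derived only from the URLs quoted in this thread.
var target = /^https?:\/\/[^\/]+\.cnki\.net\/kns55\/brief\/brief\.aspx\?/i;

// Hypothetical helper: does this URL match the translator target?
function matchesTarget(url) {
  return target.test(url);
}

var parentUrl = "http://acad.cnki.net/Kns55/brief/Result_CJFQ.htm";
var frameUrl = "http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ";

matchesTarget(parentUrl); // false: the parent page is ignored
matchesTarget(frameUrl);  // true: the translator runs on the iframe document
```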

东东爸

Oct 4, 2010, 5:30:32 PM

to zoter...@googlegroups.com

Thanks Frank!

I tried your method and got the frame's href:

frameHref=http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286223902109

As I posted before, I can get each item's relative URL the way I did previously; it looks like this:

/kns55/detail/detail.aspx?QueryID=2&CurRec=20&DbCode=CJFQ&dbname=CJFDTEMP&filename=ZCLW201008028

Now what should I replace? I tried replacing "/brief/brief.aspx?..." with "/detail/detail.aspx?...", but it did not work. The error message is shown below:

(2)(+0000315): Translate: Translation using CNKI failed:

message => Permission denied for <http://acad.cnki.net> to get property HTMLDocument.documentElement from <http://apj1.cnki.net>

fileName => chrome://zotero/content/xpcom/translate.js

lineNumber => 886

stack => getResolver([object XPCNativeWrapper])@chrome://zotero/content/xpcom/translate.js:886

scrapeAndParse1([object XPCNativeWrapper],"http://acad.cnki.net/kns55/detail/detail.aspx?QueryID=2&CurRec=3&DbCode=CJFQ&dbname=CJFDTEMP&filename=TCZG201009002")@chrome://zotero/content/xpcom/translate.js:908

doWeb([object XPCNativeWrapper],"http://acad.cnki.net/kns55/detail/detail.aspx?QueryID=2&CurRec=4&DbCode=CJFQ&dbname=CJFDTEMP&filename=JRZL201009040")@chrome://zotero/content/xpcom/translate.js:1528

name => Error

url => http://acad.cnki.net/Kns55/brief/brief.aspx?pagename=ASP.brief_result_aspx&dbPrefix=CJFQ&dbCatalog=%e4%b8%ad%e5%9b%bd%e5%ad%a6%e6%9c%af%e6%9c%9f%e5%88%8a%e7%bd%91%e7%bb%9c%e5%87%ba%e7%89%88%e6%80%bb%e5%ba%93&ConfigFile=CJFQ.xml&research=off&t=1286223902109

downloadAssociatedFiles => true

automaticSnapshots => false

And the page URL when I click the item directly with the mouse is "http://apj1.cnki.net/kcms/detail/detail.aspx?QueryID=2&CurRec=20&DbCode=CJFQ&dbname=CJFDTEMP&filename=ZCLW201008028&uid=WEEvREdiSUtucElKVWhsVUxGMW13Rks0YlFVQTBkND0=". It matches my composed URL except for the leading domain, so I replaced the domain, but I still get the same error message!

Thanks anyway!

东东爸

Oct 4, 2010, 5:32:18 PM

to zoter...@googlegroups.com

Thanks Dan!

That is exactly what happens here: the translator runs on the iframe document.

But what goes wrong when I call retrieveDocument with my composed URL? What does the error message mean? Did I miss something?

Dan Stillman

Oct 4, 2010, 5:47:13 PM

to zoter...@googlegroups.com

I'm not 100% sure what you're trying to do, but you can't use retrieveDocument() across different domains (and subdomains are different domains). If the iframe URL—the page the translator is actually running on, regardless of the parent frame URL—is on a different domain from the page you're trying to retrieve, you have to use retrieveText() and work with the source you get back.
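
The domain rule can be checked before deciding how to fetch a page. The following is a minimal sketch, not Zotero code: getHost and isSameHost are hypothetical helper names, and comparing only the host part (including a port, if present) is a simplification of the browser's same-origin check, which also compares the scheme.

```javascript
// Hypothetical helper: extract the host part of an absolute URL,
// or return null if the string is not an absolute URL.
function getHost(url) {
  var m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

// Hypothetical helper: true only when both URLs are absolute and share a host.
// Subdomains count as different hosts, e.g. acad.cnki.net vs apj1.cnki.net.
function isSameHost(urlA, urlB) {
  var hostA = getHost(urlA);
  return hostA !== null && hostA === getHost(urlB);
}
```

With the URLs from this thread, the brief.aspx page on acad.cnki.net and the detail page on apj1.cnki.net compare as different hosts, so a DOM-based fetch from one to the other falls under the restriction described above.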

Frank Bennett

Oct 4, 2010, 7:15:43 PM

to zotero-dev

On Oct 5, 6:10 am, Dan Stillman <dstill...@zotero.org> wrote:

> > In most cases it shouldn't be necessary to do any of this. Zotero
> > handles frames for you—all you should usually need is a proper target
> > regexp to match the frame/iframe URL, and the translator should be
> > completely oblivious to the parent page.

> [...] you'll see in the debug output that it

> binds the translator sandbox to both http://example.com/foo and http://example.com/bar.
> So if your translator target matches http://example.com/bar, the translator will run on
> the iframe document rather than the parent document.

Aha.

东东爸

Oct 4, 2010, 8:20:59 PM

to zoter...@googlegroups.com

On Tue, Oct 5, 2010 at 5:47 AM, Dan Stillman <dsti...@zotero.org> wrote:

I'm not 100% sure what you're trying to do, but you can't use retrieveDocument() across different domains (and subdomains are different domains). If the iframe URL—the page the translator is actually running on, regardless of the parent frame URL—is on a different domain from the page you're trying to retrieve, you have to use retrieveText() and work with the source you get back.

You're right! I did use retrieveDocument() across domains! If I can only use retrieveText(), does that mean I cannot use the DOM when scraping a single article page? If so, I need to rewrite my scrape function to work on the page source instead of the DOM.

东东爸

Oct 4, 2010, 10:25:22 PM

to zoter...@googlegroups.com

It works for me!

P.S. It seems there is no retrieveText(); I used retrieveSource() instead.

Frank Bennett

Oct 4, 2010, 10:40:59 PM

to zotero-dev

On Oct 5, 9:20 am, 东东爸 <acestr...@gmail.com> wrote:

> You're right! I did use retrieveDocument() across domains! If I can only
> use retrieveText(), is that mean I can not use DOM while scraping single
> article page? If the answer is yes, I need to rewrite my scrape function
> using page source instead of DOM.

Glad that it's clicked. Yes, you'll need to extract details from the
HTML source string. Some tools for doing that are available to
translators, which you can find in xpcom/utilities.js, in the Zotero
source.
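
For readers unfamiliar with scraping from a source string, here is a minimal sketch using only a plain regular expression. It does not use the helpers in xpcom/utilities.js, extractTitle is a hypothetical name, and the <title> markup is an assumption for illustration, since the CNKI article page source is not shown in this thread.

```javascript
// Hypothetical helper: pull the text of the first <title> element out of an
// HTML source string, collapsing runs of whitespace. Returns null if the
// source has no <title> element.
function extractTitle(html) {
  var m = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return m ? m[1].replace(/\s+/g, " ").trim() : null;
}
```

The same pattern (a non-greedy capture between two known markers) can be adapted to other fields, though regular expressions are fragile against markup changes and should be checked against the actual page source.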
