Hi,
I'm new to the list. I want to use MozRepl to retrieve AJAX-generated HTML source, controlling the whole process from a Python script (see below). This works fine for small pages, but it becomes very slow when the source is several MB long: with a page that has 5 MB of source, Firefox froze for minutes.
What I do is request the HTML source and copy the output into a string until the prompt reappears. My Python source is below. The second part ("Death") blocks my browser for minutes.
How should MozRepl be used in cases like this? It would be nice if I could tell the browser to save the content of a string (the HTML source) to a local file. I tried FileUtils.openSafeFileOutputStream, but I got an exception. If you have a solution, please let me know.
Thanks,
Laszlo
============
#!/usr/bin/env python
import re
from time import sleep
import telnetlib
HOST = 'localhost'
PORT = 4242
prompt = [r'repl\d*> '] # list of regular expressions
def get_page(url, wait=3):
    tn = telnetlib.Telnet(HOST, PORT)
    tn.expect(prompt)
    cmd = "content.location.href = '{url}'".format(url=url)
    tn.write(cmd + "\n")
    tn.expect(prompt)
    if wait:
        print '# waiting {X} seconds...'.format(X=wait)
        sleep(wait)
        print '# continue'
    #
    tn.write('content.document.body.innerHTML\n')
    html = tn.expect(prompt)[2].split('\n')
    if html[0].strip() == '"':
        html = html[1:]
    if re.search(prompt[0], html[-1]):
        html = html[:-1]
    if html[-1].strip() == '"':
        html = html[:-1]
    tn.write("repl.quit()\n")
    return html
##################################
if __name__ == "__main__":
    print 'OK'
    html = get_page('http://simile.mit.edu/crowbar/test.html')
    for line in html:
        print line
    print '================'
    print 'Death'
    url = 'http://www.ncbi.nlm.nih.gov/nuccore/CP002059.1'
    html = get_page(url, wait=30)
    for line in html:
        print line
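
============
For the idea of letting the browser save the source to a local file, the following is an untested sketch of one possible direction, not a known fix. It assumes the REPL runs with chrome privileges so that Components is reachable, and it uses the XPCOM nsIFileOutputStream and nsIConverterOutputStream interfaces instead of FileUtils. The helper names build_save_command and save_page_source and the path /tmp/page.html are made up for illustration. The JavaScript returns only the length of the text, so only a short reply should travel back over the telnet connection.

```python
import json

HOST = 'localhost'
PORT = 4242
PROMPT = [br'repl\d*> ']  # same prompt pattern as above, as bytes


def build_save_command(path, expression='content.document.body.innerHTML'):
    """Build one line of JavaScript that writes `expression` to `path`.

    Hypothetical helper: the file is written by Firefox itself, and the
    JavaScript evaluates to the length of the text that was written.
    """
    # json.dumps gives a quoted, escaped string usable as a JS literal.
    js_path = json.dumps(path)
    parts = [
        '(function () {',
        'var Cc = Components.classes, Ci = Components.interfaces;',
        'var file = Cc["@mozilla.org/file/local;1"]'
        '.createInstance(Ci.nsILocalFile);',
        'file.initWithPath(%s);' % js_path,
        'var out = Cc["@mozilla.org/network/file-output-stream;1"]'
        '.createInstance(Ci.nsIFileOutputStream);',
        # 0x02 = write only, 0x08 = create, 0x20 = truncate; 420 == 0644
        'out.init(file, 0x02 | 0x08 | 0x20, 420, 0);',
        'var conv = Cc["@mozilla.org/intl/converter-output-stream;1"]'
        '.createInstance(Ci.nsIConverterOutputStream);',
        'conv.init(out, "UTF-8", 0, 0);',
        'var text = %s;' % expression,
        'conv.writeString(text);',
        'conv.close();',
        'return text.length;',
        '})()',
    ]
    # Joined with spaces so the whole command is sent as a single line.
    return ' '.join(parts)


def save_page_source(path, host=HOST, port=PORT):
    """Ask the browser to save the current page source to `path`."""
    # Imported here so that building the command needs no telnet module.
    import telnetlib
    tn = telnetlib.Telnet(host, port)
    tn.expect(PROMPT)
    tn.write((build_save_command(path) + '\n').encode('utf-8'))
    reply = tn.expect(PROMPT)[2]  # should hold the length, not the HTML
    tn.write(b'repl.quit()\n')
    tn.close()
    return reply
```

Calling save_page_source('/tmp/page.html') after the page has loaded would then replace the innerHTML request in get_page, and the Python script could read the file from disk afterwards. Whether this avoids the freeze is an assumption that still needs to be checked against a large page.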