selenium + headless display + browsermobproxy -> har file


Simon

Jan 31, 2013, 1:44:51 PM
to browserm...@googlegroups.com
Hello,

I am using this example:

from pyvirtualdisplay import Display
from selenium import webdriver

import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")
from browsermobproxy import Server

server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()
proxy.new_har("google")

#initialize HIDDEN display
display = Display(visible=0, size=(800, 600))
display.start()

#initialize webdriver
browser = webdriver.Firefox()

browser.get('http://www.google.com')
print browser.title

proxy.har
browser.quit() #important, or else loaded browser will remain running as a bg proc!
display.stop()
server.stop()

==================================

Everything seems to be working and no errors are thrown. The only problem is I don't get a har JSON output. All I get is:

[sgalkov@zpub-web-203 ~]$ python display.py
Google
[sgalkov@zpub-web-203 ~]$

but no har output or file in my directory...

What am I missing?

Simon

Jan 31, 2013, 2:10:48 PM
to browserm...@googlegroups.com
Oops, I wasn't printing the output, here is the updated code:

from pyvirtualdisplay import Display
from selenium import webdriver

import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")
from browsermobproxy import Server

server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()
proxy.new_har("google")

#initialize HIDDEN display
display = Display(visible=0, size=(800, 600))
display.start()

#initialize webdriver
browser = webdriver.Firefox()

browser.get('http://www.google.com')
print browser.title

print dir(proxy.har)
print proxy.har()
print browser.page_source

browser.quit() #important, or else loaded browser will remain running as a bg proc!
display.stop()
server.stop()

But the script still isn't giving me real HAR data. All I get from proxy.har() is a skeleton with an empty entries list:

{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-01-31T19:05:23.811+0000', u'id': u'google', u'pageTimings': {}}], u'entries': []}}


full output of the code above is:

[sgalkov@zpub-web-203 ~]$ python display.py
Google
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__format__', '__func__', '__get__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'im_class', 'im_func', 'im_self']
{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-01-31T19:04:00.128+0000', u'id': u'google', u'pageTimings': {}}], u'entries': []}}
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" itemtype="http://schema.org/WebPage" itemscope="itemscope"><head><meta name="description" content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." /><meta name="robots" content="noodp" /><meta content="/images/google_favicon_128.png" itemprop="image" /><title>Google</title><script>(function(){
window.google={kEI:"IcAKUaHqM6u3iALo34H4Cw",getEI:function(a){for(var b;a&amp;&amp;(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI},https:function(){return"https:"==window.location.protocol},kEXPI:"17259,18168,39523,39978,4000116,4001569,4001959,4001975,4002001,4002159,4002208,4002378,4002436,4002562,4002700,4002734,4002858,4002928,4003035,4003215,4003225,4003318,4003335,4003341,4003371,4003518,4003654,4003687",kCSI:{e:"17259,18168,39523,39978,4000116,4001569,4001959,4001975,4002001,4002159,4002208,4002378,4002436,4002562,4002700,4002734,4002858,4002928,4003035,4003215,4003225,4003318,4003335,4003341,4003371,4003518,4003654,4003687",ei:"IcAKUaHqM6u3iALo34H4Cw"},authuser:0,ml:function(){},pageState:"#",kHL:"en",time:function(){return(new Date).getTime()},log:function(a,
b,c,i){var d=new Image,f=google.lc,e=google.li,g="";d.onerror=d.onload=d.onabort=function(){delete f[e]};f[e]=d;!c&amp;&amp;-1==b.search("&amp;ei=")&amp;&amp;(g="&amp;ei="+google.getEI(i));c=c||"/gen_204?atyp=i&amp;ct="+a+"&amp;cad="+b+g+"&amp;zx="+google.time();a=/^http:/i;a.test(c)&amp;&amp;google.https()?(google.ml(Error("GLMM"),!1,{src:c}),delete f[e]):(d.src=c,google.li=e+1)},lc:[],li:0,j:{en:1,l:function(){google.fl=!0},e:function(){google.fl=!0},b:!!location.hash&amp;&amp;!!location.hash.match("[#&amp;]((q|fp)=|tbs=simg|tbs=sbi)"),bv:21,cf:"",
pm:"p",pl:[],mc:0,sc:0.5,u:"c9c918f0"},Toolbelt:{},y:{},x:function(a,b){google.y[a.id]=[a,b];return!1},load:function(a,b){google.x({id:"l"+a},function(){google.load(a,b)})}};

window.onpopstate=function(){google.j.psc=1};for(var h="ad api bc is p pa ac pc pah ph sa sifp slp spf spn xx zc zz".split(" "),j=0,k;k=h[j++];)(function(a){google.j[a]=function(){google.j.pl.push([a,arguments])}})(k);
[... remainder of Google's HTML omitted ...]
[sgalkov@zpub-web-203 ~]$

But no HAR output...

What else am I missing?

Ryan Schaffer

Jan 31, 2013, 2:15:44 PM
to browserm...@googlegroups.com
Just a quick sanity check: before looking any further, try a site other than Google. Google redirects to https, which can cause issues if you didn't set your browser profile up for SSL.


Simon

Jan 31, 2013, 2:23:09 PM
to browserm...@googlegroups.com
Ok, changed:

proxy.new_har("google")
to
proxy.new_har("w3")

and 

browser.get('http://www.google.com')
to
browser.get('http://www.w3.org')

and commented out 
#print dir(proxy.har)

now the output is:
[sgalkov@zpub-web-203 ~]$ python display.py
World Wide Web Consortium (W3C)
{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-01-31T19:22:35.600+0000', u'id': u'w3', u'pageTimings': {}}], u'entries': []}}
[sgalkov@zpub-web-203 ~]$

Still not the correct HAR output

Simon

Feb 1, 2013, 4:42:28 PM
to browserm...@googlegroups.com
Just to update this post, the current state is:

I am trying to get the following setup going: selenium + headless display + browsermobproxy -> har file

My settings are:
remote CentOS 6.2 box (no display)
python
selenium

my script (display.py) is:

from pyvirtualdisplay import Display
from selenium import webdriver

import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")
from browsermobproxy import Server

server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

#initialize HIDDEN display
display = Display(visible=0, size=(800, 600))
display.start()
proxy.new_har("w3")


#initialize webdriver
browser = webdriver.Firefox()

browser.get('http://www.w3.org')
print browser.title
print proxy.har()


browser.quit() #important, or else loaded browser will remain running as a bg proc!
display.stop()
server.stop()

however when I run this I am getting:

[sgalkov@zpub-web-203 ~]$ python display.py
World Wide Web Consortium (W3C)
{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0',
u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'',
u'startedDateTime': u'2013-01-31T19:22:35.600+0000', u'id': u'w3',
u'pageTimings': {}}], u'entries': []}}
[sgalkov@zpub-web-203 ~]$

Seems like the data is coming through but the proxy is maybe not listening to the correct port? What am I missing?
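
One way to rule out the port question is to check whether anything accepts TCP connections on a given port. The following is a minimal sketch using only the standard library; the function name, host and port are placeholders for illustration and are not taken from browsermob-proxy-py.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: nothing usable is listening.
        return False
    conn.close()
    return True
```

For example, port_is_open("localhost", 8080) would check the port that the BrowserMob Proxy server reports here. Note that this only shows a listener exists; it does not show that the browser is actually sending its traffic there.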

Any help would be greatly appreciated.

Patrick Lightbody

Feb 1, 2013, 5:33:12 PM
to browserm...@googlegroups.com
I don't know Python well, but I bet you aren't launching Firefox in such a way that it is configured to use the proxy. The easiest way to confirm this is to pause the test, go to the proxy settings in the open Firefox window, and see if anything is set.

I can't help you with Python, but look at the README for an example of how we do this in Java.
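
For the Python side, here is a sketch of the Firefox preferences that a manual proxy configuration comes down to. The preference names are standard Firefox settings; the helper name and the "host:port" input format are assumptions made for illustration, and how the preferences get applied depends on the installed Selenium version.

```python
def firefox_proxy_prefs(proxy_address):
    """Build Firefox preferences for a manual HTTP/HTTPS proxy.

    proxy_address is a "host:port" string (hypothetical input format).
    """
    host, port = proxy_address.rsplit(":", 1)
    port = int(port)
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }
```

With Selenium 2.x, each pair could probably be applied via profile.set_preference(name, value) on a webdriver.FirefoxProfile() before the browser is started. On a headless box, printing these values is also a substitute for inspecting the proxy settings dialog.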

Simon

Feb 1, 2013, 6:00:29 PM
to browserm...@googlegroups.com
The problem here is that this is a headless (no display) setup. I am connecting to a remote node that doesn't have a GUI, so I can't open the Firefox proxy settings.

I believe I was not binding firefox to the proxy as you said and now the new state is:

[sgalkov@zpub-web-203 browsermobproxy]$ cat bmp.py
from pyvirtualdisplay import Display
from selenium import webdriver

import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")

from browsermobproxy import Server
server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

from selenium import webdriver

display = Display(visible=0, size=(800, 600))
display.start()

profile  = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)


proxy.new_har("google1")
driver.get("http://www.w3.org/")
print driver.title
print ""
print proxy.har() # returns a HAR JSON blob
print ""
print driver.page_source

server.stop()
driver.quit()
display.stop()


[sgalkov@zpub-web-203 browsermobproxy]$ python bmp.py
/usr/lib/python2.6/site-packages/selenium/webdriver/firefox/firefox_profile.py:219: DeprecationWarning: This method has been deprecated. Please pass in the proxy object to the Driver Object
  DeprecationWarning)


{u'log': {u'browser': {u'version': u'10.0.3', u'name': u'Firefox'}, u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-02-01T22:53:54.917+0000', u'id': u'google1', u'pageTimings': {}}], u'entries': [{u'pageref': u'google1', u'startedDateTime': u'2013-02-01T22:53:54.963+0000', u'cache': {}, u'request': {u'cookies': [], u'url': u'http://www.w3.org/', u'queryString': [], u'headers': [], u'headersSize': 0, u'httpVersion': u'HTTP', u'method': u'GET', u'bodySize': 0}, u'time': 0, u'response': {u'status': -999, u'cookies': [], u'statusText': u'NO RESPONSE', u'content': {u'mimeType': u'', u'size': 0}, u'headers': [], u'headersSize': 0, u'redirectURL': u'', u'bodySize': 0, u'httpVersion': u'HTTP'}}, {u'pageref': u'google1', u'startedDateTime': u'2013-02-01T22:53:54.989+0000', u'cache': {}, u'request': {u'cookies': [], u'url': u'http://www.w3.org/favicon.ico', u'queryString': [], u'headers': [], u'headersSize': 0, u'httpVersion': u'HTTP', u'method': u'GET', u'bodySize': 0}, u'time': 0, u'response': {u'status': -999, u'cookies': [], u'statusText': u'NO RESPONSE', u'content': {u'mimeType': u'', u'size': 0}, u'headers': [], u'headersSize': 0, u'redirectURL': u'', u'bodySize': 0, u'httpVersion': u'HTTP'}}, {u'pageref': u'google1', u'startedDateTime': u'2013-02-01T22:53:54.998+0000', u'cache': {}, u'request': {u'cookies': [], u'url': u'http://www.w3.org/favicon.ico', u'queryString': [], u'headers': [], u'headersSize': 0, u'httpVersion': u'HTTP', u'method': u'GET', u'bodySize': 0}, u'time': 0, u'response': {u'status': -999, u'cookies': [], u'statusText': u'NO RESPONSE', u'content': {u'mimeType': u'', u'size': 0}, u'headers': [], u'headersSize': 0, u'redirectURL': u'', u'bodySize': 0, u'httpVersion': u'HTTP'}}]}}

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre></pre></body></html>
[sgalkov@zpub-web-203 browsermobproxy]$

So now I am seeing a little more in the HAR output, but every entry shows status -999 with "NO RESPONSE". The title is not printed and the page source has pretty much no content.


Patrick Lightbody

Feb 2, 2013, 9:19:12 AM
to browserm...@googlegroups.com
The title won't be captured - that's not a feature that BMP supports. What you can do is get the HAR file back and then execute JavaScript via Selenium to get the title and then merge it in to the HAR.

As for the content, you have to tell the proxy to capture content and headers if you want those. I don't know the Python bindings well, but I believe there are REST endpoints for those commands. If not, we can add them, and then whoever maintains the Python bindings will need to add the commands to the wrapper.

Patrick
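
The title merge described above could look roughly like this in Python. It is a sketch that treats the HAR as the plain dict printed earlier in the thread; the helper name and the page_index parameter are made up for illustration.

```python
import copy


def merge_title_into_har(har, title, page_index=0):
    """Return a copy of the HAR dict with the page title filled in."""
    merged = copy.deepcopy(har)  # leave the original HAR untouched
    merged["log"]["pages"][page_index]["title"] = title
    return merged
```

With the scripts in this thread, the call would be something like merge_title_into_har(proxy.har(), driver.title), where driver.title is the title Selenium reads from the browser.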

Simon

Feb 2, 2013, 6:54:24 PM
to browserm...@googlegroups.com
Ok, I just want a way to save the .har output but that is not working with the setup I have. As you can see, the HAR object returned is empty. Would you have any suggestions on how to fix that? 

Thanks.
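
For the saving part on its own, a minimal sketch: since proxy.har() returns a plain dict here, the standard json module should be enough to write it out as a .har file. The function names and the file name in the example are placeholders.

```python
import json


def save_har(har, path):
    """Write a HAR dict to disk as JSON."""
    with open(path, "w") as handle:
        json.dump(har, handle, indent=2)


def load_har(path):
    """Read a HAR file back into a dict."""
    with open(path) as handle:
        return json.load(handle)
```

For example, save_har(proxy.har(), "w3.har"). This only writes whatever the proxy returned, so an empty entries list stays empty in the file.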

Patrick Lightbody

Feb 2, 2013, 11:00:11 PM
to browserm...@googlegroups.com
It's seeing the initial request to www.w3.org, but it's showing no valid response coming back. What do the BrowserMob Proxy log files show?

Patrick
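
To make that state easier to read than the raw blob, a small sketch that counts the recorded entries and lists the requests with no usable response. Treating any negative status (such as the -999 in the output above) as "no response" is an assumption, and the helper name is hypothetical.

```python
def summarize_har(har):
    """Count recorded entries and list URLs that got no usable response."""
    entries = har["log"]["entries"]
    failed = [
        entry["request"]["url"]
        for entry in entries
        if entry["response"]["status"] < 0  # assumed marker for "NO RESPONSE"
    ]
    return {"total": len(entries), "no_response": failed}
```

A total of 0 suggests the browser never sent traffic through the proxy; a non-zero total with every URL in no_response matches the situation described here, where requests arrive but no response comes back.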


Simon

Feb 3, 2013, 11:28:09 PM
to browserm...@googlegroups.com
Hello Patrick,

Where would I find the BMP log files? I cannot find them. If I run:
[sgalkov@zpub-web-203 bin]$ sh browsermob-proxy
INFO 02/04 04:19:01 o.b.p.Main           - Starting BrowserMob Proxy version 2.0-beta-7
INFO 02/04 04:19:01 o.e.j.u.log          - jetty-7.3.0.v20110203
INFO 02/04 04:19:01 o.e.j.u.log          - started o.e.j.s.ServletContextHandler{/,null}
INFO 02/04 04:19:02 o.e.j.u.log          - Started SelectChann...@0.0.0.0:8080


And then run my script, I see:
INFO 02/04 04:20:02 o.b.p.j.h.HttpServer - Version Jetty/5.1.x
INFO 02/04 04:20:02 o.b.p.j.u.Container  - Started HttpContext[/,/]
INFO 02/04 04:20:02 o.b.p.j.h.SocketLis~ - Started SocketListener on 0.0.0.0:9093
INFO 02/04 04:20:02 o.b.p.j.u.Container  - Started org.browsermob.proxy.jetty.jetty.Server@460ab1b4

and my script displays an empty json blob.

I've tried the following two scripts:
display.py:
from pyvirtualdisplay import Display
from selenium import webdriver

import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")
from browsermobproxy import Server

server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

#initialize HIDDEN display
display = Display(visible=0, size=(800, 600))
display.start()

proxy.new_har("w3")

#initialize webdriver
browser = webdriver.Firefox()

browser.get('http://www.w3.org')
print browser.title

#print dir(proxy.har)
print proxy.har()
print proxy.webdriver_proxy()
print server.port
print server.url
#print browser.page_source

browser.quit() #important, or else loaded browser will remain running as a bg proc!
display.stop()
server.stop()

which returns:
[sgalkov@zpub-web-203 ~]$ python display.py
World Wide Web Consortium (W3C)
{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-02-04T04:22:29.455+0000', u'id': u'w3', u'pageTimings': {}}], u'entries': []}}
<selenium.webdriver.common.proxy.Proxy object at 0x2569890>
8080
[sgalkov@zpub-web-203 ~]$

and  bmp.py:
import sys
sys.path.append("/home/sgalkov/browsermob-proxy-py/")
sys.path.append("/home/sgalkov/selenium-2.29.0/py/selenium")

from pyvirtualdisplay import Display
from selenium import webdriver
from browsermobproxy import Server
server = Server("/home/sgalkov/browsermob/browsermobproxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

from selenium import webdriver

display = Display(visible=0, size=(800, 600))
display.start()

proxy.new_har("google1")

#profile  = webdriver.FirefoxProfile()
#profile.set_proxy(proxy.selenium_proxy())
#driver = webdriver.Firefox(firefox_profile=profile)

driver = webdriver.Firefox()

driver.get("http://www.w3.org/")
print driver.title
print ""
print proxy.har() # returns a HAR JSON blob
print ""
#print driver.page_source

server.stop()
driver.quit()
display.stop()


which now returns:
[sgalkov@zpub-web-203 ~]$ python bmp.py
World Wide Web Consortium (W3C)

{u'log': {u'version': u'1.1', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'title': u'', u'startedDateTime': u'2013-02-04T04:25:16.593+0000', u'id': u'google1', u'pageTimings': {}}], u'entries': []}}

[sgalkov@zpub-web-203 ~]$

Is there an easy way to test bmp without this script? I've tried curl'ing but am still getting empty output.
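
One possible way to exercise the proxy without Selenium is to send a plain HTTP request through it and then look at the HAR. This is a sketch using the Python 3 standard library only; the helper name is hypothetical, and the "host:port" address must be the port of the proxy created for the test (the SocketListener port in the log above), not the server's port 8080.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Return a urllib opener that sends http:// requests via the proxy.

    proxy_address is a "host:port" string (placeholder format).
    """
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy_address})
    return urllib.request.build_opener(handler)
```

For example, build_proxy_opener("localhost:9093").open("http://www.w3.org/") should make one request go through the proxy, assuming a proxy is listening on that port. If the HAR then contains an entry with a real status code, the proxy itself works and the problem is in how the browser is configured.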

Simon

Feb 6, 2013, 1:27:35 AM
to browserm...@googlegroups.com
Do you have any more recommendations, please?