Help Please: Can't capture HAR using Python Selenium Script with BrowserMob-Proxy


helpm...@gmail.com

Sep 26, 2014, 9:47:01 PM
to browserm...@googlegroups.com

Hey everyone, I could really use some help with this. I posted the problem on StackOverflow a couple of days ago but have yet to receive any responses. I'm pretty much stuck. For the love and adoration of a complete stranger, I implore you for assistance. :)

Goal:

I want to run a Selenium Python script through BrowserMob-Proxy (BMP), which will capture the browser's traffic and output it as a HAR file.

Problem:
I have a functional (very basic) Python script (shown below). However, when I alter it to use BrowserMob-Proxy to capture a HAR, it fails. Below I provide two different scripts that both fail, for different reasons (details are given before each code snippet).

Software Specs:

  • Operating System: Windows 7 (x64) -- running in VirtualBox
  • Browser: Firefox (32.0.2)
  • Script Language: Python (2.7.8)
  • Automated Web Browser: Selenium (2.43.0) -- installed via pip
  • BrowserMob-Proxy: 0.6.0 AND 2.0-beta-8 -- see explanation below

BrowserMob-Proxy Explanation:
As mentioned above, I am using both 0.6.0 AND 2.0-beta-8, for two reasons: A) Lightbody recently indicated that his most current release (2.0-beta-9) is not functional and advised users to use 2.0-beta-8 instead; and B) from what I can tell from various sites and StackOverflow answers, 0.6.0 (acquired through pip) is used to make calls through its client.py/server.py, whereas 2.0-beta-8 is used to run the server itself. To be honest, this confuses me. When importing BMP's Server, however, it requires a batch (.bat) file to start the server, which is not provided in 0.6.0 but is included with 2.0-beta-8. If anyone can shed some light on this area of confusion (I suspect it is the root of the problems described below), I'd be most appreciative.
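
To untangle the two versions: the pip package (0.6.0 here) is only a Python client for BMP's REST API, and the 2.0-beta-8 download is the Java server that provides that API, which is why the .bat launcher only ships with the latter; Server(path) in the Python package essentially launches that script as a subprocess. Patrick confirms the REST API part later in the thread. The Python 3 sketch below makes the split concrete by calling the REST API directly with only the standard library. It assumes a BMP 2.x server is already running with its REST API on localhost:8080, and it uses the endpoints described in the BMP README (POST /proxy, PUT /proxy/<port>/har with an initialPageRef field, GET /proxy/<port>/har, DELETE /proxy/<port>); BmpRestClient and its methods are hypothetical names, not part of the browsermobproxy package.

```python
import json
import urllib.parse
import urllib.request


class BmpRestClient:
    """Hypothetical minimal client for the BrowserMob Proxy REST API.

    Assumes a BMP 2.x server is already running (e.g. launched with the
    .bat file in the 2.0-beta-8 bin directory) with its REST API on
    http://<api_host>:<api_port>.
    """

    def __init__(self, api_host="localhost", api_port=8080):
        self.base_url = "http://%s:%d" % (api_host, api_port)
        self.port = None  # proxy port handed out by the server
        # Talk to the local API directly, ignoring any system proxy settings.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _request(self, method, path, fields=None):
        # Form-encode any fields as the request body; GET/DELETE send none.
        data = None
        if fields is not None:
            data = urllib.parse.urlencode(fields).encode("ascii")
        req = urllib.request.Request(self.base_url + path, data=data, method=method)
        with self._opener.open(req) as resp:
            return resp.read().decode("utf-8")

    def create_proxy(self):
        # POST /proxy opens a new proxy port; the reply is JSON such as {"port": ...}.
        self.port = json.loads(self._request("POST", "/proxy", fields={}))["port"]
        return self.port

    def new_har(self, page_ref):
        # PUT /proxy/<port>/har starts recording a fresh HAR for this proxy.
        self._request("PUT", "/proxy/%d/har" % self.port,
                      fields={"initialPageRef": page_ref})

    def har(self):
        # GET /proxy/<port>/har returns the HAR as JSON over HTTP; the server
        # never writes a file, so saving it is up to the caller.
        return json.loads(self._request("GET", "/proxy/%d/har" % self.port))

    def close(self):
        # DELETE /proxy/<port> shuts this proxy down.
        self._request("DELETE", "/proxy/%d" % self.port)
```

Usage, under the same assumptions: create a client, call create_proxy(), point Firefox at localhost plus the returned port, call new_har("ALPHA_HAR") before loading pages, then har() to get the capture back as a dict.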

Selenium Script (this script works): 
This script runs without producing any errors. It is included to show that the basic setup works before any BMP logic is added.

    """This script utilizes Selenium to obtain the Google homepage"""
    from selenium import webdriver

    driver = webdriver.Firefox()       # Opens FireFox browser.
    driver.get('https://google.com/')  # Gets google.com and loads page in browser.

    driver.quit()                      # Closes Firefox browser

Script ALPHA with BMP (does not work):
This script runs without producing any errors. However, even after searching my entire hard drive, I cannot find ALPHA_HAR.har anywhere.

    """Using the same functional Selenium script, produce ALPHA_HAR.har output"""
    from browsermobproxy import Server
    server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
    server.start()
    proxy = server.create_proxy()

    from selenium import webdriver
    driver = webdriver.Firefox()           # Opens FireFox browser.

    proxy.new_har("ALPHA_HAR")             # Creates a new HAR
    driver.get("https://www.google.com/")  # Gets google.com and loads page in browser.
    proxy.har                              # Returns a HAR JSON blob

    server.stop()

Script BETA with BMP (does not work):
This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When I run it, Firefox tries to load google.com but never succeeds; eventually the page times out, without any errors being produced. BETA_HAR.har can't be found anywhere on my hard drive either. I have also noticed that any other site I try to visit in this browser similarly fails to load (I suspect this is because the proxy is not configured properly).

    """Using the same functional Selenium script, produce BETA_HAR.har output"""
    from browsermobproxy import Server
    server = Server(r"C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy")
    server.start()    
    proxy = server.create_proxy()

    from selenium import webdriver
    profile = webdriver.FirefoxProfile()
    profile.set_proxy(proxy.selenium_proxy())
    driver = webdriver.Firefox(firefox_profile=profile)

    proxy.new_har("BETA_HAR")             # Creates a new HAR
    driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
    proxy.har                             # Returns a HAR JSON blob

    server.stop()

Patrick Lightbody

Sep 28, 2014, 10:54:24 AM
to browserm...@googlegroups.com
Always happy to try to get some random love and adoration from strangers!

Sadly, in this case, I don’t think there is a ton I can say, since I simply don’t know Python at all and I don’t really know how those bindings work (they are contributed by someone else).

What I do know is that the BMP REST API, which is what the Python wrapper talks to, doesn't write the HAR file anywhere. It simply returns JSON via HTTP. So unless the Python wrapper writes the file somewhere, your search of your hard drive will always be in vain. Hint: it doesn't ;)


Have you tried just printing out the output of the proxy.har call? :)

Patrick
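
Following Patrick's hint: the traceback later in this thread shows the wrapper's har property ending in return r.json(), so proxy.har hands back the HAR already decoded into a Python dict, and saving it is the script's job. A minimal sketch with hypothetical helper names; it uses json.dump rather than str() because str() produces Python's repr (single quotes, and u'' prefixes on Python 2), which is not valid JSON, so HAR tools that expect JSON cannot read it.

```python
import json


def save_har(har, path):
    """Write a HAR dict (such as the value of proxy.har) to disk as JSON."""
    with open(path, "w") as f:
        # json.dump emits real JSON, unlike str(har), which emits Python's repr.
        json.dump(har, f, indent=2)


def load_har(path):
    """Read a saved HAR file back into a dict."""
    with open(path) as f:
        return json.load(f)


# Hypothetical usage, once traffic has gone through the proxy:
#     save_har(proxy.har, "ALPHA_HAR.har")
```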




--

---
You received this message because you are subscribed to the Google Groups "BrowserMob Proxy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to browsermob-pro...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

helpm...@gmail.com

Sep 29, 2014, 8:11:54 PM
to browserm...@googlegroups.com
Hey Patrick,

Thanks for the assist (begins funneling love and adoration in your direction). I had a bit of a facepalm moment after reading your reply, as I realized I had wrongly assumed that proxy.har saved the file itself (I didn't realize I needed to code that in, especially since none of the examples online mention or include it). Using your suggestion, I wrote a quick Python addendum to the earlier ALPHA_HAR test case, which ran successfully and produced a HAR output. However, it seemed extremely light on data, even for just loading the Google homepage.

The "working" script:
    from selenium import webdriver
    from browsermobproxy import Server

    server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
    server.start() # Initiates BMP
    proxy = server.create_proxy() # Creates BM Proxy

    driver = webdriver.Firefox() # Opens FireFox browser

    driver.get('https://google.com/') # Gets google.com and loads the page.

    server.stop() # Shuts down BMP
    answer = str(proxy.har) # Returns a HAR JSON blob, converts it to a string, and assigns it to variable "answer"

    file = open('CHARLIE_HAR.har', 'w+') # Produces a new file named CHARLIE_HAR.har, which is located in the directory where the script is run from
    file.write(answer) # Writes the contents of "answer" to CHARLIE_HAR.har
    file.close() # Saves and closes CHARLIE_HAR.har

The "working" script result:
    {u'log': {u'version': u'1.2', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'pageTimings': {}, u'title': u'', u'id': u'ALPHA_HAR', u'startedDateTime': u'2014-09-29T17:13:29.801+0000'}], u'entries': []}}
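
In a HAR, every captured request/response pair is an element of log.entries, while log.pages only records page references such as the one new_har creates. The telltale part of this result is therefore "entries": [], which means no traffic was recorded at all (as Patrick points out further down, the browser was never configured to use the proxy), rather than a merely thin capture. A small check like the sketch below, with hypothetical helper names, flags that case before anything is saved:

```python
def har_entry_count(har):
    """Return how many request/response entries a HAR dict holds."""
    return len(har.get("log", {}).get("entries", []))


def require_har_entries(har):
    """Raise ValueError if the HAR recorded no traffic; otherwise return the count.

    An empty log.entries list usually means no request went through the
    proxy, e.g. because the browser was never configured to use it.
    """
    count = har_entry_count(har)
    if count == 0:
        raise ValueError("HAR has no entries; is the browser actually using the proxy?")
    return count
```

For example, calling require_har_entries(proxy.har) right before writing the file would have raised on this run.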

As you can see, the output is very light, which doesn't seem right. To make sure, I extended the earlier Selenium script so that it goes to Google and runs a query (which works fine), then added the BMP logic to collect the HAR. That should produce considerably more data. Unfortunately, the script errors out. Below are the code and the resulting traceback, which points at proxy.har (BMP logic). Can you help me figure this out?

New Test Script:
    """Selenium script used in conjunction with BMP to run a query in Google, then capture the HAR results in DELTA_HAR.har"""

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    from browsermobproxy import Server
    server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
    server.start() # Initiates BMP
    proxy = server.create_proxy() # Creates BM Proxy

    driver = webdriver.Firefox() # Opens FireFox browser
    driver.set_window_size(1024, 768) # Sets FF Browser window dimensions

    driver.get('https://google.com/') # Gets google.com and loads the page.

    element = driver.find_element_by_xpath('//*[@id="gbqfq"]') # Locates the xpath associated with the google search box.
    element.send_keys('Thanks for helping me out, Patrick!') # Types "Thanks for helping me out, Patrick!" in Google search box.
    element.send_keys(Keys.ENTER) # presses the "enter" key while in the Google search box, thus inputting the search requests and loading the results page.

    server.stop() # Shuts down BMP
    answer = str(proxy.har) # Returns a HAR JSON blob, converts it to a string, and assigns it to variable "answer"

    file = open('DELTA_HAR.har', 'w+') # Produces a new file named DELTA_HAR.har, which is located in the directory where the script is run from
    file.write(answer) # Writes the contents of "answer" to DELTA_HAR.har
    file.close() # Saves and closes DELTA_HAR.har

New Test Script Error Message:
    Traceback (most recent call last):
        File "test3.py", line 21, in <module>
            answer = str(proxy.har) # Returns a HAR JSON blob, converts it to a string, and assigns it to variable "answer"
        File "C:\Python27\lib\site-packages\browsermobproxy\client.py", line 64, in har
            return r.json()
        File "C:\Python27\lib\site-packages\requests\models.py", line 776, in json
            return json.loads(self.text, **kwargs)
        File "C:\Python27\lib\json\__init__.py", line 338, in loads
            return _default_decoder.decode(s)
        File "C:\Python27\lib\json\decoder.py", line 366, in decode
            obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
            raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded


I searched the web and StackOverflow and read that JSON errors tend to be rather unhelpful, and that I should paste the results into a JSON validator (e.g. http://jsonlint.com/) to get a better idea of the root issue. To do so, I went into client.py (the BMP Python client, the file you linked me to earlier, Patrick) and temporarily modified its har function to print out the response. I was unable to decode it as UTF-8 for the JSON validator, but a regular print produced this extremely short output (see below).

Output (taken directly from BMP's client.py, after replacing "return r.json()" with "print r"):
    <Response [200]>

That is an extremely short response. The only thing "Response 200" brings to mind is the HTTP status code for a successfully loaded page. The fact that there isn't more to print leads me to believe that, for some reason, BMP's client.py is NOT capturing all of the information, and so the conversion of "<Response [200]>" to JSON fails because it isn't valid input (theorizing here).

Any ideas?  If there is truly an issue with the client.py provided with BMP, then I would think others would be reporting this issue.

Thanks,
Matt
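
On the client.py experiment: printing a requests Response object only shows its repr, which is always of the form <Response [status]>; the body is in r.text (or r.content), so "print r" could never show the HAR itself. "No JSON object could be decoded" is the message Python 2's json module gives when the text contains no JSON, an empty string included, so a 200 response with an empty body would produce exactly this traceback. That is a plausible explanation rather than one confirmed in the thread; note that the DELTA script above never calls proxy.new_har(), whereas the final working script later in the thread does. The Python 3 sketch below (hypothetical helper names, standard library only) shows what to print instead of r and how an empty body fails to decode:

```python
import json


def describe_body(status_code, text):
    """Summarize a response body; with requests: describe_body(r.status_code, r.text).

    print(r) only shows the repr "<Response [200]>", never the body itself.
    """
    if not text.strip():
        return "HTTP %d with an empty body (nothing to decode as JSON)" % status_code
    return "HTTP %d with %d characters of body: %r" % (status_code, len(text), text[:80])


def decode_json_body(text):
    """Return the decoded JSON value, or None if the text is empty or not JSON."""
    try:
        return json.loads(text)
    except ValueError:  # Python 3's json.JSONDecodeError is a subclass of ValueError
        return None
```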

Patrick Lightbody

Sep 30, 2014, 12:03:31 AM
to browserm...@googlegroups.com
It doesn't look like your test ever actually configures the browser to use the proxy. That would be why the HAR is empty! I don't know exactly what the Python code would be, but this is called out in the README for Java:
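
In Python, the missing step is the one Krishnan quotes in the next message: profile.set_proxy(proxy.selenium_proxy()). For Firefox, "configuring the browser to use the proxy" comes down to a few manual-proxy preferences, consistent with what Matt later sees in the settings dialog (localhost plus a port that changes per run). The sketch below only builds those preference values for a "host:port" proxy address; the function name is hypothetical, and applying the values with FirefoxProfile.set_preference is shown in a comment as an assumed alternative to set_proxy, not as what selenium_proxy() does internally.

```python
def firefox_manual_proxy_prefs(proxy_address):
    """Build Firefox preferences that send HTTP and HTTPS traffic to proxy_address.

    proxy_address is a "host:port" string, e.g. the address of a BMP proxy.
    """
    host, port_text = proxy_address.rsplit(":", 1)
    port = int(port_text)
    return {
        "network.proxy.type": 1,        # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,      # HTTPS pages such as https://www.google.com/
        "network.proxy.ssl_port": port,
    }


# Assumed alternative to profile.set_proxy(...) with Selenium 2.x:
#     profile = webdriver.FirefoxProfile()
#     for name, value in firefox_manual_proxy_prefs("localhost:<port>").items():
#         profile.set_preference(name, value)
```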

Krishnan Mahadevan

Sep 30, 2014, 12:51:52 AM
to browserm...@googlegroups.com
Matt,

Patrick is correct.

From your "working" code:

The "working" script:
    from selenium import webdriver
    from browsermobproxy import Server

    server = Server(r'C:\Users\Matt\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy')
    server.start() # Initiates BMP
    proxy = server.create_proxy() # Creates BM Proxy

    driver = webdriver.Firefox() # Opens FireFox browser

    driver.get('https://google.com/') # Gets google.com and loads the page.

    server.stop() # Shuts down BMP
    answer = str(proxy.har) # Returns a HAR JSON blob, converts it to a string, and assigns it to variable "answer"

    file = open('CHARLIE_HAR.har', 'w+') # Produces a new file named CHARLIE_HAR.har, which is located in the directory where the script is run from
    file.write(answer) # Writes the contents of "answer" to CHARLIE_HAR.har
    file.close() # Saves and closes CHARLIE_HAR.har

The "working" script result:
    {u'log': {u'version': u'1.2', u'creator': {u'version': u'2.0', u'name': u'BrowserMob Proxy'}, u'pages': [{u'pageTimings': {}, u'title': u'', u'id': u'ALPHA_HAR', u'startedDateTime': u'2014-09-29T17:13:29.801+0000'}], u'entries': []}}


I don't see you binding the proxy server information into the WebDriver's capabilities before spawning the Firefox browser.

That is, this section is missing:

    from selenium import webdriver
    profile = webdriver.FirefoxProfile()
    profile.set_proxy(proxy.selenium_proxy())
    driver = webdriver.Firefox(firefox_profile=profile)










Thanks & Regards
Krishnan Mahadevan

"All the desirable things in life are either illegal, expensive, fattening or in love with someone else!"
My Scribblings @ http://wakened-cognition.blogspot.com/
My Technical Scribbings @ http://rationaleemotions.wordpress.com/

helpm...@gmail.com

Sep 30, 2014, 12:41:03 PM
to browserm...@googlegroups.com
Patrick/Krishnan,

I was thinking the same, which is why I included two scripts in my original post: see Script BETA (it incorporates the code you suggested, Krishnan). However, when I run it, the browser opens and tries to load the page, but fails. I get no error messages to help identify the cause. :/

Patrick Lightbody

Sep 30, 2014, 2:36:55 PM
to browserm...@googlegroups.com
Double-check that the proxy is being used: put a pause into your test right after the browser opens, and manually check the browser's proxy settings.
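
Besides a manual pause, the script can check for itself whether anything is listening on the proxy port, which helps separate "BMP never opened the port" from "the browser is not pointed at it". A standard-library sketch with hypothetical helper names; the port would come from the BMP proxy object, so it is left as a parameter here, and the pause helper uses Python 3's input() (raw_input() on Python 2.7):

```python
import socket


def is_port_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:  # covers refused connections and timeouts
        return False
    conn.close()
    return True


def pause_for_manual_check(prompt="Check Firefox's proxy settings, then press Enter..."):
    """Block until Enter is pressed, leaving time to inspect the open browser."""
    input(prompt)
```

For example, is_port_listening("localhost", <proxy port>) can be called right after create_proxy(), before the browser is started.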

helpm...@gmail.com

Sep 30, 2014, 3:25:32 PM
to browserm...@googlegroups.com
Alright, this is getting kind of amusing now. I decided to try this from a less locked-down network, and here are the results:

Script Alpha:  Before it worked (just didn't capture HAR), now it doesn't run at all.
Script Beta:  Before it didn't even run, now it works (just doesn't capture HAR).

Per your suggestion, Patrick, I checked the proxy settings manually with the BETA script: they change to "manual proxy settings", with an HTTP/SSL proxy of "localhost". The port number increments by 1 each time I run the script, so it is always a different port.

So it does appear to be working with BMP now, but it can't capture the HAR. The error message is the same as before, "No JSON object could be decoded", raised from the "proxy.har" call. I did the same client.py manipulation to see what data it was receiving, and the result was identical: "<Response [200]>".

helpm...@gmail.com

Sep 30, 2014, 4:31:44 PM
to browserm...@googlegroups.com
Eureka! I've got it! Lots of data is being collected, much of it HTTP traffic I recognize. Once I've verified it is all good, I'll follow up with a post on the script and details for others to learn from. I will need to get HARViewer or something similar set up first.

Thank you both for your help; I'll follow up soon. :)
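
Until a HAR viewer is set up, a saved capture can be sanity-checked in Python by listing each entry's method, status and URL, which are standard HAR 1.2 fields (log.entries[].request.method, request.url, response.status). A sketch with hypothetical helper names; it assumes the file was written as JSON (for example with json.dump), not with str():

```python
import json


def summarize_har(har):
    """Return a (method, status, url) tuple for every entry in a HAR dict."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method"), response.get("status"), request.get("url")))
    return rows


def print_har_summary(path):
    """Load a saved .har file and print one line per captured request."""
    with open(path) as f:
        har = json.load(f)
    for method, status, url in summarize_har(har):
        print("%s %s %s" % (method, status, url))
```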

helpm...@gmail.com

Sep 30, 2014, 8:28:29 PM
to browserm...@googlegroups.com
*100% of love & adoration forwarded to you both*

The script below performs as intended, producing and exporting a HAR using BrowserMob-Proxy (BMP) with Selenium scripted in Python. For all those out there who stumble upon this with similar issues, hopefully it will help you out (for details on the software versions used, see the top of my original post). I've also included plenty of comments along the way to make the code easier to read and to explain what it is doing.

HAR Producing Python Script:

    """Selenium script used with BMP to run a Google query, then save the captured HAR as CHARLIE_HAR.har"""
    import json

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    from browsermobproxy import Server
    server = Server(r'C:\Users\Sand\Desktop\browsermob-proxy-2.0-beta-8\bin\browsermob-proxy') # The BMP server is started via browsermob-proxy.bat; modify this line to point to yours

    server.start() # Starts BMP
    proxy = server.create_proxy() # Creates a BMP proxy

    profile = webdriver.FirefoxProfile() # 1/3: Creates a Firefox profile for Selenium
    profile.set_proxy(proxy.selenium_proxy()) # 2/3: Points the profile's HTTP/SSL proxy at BMP
    driver = webdriver.Firefox(firefox_profile=profile) # 3/3: Opens Firefox with that profile, so its traffic flows through BMP

    driver.set_window_size(1024, 768) # Sets the Firefox window dimensions
    proxy.new_har("Matt_Was_Here") # Starts a new HAR whose first page id is "Matt_Was_Here"
    driver.get('https://www.google.com/') # Gets google.com and loads the page

    element = driver.find_element_by_xpath('//*[@id="gbqfq"]') # Locates the Google search box by its XPath
    element.send_keys('Thank You Patrick For Providing Us All With BMP') # Types "Thank You..." in the search box
    element.send_keys(Keys.ENTER) # Presses Enter in the search box, submitting the query and loading the results page

    answer = proxy.har # Fetches the HAR (decoded from JSON into a Python dict) while BMP is still running

    with open('CHARLIE_HAR.har', 'w') as har_file: # Creates CHARLIE_HAR.har in the directory the script is run from
        json.dump(answer, har_file) # Writes the dict out as JSON (file.write() only accepts strings)

    driver.quit() # Closes Firefox
    server.stop() # Shuts down BMP


