"crowd sourced" benchmarking

Jeremy Gray

Oct 26, 2012, 11:35:10 PM
to psycho...@googlegroups.com
Hi all,

I have the basic pieces in place for "crowd sourced" benchmarking.
This is in my branch feature-benchmark on github. It was inspired in
part by Jonas Lindeløv's very interesting discussion of hardware
profiling last February.

The idea is that anyone, anywhere can run the Benchmark Wizard from
the Tools menu within Builder. It will go through a series of steps
and optionally upload a small anonymous file to psychopy.org. A file
upload automatically triggers an update of a web page that is
viewable by anyone:
http://upload.psychopy.org/benchmark/report.html
The web page displays only a small fraction of the data uploaded; the
rest is archived and could be used in other analyses going forward. By
tweaking a few display parameters, it's easy to drill down further
into the data: does having internet access help or hurt DotStim
performance? Does this vary by platform? Eventually, it will be
possible to display multiple items at once (or do multiple regression,
e.g., to look at DotStim performance as a function of GL max vertices,
controlling for CPU speed).

Currently, what is displayed is one data item (DotStim max dots with
no dropped frames) as a function of OS version, OpenGL version, and
whether there was internet access at the time of the test. As other
people run the wizard, various other cells will be populated and
automatically fill in. Data within a cell is averaged, and the mean is
shown. If you hover over the mean, some descriptive stats are shown
(just basic stuff). I ran 5 times with internet access during the test
and 5 times without it.
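
For anyone curious how the per-cell averaging could work, here is a rough
sketch. It is not the actual report code: the record fields (os, gl,
internet, maxDots) and the function name summarizeByCell are assumptions
made up for illustration, since the upload format is not described here.

```javascript
// Sketch only: field names and function name are hypothetical.
// Groups benchmark records into cells keyed by OS version, OpenGL
// version, and internet access, then computes basic descriptive stats.
function summarizeByCell(records) {
  const cells = new Map();
  for (const r of records) {
    const key = [r.os, r.gl, r.internet].join(' | ');
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(r.maxDots);
  }

  const summary = {};
  for (const [key, values] of cells) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    // Sample standard deviation; not defined for a single run, so use null.
    const sd = n > 1
      ? Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1))
      : null;
    summary[key] = {
      n: n,
      mean: mean,
      sd: sd,
      min: Math.min(...values),
      max: Math.max(...values),
    };
  }
  return summary;
}
```

The mean would be what the cell shows, and the remaining fields are the
kind of thing that could go into the hover text.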

You can clone my feature branch, and should be able to run the
benchmarks and upload data. Comments, questions, etc. are welcome on
any of this.

--Jeremy

Jonas Lindeløv

Nov 4, 2012, 4:13:18 PM
to psycho...@googlegroups.com
This is excellent, Jeremy! I'm very happy about the report design - the color marking of performance problems and the "notes" column. I just ran a few tests from a very low performance computer (an Asus EEE Box). I made a pretty solid bottom scraper :-)

A few suggestions for further work on this feature:
  1. Option to choose between "performance benchmark" (your current implementation) and "full benchmark" which runs all psychopy tests in addition? A very rough first implementation of the latter would simply plot the output from run.py in a single cell called "results from thorough test". But it might be possible to pick out problems more specifically like: "incorrect rendering using visual.TextStim in visual.Window in TestPygletNormNoShaders mode".
  2. Option to generate two report types: "full report" (your current approach) and "error report" (only results that need attention: fails and outside-desired-range)

Best,
Jonas

Jeremy Gray

Nov 4, 2012, 8:31:37 PM
to psycho...@googlegroups.com
Hi Jonas,

Thanks for the encouraging words, and even more for testing it out!

From this, we now have people testing on mac, windows, and linux, which is good to know.

1. Option to choose between "performance benchmark" (your current implementation) and "full benchmark" which runs all psychopy tests in addition? A very rough first implementation of the latter would simply plot the output from run.py in a single cell called "results from thorough test". But it might be possible to pick out problems more specifically like: "incorrect rendering using visual.TextStim in visual.Window in TestPygletNormNoShaders mode"

I see these as being conceptually rather different, although they do indeed have some overlap. I see the performance benchmarks as being a test of the hardware, whereas the psychopy tests are less about performance and more about the integrity of the psychopy code base. So for now I think they are best kept separate.

2. Option to generate two report types: "full report" (your current approach) and "error report" (only results that need attention: fails and outside-desired-range)

Interesting suggestion, I'll have to think about this. There may be things we can do with the HTML part, like having a button to hide or show different parts of the report. It could then show only the critical things by default, and a single click would reveal all the configuration info.

thanks for giving it a whirl!

--Jeremy

Jonas Lindeløv

Nov 5, 2012, 9:20:15 AM
to psycho...@googlegroups.com
I just submitted results from another Ubuntu computer :-) I like the fact that only one of six benchmarks is from a windows computer. 

See comments below.


On Monday, November 5, 2012 2:31:37 AM UTC+1, Jeremy Gray wrote:
1. Option to choose between "performance benchmark" (your current implementation) and "full benchmark" which runs all psychopy tests in addition? A very rough first implementation of the latter would simply plot the output from run.py in a single cell called "results from thorough test". But it might be possible to pick out problems more specifically like: "incorrect rendering using visual.TextStim in visual.Window in TestPygletNormNoShaders mode"

I see these as being conceptually rather different, although they do indeed have some overlap. I see the performance benchmarks as being a test of the hardware, whereas the psychopy tests are less about performance and more about the integrity of the psychopy code base. So for now I think they are best kept separate.

Testing performance and tests together would answer the question: "Does psychopy work as expected on this system?". My suggestion actually stems from a few use cases: I coded a few experiments for my students. It turned out that they ran them from their laptops (never a good idea, I know...) and neither frame-syncing nor opacity was handled correctly. Running a full test would've made it obvious whether these laptops were adequate. I guess this is a common use case as you would want to run such a test on all systems where you collect data.

This is just on a "nice to have" list without being on the essentials-list :-)
 

2. Option to generate two report types: "full report" (your current approach) and "error report" (only results that need attention: fails and outside-desired-range)

Interesting suggestion, I'll have to think about this. There may be things we can do with the HTML part, like having a button to hide or show different parts of the report. It could then show only the critical things by default, and a single click would reveal all the configuration info.

Oh yes, that's a nice idea. You could hide table rows by id like this (not tested):

<button onClick="document.getElementById('ok').style.display = 'none';">Only show errors/warnings</button>
<button onClick="document.getElementById('ok').style.display = '';">Show all information</button>

<table>
<tr id="error"><td>Result 1</td><td>I'm an error!</td></tr>
<tr id="ok"><td>Result 2</td><td>I'm a success!</td></tr>
</table>

Jeremy Gray

Nov 5, 2012, 10:47:15 AM
to psycho...@googlegroups.com

I just submitted results from another Ubuntu computer :-) I like the fact that only one of six benchmarks is from a windows computer. 

enjoy it while it lasts :-)
 
1. Option to choose between "performance benchmark" (your current implementation) and "full benchmark" which runs all psychopy tests in addition? A very rough first implementation of the latter would simply plot the output from run.py in a single cell called "results from thorough test". But it might be possible to pick out problems more specifically like: "incorrect rendering using visual.TextStim in visual.Window in TestPygletNormNoShaders mode"

I see these as being conceptually rather different, although they do indeed have some overlap. I see the performance benchmarks as being a test of the hardware, whereas the psychopy tests are less about performance and more about the integrity of the psychopy code base. So for now I think they are best kept separate.

Testing performance and tests together would answer the question: "Does psychopy work as expected on this system?". My suggestion actually stems from a few use cases: I coded a few experiments for my students. It turned out that they ran them from their laptops (never a good idea, I know...) and neither frame-syncing nor opacity was handled correctly. Running a full test would've made it obvious whether these laptops were adequate. I guess this is a common use case as you would want to run such a test on all systems where you collect data.

I think it's a great idea to try to leverage existing tests and use them for benchmarking. I had not appreciated this completely. I'll keep this in mind. Some things like frame-syncing are tested already, and some are not (like opacity), and some probably should not be tested as part of benchmarking (like testApp tests).
 
This is just on a "nice to have" list without being on the essentials-list :-)

I agree
 
2. Option to generate two report types: "full report" (your current approach) and "error report" (only results that need attention: fails and outside-desired-range)

Interesting suggestion, I'll have to think about this. There may be things we can do with the HTML part, like having a button to hide or show different parts of the report. It could then show only the critical things by default, and a single click would reveal all the configuration info.

Oh yes, that's a nice idea. You could hide table rows by id like this (not tested):

<button onClick="document.getElementById('ok').style.display = 'none';">Only show errors/warnings</button>
<button onClick="document.getElementById('ok').style.display = '';">Show all information</button>

<table>
<tr id="error"><td>Result 1</td><td>I'm an error!</td></tr>
<tr id="ok"><td>Result 2</td><td>I'm a success!</td></tr>
</table>

Interesting! However, it seems like a multi-row table will require something more, including unique ids per row and a way to hide them all.

--Jeremy

Jonas Lindeløv

Nov 5, 2012, 11:25:27 AM
to psycho...@googlegroups.com
On Monday, November 5, 2012 4:47:15 PM UTC+1, Jeremy Gray wrote:

I just submitted results from another Ubuntu computer :-) I like the fact that only one of six benchmarks is from a windows computer. 

enjoy it while it lasts :-)
 
1. Option to choose between "performance benchmark" (your current implementation) and "full benchmark" which runs all psychopy tests in addition? A very rough first implementation of the latter would simply plot the output from run.py in a single cell called "results from thorough test". But it might be possible to pick out problems more specifically like: "incorrect rendering using visual.TextStim in visual.Window in TestPygletNormNoShaders mode"

I see these as being conceptually rather different, although they do indeed have some overlap. I see the performance benchmarks as being a test of the hardware, whereas the psychopy tests are less about performance and more about the integrity of the psychopy code base. So for now I think they are best kept separate.

Testing performance and tests together would answer the question: "Does psychopy work as expected on this system?". My suggestion actually stems from a few use cases: I coded a few experiments for my students. It turned out that they ran them from their laptops (never a good idea, I know...) and neither frame-syncing nor opacity was handled correctly. Running a full test would've made it obvious whether these laptops were adequate. I guess this is a common use case as you would want to run such a test on all systems where you collect data.

I think it's a great idea to try to leverage existing tests and use them for benchmarking. I had not appreciated this completely. I'll keep this in mind. Some things like frame-syncing are tested already, and some are not (like opacity), and some probably should not be tested as part of benchmarking (like testApp tests).

Just to make sure that you understood my suggestion: I was thinking about running the test suite as it is and showing the output as it is, together with the "real" benchmarking that you already made in firstRun.py. So firstRun.py could just call psychopy/tests/run.py and display the output.

But running performance benchmarks on all kinds of stimuli (that's what you're suggesting, right?) is actually quite an interesting idea, although it's extensive. Impressive as it might be, that is even less on the essentials-list :-)
 
 
This is just on a "nice to have" list without being on the essentials-list :-)

I agree
 
2. Option to generate two report types: "full report" (your current approach) and "error report" (only results that need attention: fails and outside-desired-range)

Interesting suggestion, I'll have to think about this. There may be things we can do with the HTML part, like having a button to hide or show different parts of the report. It could then show only the critical things by default, and a single click would reveal all the configuration info.

Oh yes, that's a nice idea. You could hide table rows by id like this (not tested):

<button onClick="document.getElementById('ok').style.display = 'none';">Only show errors/warnings</button>
<button onClick="document.getElementById('ok').style.display = '';">Show all information</button>

<table>
<tr id="error"><td>Result 1</td><td>I'm an error!</td></tr>
<tr id="ok"><td>Result 2</td><td>I'm a success!</td></tr>
</table>

Interesting! However, it seems like a multi-row table will require something more, including unique ids per row and a way to hide them all.


JavaScript: Oh, I was a bit too quick on the JavaScript there. Google got me something like this (still not tested):

<script type="text/javascript">

// Loops through all rows in the document and sets the display property of rows with a specific id.
// toggle('ok', '') shows the ok rows again, so all rows are displayed
// toggle('ok', 'none') hides the ok rows
function toggle(ID, display_value) {
   var tr = document.getElementsByTagName('tr');
   for (var i = 0; i < tr.length; i++) {
       // Assign with "=" here; "==" would only compare and change nothing.
       if (tr[i].id == ID) tr[i].style.display = display_value;
   }
}
</script>
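
A variant that avoids repeating the same id on several rows (ids are meant to be unique within a page) would be to mark the rows with a class instead. This is only a sketch and just as untested in the actual report; the function name setRowsDisplay and the class names are made up for illustration.

```javascript
// Sketch only: setRowsDisplay and the class names are hypothetical, not report code.
// Sets style.display on every row that carries the given class and
// returns how many rows were changed.
function setRowsDisplay(rows, className, displayValue) {
  var changed = 0;
  // Array.from accepts the collection returned by getElementsByTagName.
  Array.from(rows).forEach(function (row) {
    var classes = (row.className || '').split(/\s+/);
    if (classes.indexOf(className) !== -1) {
      row.style.display = displayValue;
      changed += 1;
    }
  });
  return changed;
}

// Browser usage, with rows marked up as <tr class="ok"> or <tr class="error">:
//   setRowsDisplay(document.getElementsByTagName('tr'), 'ok', 'none');  // hide ok rows
//   setRowsDisplay(document.getElementsByTagName('tr'), 'ok', '');      // show them again
```

Because the rows are passed in, the same function can be tried out on plain objects without a browser.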
 

Jeremy Gray

Nov 5, 2012, 1:47:47 PM
to psycho...@googlegroups.com
Nice! I've added your JavaScript, and it's in my latest commit.