Efficiency of multiple hosts for Monte Carlo testing


Scott Mermelstein

Jun 20, 2016, 1:17:48 PM
to scoop-users
Hello!

I was testing the latest version of SCOOP from the git source, specifically the pi_calc.py Monte Carlo example.  It ran very quickly on my 8-core machine, so I decided to test it harder and multiplied both arguments of the calcPi function by 10, e.g.

dataPi = calcPi(30000, 50000)

Now this took longer (not surprisingly, close to 100 times longer: the initial time was about 1.52 seconds, while the extended time was about 115 seconds).  Having a run long enough to test how well it worked on multiple devices, I created a hostfile listing my machine and 4 others, each with 8 CPUs.  I had already followed the instructions to make ssh work easily, and all machines share a commonly mounted drive.  I could check each host during the run and see that processes were executing.  So the result surprised me - it still took 115 seconds!

Did I do something wrong?  Given that making each parameter ten times bigger resulted in roughly 100 times the calculation time (the total number of samples is the product of the two arguments, so 10 x 10 = 100), I was assuming that having 5 times the processing power would cut the time to about a fifth, somewhere in the range of 20-30 seconds.

Here is a copy of the relevant input/output (I've renamed all the hosts and obfuscated some IPs just to keep my IT guys happy about security).  As mentioned above, this is examples/pi_calc.py with the main line changed to dataPi = calcPi(30000, 50000).  In case the file changes drastically over time, here's a copy of it:

#
#    This file is part of Scalable COncurrent Operations in Python (SCOOP).
#
#    SCOOP is free software: you can redistribute it and/or modify
#    it under the terms of the GNU Lesser General Public License as
#    published by the Free Software Foundation, either version 3 of
#    the License, or (at your option) any later version.
#
#    SCOOP is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#    GNU Lesser General Public License for more details.
#
#    You should have received a copy of the GNU Lesser General Public
#    License along with SCOOP. If not, see <http://www.gnu.org/licenses/>.
#
"""
Calculation of Pi using a Monte Carlo method.
"""



from math import hypot
from random import random
from scoop import futures
from time import time


# A range is used in this function for python3. If you are using python2,
# xrange might be more efficient.
def test(tries):
    # Count the random points in the unit square that fall inside the
    # quarter circle of radius 1.
    return sum(hypot(random(), random()) < 1 for _ in range(tries))


# Calculates pi with a Monte Carlo method. This function calls the function
# test "workers" times with an argument of "tries". SCOOP dispatches these
# function calls across the available resources.
def calcPi(workers, tries):
    bt = time()
    expr = futures.map(test, [tries] * workers)
    piValue = 4. * sum(expr) / float(workers * tries)
    totalTime = time() - bt
    print("pi = " + str(piValue))
    print("total time: " + str(totalTime))
    return piValue


if __name__ == "__main__":
    dataPi = calcPi(30000, 50000)




I'm passing --pythonpath since this is the latest git install instead of the stable one, and I needed to make sure that Python finds the right module.

[smermelstein@host1 examples]$ python -m scoop --pythonpath $PYTHONPATH pi_calc.py 
[2016-06-20 11:57:42,599] launcher  INFO    SCOOP 0.7 2.0 on linux using Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec  7 2015, 11:16:01) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)], API: 1013
[2016-06-20 11:57:42,599] launcher  INFO    Deploying 8 worker(s) over 1 host(s).
[2016-06-20 11:57:42,599] launcher  INFO    Worker distribution: 
[2016-06-20 11:57:42,599] launcher  INFO       127.0.0.1:       7 + origin
Launching 8 worker(s) using /bin/bash.
pi = 3.141626184
total time: 115.43567848205566
[2016-06-20 11:59:38,905] launcher  (127.0.0.1:38437) INFO    Root process is done.
[2016-06-20 11:59:38,905] launcher  (127.0.0.1:38437) INFO    Finished cleaning spawned subprocesses.

[smermelstein@host1 examples]$ python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH pi_calc.py 
[2016-06-20 11:50:56,504] launcher  INFO    SCOOP 0.7 2.0 on linux using Python 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Dec  7 2015, 11:16:01) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)], API: 1013
[2016-06-20 11:50:56,504] launcher  INFO    Deploying 40 worker(s) over 5 host(s).
[2016-06-20 11:50:56,504] launcher  INFO    Worker distribution: 
[2016-06-20 11:50:56,504] launcher  INFO       host1:   7 + origin
[2016-06-20 11:50:56,505] launcher  INFO       host2:   8 
[2016-06-20 11:50:56,505] launcher  INFO       host3:   8 
[2016-06-20 11:50:56,505] launcher  INFO       host4:   8 
[2016-06-20 11:50:56,505] launcher  INFO       host5:   8 
Warning: Permanently added 'host2,10.0.xx.aa' (ECDSA) to the list of known hosts.
Warning: Permanently added 'host4,10.0.xx.bb' (ECDSA) to the list of known hosts.
Warning: Permanently added 'host5,10.0.xx.cc' (ECDSA) to the list of known hosts.
Launching 8 worker(s) using /bin/bash.

[Each remote host printed its standard "Authorized Users Only!" login banner here; the four identical banners are omitted.]
Warning: Permanently added 'host3,10.0.xx.dd' (ECDSA) to the list of known hosts.
Launching 8 worker(s) using /bin/bash.
Launching 8 worker(s) using /bin/bash.
Launching 8 worker(s) using /bin/bash.
Launching 8 worker(s) using /bin/bash.
pi = 3.1415989786666665
total time: 117.70856070518494
[2016-06-20 11:52:55,385] launcher  (127.0.0.1:37657) INFO    Root process is done.
Killed by signal 15.
Killed by signal 15.
Killed by signal 15.
[2016-06-20 11:52:55,385] launcher  (127.0.0.1:37657) INFO    Finished cleaning spawned subprocesses.
Killed by signal 15.



As you can probably infer from the above, this is what the hostfile (again, with the names changed) looks like:
host1 8
host2 8
host3 8
host4 8
host5 8

I must be doing something wrong.  I'd love to use this module, but it only helps me if running on multiple hosts reduces the execution time.  Thanks for any help you can give!

Nick Vandewiele

Jun 20, 2016, 4:40:04 PM
to scoop-users
Hi

One experiment you could run is to check whether the number of futures (in this case, N = 30000) has an influence on your timings, by bundling the work into larger chunks: calcPi(300, 5000000).
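
For concreteness, a minimal sketch of the bundled variant (same pi_calc.py as above; only the main call changes, and the total sample count stays at 30000 * 50000 = 1.5e9):

if __name__ == "__main__":
    # 300 futures of 5,000,000 tries each: the same 1.5e9 total samples as
    # calcPi(30000, 50000), but 100x fewer futures for SCOOP to schedule.
    dataPi = calcPi(300, 5000000)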

My experience in the past is that creating too many futures will be detrimental to performance.

regards,
Nick

Scott Mermelstein

Jun 20, 2016, 5:19:53 PM
to scoop-users
Hi Nick,

Thanks for your suggestion.

I just ran that test.  My results were:
pi = 3.1416689146666665
total time: 113.53602170944214

While there is a 2 second difference, I'm guessing that's just random variance.  I'm eager for any further suggestions you may have, though.

For what it's worth, I expect my actual application will use no more than 100 processors.  I hadn't realized the number of futures could matter so drastically, so I also tested it that way (i.e. calcPi(100, 15000000)) and got virtually the same results:
pi = 3.141538629333333
total time: 112.91496348381042

I'm totally new to this.  Is there some flag I'm not passing on the command line that I should be?

Thanks!

Nick Vandewiele

Jun 20, 2016, 8:41:04 PM
to scoop-users
I don't see anything you might have done wrong so far.

Another experiment you could do is to test the scalability on a single machine with a high enough workload, and see whether it scales as expected.
Perhaps use the -n $WORKERS flag (python -m scoop -n $WORKERS pi_calc.py) and vary it from 1 to 8.
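
If it helps, here is a hypothetical driver for that sweep (the helper script and its name run_workers.py are mine, not part of SCOOP; each run prints its own "pi = ..." and "total time: ..." lines):

# run_workers.py - hypothetical helper, not part of SCOOP
import subprocess

for workers in range(1, 9):
    print("=== " + str(workers) + " worker(s) ===")
    # Same workload every time; only the worker count varies.
    subprocess.run(["python", "-m", "scoop", "-n", str(workers), "pi_calc.py"])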

regards,
Nick


Scott Mermelstein

Jun 21, 2016, 1:56:56 PM
to scoop-users
Well, I think you solved the issue I was seeing, somewhat.  The problem doesn't seem to be with running on multiple hosts; rather, there seems to be an asymptote in the number of workers that is useful.  I guess it makes sense to hit an asymptote at some point, but I'm very surprised that it arrives so early.  Testing on a single machine, I got these results:

python -m scoop -n 1 pi_calc.py    total time: 637.7268812656403    (not quite 8 x 115, so it's not strictly linear)
python -m scoop -n 2 pi_calc.py    total time: 215.1882631778717
python -m scoop -n 3 pi_calc.py    total time: 147.0616569519043
python -m scoop -n 4 pi_calc.py    total time: 118.42995691299438
python -m scoop -n 5 pi_calc.py    total time: 117.1646957397461
python -m scoop -n 6 pi_calc.py    total time: 116.73066234588623
python -m scoop -n 7 pi_calc.py    total time: 115.9863851070404
 
Clearly, once we hit n = 4, we don't really get any gain from going higher.  This doesn't make sense to me; I would have expected time improvements to continue up to several hundred or even thousands of workers.  But it makes it likely that the issue isn't from using multiple hosts.

Some basic tests across multiple hosts:

python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH -n 3 pi_calc.py    total time: 215.9782886505127    (hostfile = host1 2, host2 2)
python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH -n 4 pi_calc.py    total time: 152.8207883834839    (hostfile = host1 3, host2 3)
python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH -n 5 pi_calc.py    total time: 150.28630304336548   (hostfile = host1 3, host2 3)
python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH -n 6 pi_calc.py    total time: 147.80715227127075   (hostfile = host1 3, host2 3)
python -m scoop --hostfile hostfile --pythonpath $PYTHONPATH -n 7 pi_calc.py    total time: 119.10033822059631   (hostfile = host1 4, host2 4)

Some of the numbers may have gotten thrown off; quite a few of the runs came up with "ERROR Could not unpickle status update message", and I'm guessing that made the scheduler reschedule some jobs, making it take slightly longer to hit the asymptote.

I guess my question is answered now - you've helped me see that the issue isn't directly related to multiple hosts.  While I still find it strange that we hit the limit so quickly, I can accept that for now - at least enough to try using SCOOP in my real code and see where the limit is there.

Thanks again for your help!