Embarrassingly Parallel Programming in Stata

J

Nov 24, 2010, 1:34:35 PM
to Stata Users Forum
Does anyone know of a quick and easy way to approach embarrassingly
parallel problems in Stata? Embarrassingly parallel problems are
problems that are very easy to parallelize because each of the parts is
completely independent of all the others.

For example, I may be estimating production functions for 1000 different
manufacturing plants, and the estimation strategy isn't linked across
the plants. I really just have 1000 separate problems. The most
obvious way to solve this in Stata is to set up a foreach loop to
estimate each of the production functions. However, if I have 12
processing cores, I could farm out an estimation problem to each core
and proceed almost 12 times as fast.

In Matlab there is a very easy way to do this kind of simple
parallelization with a parfor loop. There isn't anything equivalent
built into Stata. Does anyone know of a clever procedure to
parallelize for loops in Stata?

Eric A. Booth

Nov 25, 2010, 10:35:28 AM
to J, Stata Users Forum
Hi J:

You could create 12 copies of Stata on your computer, open the 12 instances of Stata, and run a do-file in each, restricted to the plants you want that instance to handle. For Stata Copy 1, in pseudo-code:

*******!
use "mydata.dta", clear
forvalues plant = 1/100 {
<yourfunctions> if plantid == `plant'
...
}
*******!

or you could even just -use- the plants you need for each session, so:

*******!
use if inrange(plantid, 1, 100) using "mydata.dta", clear
<yourfunctions>
...
*******!

and then for Stata Copy 2's do-file, change the range from 1/100 to 101/200, and so on.


The biggest issue with this is that if you want to make one change in the analysis part (viz. the "<yourfunctions>" part), you have to make it in all 12 do-files. If this is a concern, you could set up a master do-file that "writes" the other 12 do-files using -file write- commands in a loop. It would write the same do-file for each of the 12 sub-do-files, changing only the plantid range in each one.
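A minimal sketch of that master step, written as a shell script rather than with Stata's -file write- (a swapped-in technique). It assumes plantid takes consecutive values from 1 to NPLANTS, and that the shared analysis lives in a hypothetical analysis.do that starts with -args first last- and then loads its slice with something like use if inrange(plantid, `first', `last') using "mydata.dta", clear. NJOBS, NPLANTS and analysis.do are illustrative names. Each wrapper also runs -capture set processors 1-; -capture- keeps that line harmless in case the setting is not allowed in your flavor of Stata.

```shell
#!/bin/sh
# Sketch: write NJOBS small wrapper do-files (torun1.do ... torunN.do).
# Each wrapper limits Stata to one core and calls the shared, hypothetical
# analysis.do with a plantid range, so the analysis code lives in one file.
# Assumes plantid takes the consecutive values 1..NPLANTS.
NJOBS=${NJOBS:-12}
NPLANTS=${NPLANTS:-1000}

# Ceiling division so every plant falls into some chunk.
CHUNK=$(( (NPLANTS + NJOBS - 1) / NJOBS ))

k=1
while [ "$k" -le "$NJOBS" ]; do
    first=$(( (k - 1) * CHUNK + 1 ))
    last=$(( k * CHUNK ))
    [ "$last" -gt "$NPLANTS" ] && last=$NPLANTS   # cap the final chunk
    if [ "$first" -le "$NPLANTS" ]; then
        cat > "torun$k.do" <<EOF
* Auto-generated wrapper: plants $first to $last
capture set processors 1
do "analysis.do" $first $last
EOF
    fi
    k=$(( k + 1 ))
done
```

With the defaults this writes torun1.do through torun12.do. Because 1000 does not divide evenly by 12, each chunk holds 84 plants except the last: torun1.do covers plants 1 to 84 and torun12.do covers 925 to 1000. A change to the analysis is then made once, in analysis.do.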

If you've got a 12-core machine with a multicore license (Stata/MP), you would want to constrain each of the 12 Stata copies to use only 1 core with this command in each do-file: "set processors 1"

Another approach entirely might be to use your OS's command line to send the do-files to the Stata copies. You've mentioned in previous posts that you use a Unix server (which I don't use; I use Stata on a Mac), but there are generally two ways to run Stata from your OS's command prompt:

1) you could tell each copy of Stata to open and run your do-file, which in Mac OS X Terminal would look something like:

myprompt$ open -a "/applications/Stata/StataMP-Copy1.app" "/users/.../torun1.do"

2) use commands that Stata understands from the command line, for example batch mode:

myprompt$ Stata-MP -b do "/users/.../torun1.do"
(but I'm not sure how you could send this to the different copies of Stata)

Another approach on Mac OS X (though I'm not sure they added this to Stata on other platforms) is to use the console version of Stata, invoked by typing "Stata-MP" in Mac OS X's Terminal. You can open as many Terminal windows as you want, start a session in each, and run your 12 do-files.
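Instead of one Terminal window per session, a short shell function can start every do-file as its own background batch job and then wait for all of them. This is a sketch, assuming a console Stata that accepts the Unix-style "-b do filename" batch invocation and do-files named like torun1.do, torun2.do, and so on. The function name run_batch_jobs is hypothetical, and the executable is passed in as an argument because its name and path vary by installation.

```shell
#!/bin/sh
# Sketch: start every torun*.do in the current directory as its own
# background batch-mode Stata process, then wait for all of them.
# Usage: run_batch_jobs /path/to/console/stata
run_batch_jobs() {
    stata_cmd=$1
    for f in torun*.do; do
        [ -e "$f" ] || continue      # glob matched nothing: skip the literal pattern
        echo "starting $f"
        "$stata_cmd" -b do "$f" &    # one background process per do-file
    done
    wait                             # block until every background job has exited
    echo "all batch jobs finished"
}
```

For example, run_batch_jobs stata-mp from the directory that holds the do-files. Keeping one do-file per job, rather than launching the same file 12 times, also gives each job its own log file instead of having them write to the same one.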

I'm not sure how applicable any of this is to the Unix server version of Stata, but it might spark some other, similar ideas. Let us know what you find.


- Eric
_____
ebo...@ppri.tamu.edu
eric.a...@gmail.com

Eric A. Booth

Nov 25, 2010, 10:41:33 AM
to Stata Users Forum, J

Also, check this out from Stata's FAQ archive:

http://www.stata.com/support/faqs/unix/batch.html
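Batch mode writes each do-file's output to a log named after it in the working directory (torun1.do produces torun1.log), and when a command fails, Stata prints a return-code line such as r(111); in that log. A rough sketch for flagging failed jobs once all the batch runs finish; the function name check_batch_logs is hypothetical, and grepping for that pattern is a heuristic assumption, not an official interface.

```shell
#!/bin/sh
# Sketch: after the batch jobs finish, flag any torun*.log that contains
# a Stata error return code such as "r(111);".
# Heuristic: ordinary output that happens to look like r(#); also matches.
check_batch_logs() {
    status=0
    for log in torun*.log; do
        [ -e "$log" ] || continue                  # no logs yet: nothing to check
        if grep -q 'r([0-9][0-9]*);' "$log"; then  # BRE: parentheses are literal
            echo "error found in $log"
            status=1
        fi
    done
    return $status
}
```

For example, check_batch_logs || echo "some jobs failed" gives a quick pass/fail summary before you combine the results.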

- Eric

J

Jan 13, 2011, 9:09:41 PM
to Stata Users Forum
Eric,
Thanks for your suggestions. I was hoping there was a more "built-in"
way. I have tried some of the strategies you mentioned before.
They work, but they just have more overhead.

J
