How to explain the time increase for Metropolis Hastings when increasing the number of chains


larryh...@gmail.com

Aug 27, 2018, 5:06:24 PM
to QUESO-users mailing list
Hi,

I've been using QUESO in my research for a while. Thanks for developing such a useful tool!
Recently I noticed that the total run time increases when I increase the number of MCMC chains. The chain length is unchanged, every chain uses the same settings, and the computing resources are always sufficient: each process is handled by its own processor.

[Image: WeChat Image_20180827160333.png]


The running time is measured automatically by QUESO, using the 'ip.solveWithBayesMetropolisHastings' function. As I understand it, these chains are independent and the MPI processes all execute the same code. The only explanation I can come up with is that the data are written to file sequentially. Is the time spent writing data also counted in the running time? If so, the numbers are still puzzling: from 1 chain to 10 chains, the number of chains increases by 9 and the time increases by 246 seconds; from 10 chains to 100 chains, the number of chains increases by 90 and the time increases by 207 seconds. Even allowing for unstable CPU performance, this is difficult to understand. From 100 chains to 200 chains, the time does not increase at all.
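
To make the puzzle concrete, here is a minimal sketch that turns the two increments quoted above into an average cost per added chain. It uses only the numbers from this message; the variable and function names are made up for illustration. If every chain added a fixed amount of sequential write time, the two rates should be about equal, but they come out at roughly 27 s and 2.3 s per added chain.

```python
# Increments quoted in the message: (chains before, chains after, added seconds).
increments = [
    (1, 10, 246.0),
    (10, 100, 207.0),
]


def seconds_per_added_chain(start, end, added_seconds):
    """Average extra wall-clock time attributed to each additional chain."""
    return added_seconds / (end - start)


rates = [seconds_per_added_chain(*row) for row in increments]

for (start, end, _), rate in zip(increments, rates):
    print(f"{start:>3} -> {end:>3} chains: {rate:6.2f} s per added chain")
```

A constant per-chain write cost would give a straight line in the number of chains, so a tenfold drop in the rate suggests that sequential output alone may not account for the measurements.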
 
Is my idea correct? Do you have another explanation?

Thanks,
Han

Damon McDougall

Sep 1, 2018, 2:33:24 PM
to queso...@googlegroups.com
Hey Han,

Great sleuthing! And thanks for the kind words.

I suspect you're probably right, but output from a profiler will be more insightful. Have you used a profiler before? If not, I'm more than happy to dig deeper.
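
Short of a full profiler run, one rough way to test the file-writing hypothesis is to time the sampling phase and the output phase separately. The sketch below only illustrates that idea in plain Python: `run_chain` and `write_output` are hypothetical stand-ins, not QUESO functions, and a real measurement would instrument or profile the actual C++/MPI executable instead.

```python
import os
import tempfile
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase.
phase_seconds = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the block to phase_seconds[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start


def run_chain(n_steps):
    """Hypothetical stand-in for the sampling work of one chain."""
    total = 0.0
    for i in range(n_steps):
        total += (i % 7) * 0.5
    return total


def write_output(value, path):
    """Hypothetical stand-in for writing one chain's results to disk."""
    with open(path, "w") as handle:
        handle.write(f"{value}\n")


with tempfile.TemporaryDirectory() as tmp:
    with timed("sampling"):
        result = run_chain(100_000)
    with timed("output"):
        write_output(result, os.path.join(tmp, "chain.txt"))

for phase, seconds in phase_seconds.items():
    print(f"{phase:>8}: {seconds:.6f} s")
```

If the "output" share grows with the number of chains while "sampling" stays flat, that would support the sequential-write explanation; if both stay flat, the extra time is coming from somewhere else.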

Best wishes,
Damon


--
Damon McDougall
http://dmcdougall.co.uk
Institute for Computational Engineering Sciences
201 E. 24th St., Stop C0200
The University of Texas at Austin
Austin, TX 78712-1229

larryh...@gmail.com

Sep 6, 2018, 3:15:24 PM
to QUESO-users mailing list
Hi Damon,

Thank you for your reply.
No, I haven't used a profiler before, but this is a good idea. I'll give it a try and see what's going on.

Best regards,
Han