Initial benchmarking

Paul Hockett

May 21, 2013, 9:15:21 AM
to lima...@googlegroups.com
My first thought is that my first test calculation is taking a long time... still running on 4 CPUs after ~18 hours.  I just took test2.cfg and adapted it for butadiene (file attached for reference) using essentially the same values I was running Christer's code with.  In that case I was looking at around 1 hour for a <cos^2> calculation, and maybe 2 hours for a full P(theta), in both cases sampling at 100 fs timesteps out to ~100 ps and running on a single core.

So, is this due to (a) inherent additional complexity in your code (all 9 expectation values = order of magnitude longer?), (b) user error in terms of the various coefficients set or something else silly in the input file, (c) something else, like choice of canned ODE routine vs. Christer's code (although I can't imagine this would be an issue these days)?

Any thoughts?

(I should apologise in advance for all references to Christer's code that will appear in benchmarking. I'm not casting any aspersions on your code, but Christer's is - so far - my only point of comparison!  For reference I've attached his code in a vaguely distributable form.)


ffa_code_dist.tar.gz
butadiene_sym_test.cfg

Jonathan Underwood

May 21, 2013, 10:30:00 AM
to Paul Hockett, lima...@googlegroups.com
On 21 May 2013 14:15, Paul Hockett <phoc...@gmail.com> wrote:
> My first thought is that my first test calculation is taking a long time...
> still running on 4 CPUs after ~18 hours. I just took test2.cfg and adapted
> it for butadiene (file attached for reference) using essentially the same
> values I was running Christer's code with. In that case I was looking at
> around 1 hour for <cos^2> calculation, and maybe 2 hours for a full
> P(theta), in both cases sampling at 100fs timesteps out to ~100 ps and
> running on a single core.
>
> So, is this due to (a) inherent additional complexity in your code (all 9
> expectation values = order of magnitude longer?), (b) user error in terms of
> the various coefficients set or something else silly in the input file, (c)
> something else, like choice of canned ODE routine vs. Christer's code
> (although I can't imagine this would be an issue these days)?
>
> Any thoughts?
>

I suspect this is likely due to the precision of the TDSE solver. I
haven't looked at Christer's code yet, so I'm not sure how he's
propagating. You could try changing the 1.0e-6 values to say 1.0e-3
and seeing what speedup you get. 1.0e-6 is asking for high quality
data :).
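
As a rough illustration of why the tolerance matters for run time, here is a toy adaptive integrator (step-doubling RK4) in Python. It is not limapack's solver and not Christer's routine; the test equation, the function names and the starting step size are all assumptions chosen for illustration. It only shows that a tighter per-step tolerance forces smaller steps and therefore more work.

```python
import cmath

def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, y0, t0, t1, tol, h=0.1):
    """Adaptive step-doubling RK4 (toy example).

    Returns (y(t1), number of trial steps, accepted or rejected).
    """
    t, y, trials = t0, y0, 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        full = rk4_step(f, t, y, h)                      # one step of size h
        mid = rk4_step(f, t, y, h / 2)                   # two steps of size h/2
        half = rk4_step(f, t + h / 2, mid, h / 2)
        err = abs(half - full)                           # local error estimate
        trials += 1
        if err <= tol:                                   # accept the step
            t, y = t + h, half
        # usual step-size update for a 4th-order method, with a safety factor
        h *= min(5.0, max(0.1, 0.9 * (tol / max(err, 1e-300)) ** 0.2))
    return y, trials

def rhs(t, y):
    """Hypothetical test problem: dy/dt = -i*y, exact solution exp(-i*t)."""
    return -1j * y

exact = cmath.exp(-20j)
y_loose, n_loose = integrate(rhs, 1.0 + 0j, 0.0, 20.0, tol=1e-3)
y_tight, n_tight = integrate(rhs, 1.0 + 0j, 0.0, 20.0, tol=1e-6)
print("tol=1e-3:", n_loose, "trial steps, error", abs(y_loose - exact))
print("tol=1e-6:", n_tight, "trial steps, error", abs(y_tight - exact))
```

For a 4th-order method the step size scales roughly as the fifth root of the tolerance, so going from 1.0e-6 to 1.0e-3 would be expected to save only a factor of a few in step count for a smooth problem like this one. A real TDSE propagation with many coupled states may behave quite differently.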

> (I should apologise in advance for all references to Christer's code that
> will appear in benchmarking, I'm not casting any aspersions on your code, but
> Christer's is - so far - my only point of comparison! For reference I've
> attached his code in a vaguely distributable form.)
>

No problem, it's a sensible thing to do.

Paul Hockett

May 21, 2013, 11:42:08 AM
to lima...@googlegroups.com, Paul Hockett
Thanks, I'll try that... a quick look in Christer's code shows he had set an error tolerance (hard coded) on the numerical integration of 5x10^-5.  He was using "cplx_odeint.cpp" from Numerical Recipes as his routine, basic Runge-Kutta method I think, but presumably this tolerance should be more or less equivalent to your tolerance setting.

The first run didn't get the chance to finish since my new OS is, thus far, not as stable as I had hoped (note this crash was unrelated to limapack... but apparently I can't make pngs from Matlab at the moment!).

Paul Hockett

May 22, 2013, 10:25:19 AM
to lima...@googlegroups.com, Paul Hockett
2nd run is now in hour 22... this is again with 4 cores, but with the ODE settings loosened to 10^-3, ditto for coefmin and poptot (although I'm not sure if these would make much difference or not, guess not?), and Jmax=60.

I'm going to try and let this run finish, just to see what pops out, then maybe look at putting some more diagnostic output in the code just to see what's happening under the hood.

Just as a stupidity check, can you confirm I have the correct units for the inputs?  I have B in cm^-1 and alpha in A^3.  I know when I made an error in Christer's code (in that case cm^-1 instead of GHz) it took around an order of magnitude longer to run.
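
For reference, the conversion between the two units mentioned above is just a factor of the speed of light. The sketch below is a small Python helper; the function names and the example value of B are hypothetical and are not taken from either code or from the attached input file.

```python
# Speed of light in cm/s (exact by SI definition).
C_CM_PER_S = 2.99792458e10

def wavenumber_to_ghz(b_cm1):
    """Convert a rotational constant from cm^-1 to GHz."""
    return b_cm1 * C_CM_PER_S / 1e9

def ghz_to_wavenumber(b_ghz):
    """Convert a rotational constant from GHz to cm^-1."""
    return b_ghz * 1e9 / C_CM_PER_S

# Illustrative value only (not the butadiene constant used in the thread).
b_example_cm1 = 0.15
print(wavenumber_to_ghz(1.0))            # conversion factor, about 29.98 GHz per cm^-1
print(wavenumber_to_ghz(b_example_cm1))  # the example value expressed in GHz
```

Since 1 cm^-1 is about 30 GHz, a value entered in the wrong unit is off by a factor of roughly 30 in either direction.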

Jonathan Underwood

May 22, 2013, 10:26:46 AM
to Paul Hockett, lima...@googlegroups.com
On 22 May 2013 15:25, Paul Hockett <phoc...@gmail.com> wrote:
> 2nd run is now in hour 22... this is again with 4 cores, but with the ODE
> settings up to 10^-3, ditto for coefmin and poptot (although I'm not sure if
> these would make much difference or not, guess not?), and Jmax=60.
>
> I'm going to try and let this run finish, just to see what pops out, then
> maybe look at putting some more diagnostic output in the code just to see
> what's happening under the hood.
>

OK, it's possible there's some convergence problem. I'll have a look,
and add some debugging output when I get a bit of time.

> Just as a stupidity check, can you confirm I have the correct units for the
> inputs? I have B in cm^-1 and alpha in A^3. I know when I made an error in
> Christer's code (in that case cm^-1 instead of GHz) it took around an order
> of magnitude longer to run.
>

Yes, the units are correct.

Paul Hockett

Jun 5, 2013, 2:11:12 PM
to lima...@googlegroups.com, Paul Hockett
Hi Jon,

I've been running some more background testing (i.e. blindly running stuff with different settings, as opposed to getting stuck into the code directly) and have a slightly more quantitative perspective on the ODE issues.  I tried some "butadiene" symmetric top settings as before, except with T=0.001 K and with the ODE solver settings at 0.1.  In all cases I tried, the calculation dropped out with population build-up issues. I found:
Jmax=20, run time ~5s
Jmax=40, run time ~1h25
Jmax=60, run time ~18h11

In the output it was clear that the divergence was significant, with the expectation values hitting >1E7 on the timestep before drop-out, and >1E35 on the final time-step.  For Jmax=20 this was t4 (=-0.222ps), and in other cases t5 (=0.222ps).  As expected everything is scaling quite poorly with Jmax.
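
To put a rough number on the scaling, the three run times above can be turned into effective power-law exponents (run time ~ Jmax^p) between successive pairs of points. This is a back-of-the-envelope Python sketch using only the figures quoted above; the variable and function names are illustrative. The times are approximate and the runs dropped out at different timesteps (t4 vs. t5), so the exponents are indicative only.

```python
import math

# (Jmax, run time in seconds), from the figures quoted above.
timings = [
    (20, 5.0),                   # ~5 s
    (40, 1 * 3600 + 25 * 60),    # ~1h25
    (60, 18 * 3600 + 11 * 60),   # ~18h11
]

def local_exponent(p, q):
    """Effective exponent between two (Jmax, time) points, assuming time ~ Jmax**exponent."""
    (j1, t1), (j2, t2) = p, q
    return math.log(t2 / t1) / math.log(j2 / j1)

exponents = [local_exponent(a, b) for a, b in zip(timings, timings[1:])]
for (a, b), p in zip(zip(timings, timings[1:]), exponents):
    print(f"Jmax {a[0]} -> {b[0]}: time ~ Jmax^{p:.1f}")
# Gives exponents of roughly 10 (20 -> 40) and roughly 6.3 (40 -> 60).
```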

I briefly looked into getting the ODE solver to spit out some more detailed information, but didn't really get into the problem as I didn't yet have the time or inclination!  Hopefully the above observations are somewhat useful...

p.