How to find the source of an error message in PLUMED

ikol...@gmail.com

Mar 26, 2019, 1:11:31 PM
to PLUMED users
Hi,

I am seeing the generic error message shown below, and I'd like to figure out where it is coming from. Does PLUMED have a mechanism to show where a particular exception originates? This error message can come from many different places, and I don't know where to start.

terminate called after throwing an instance of 'PLMD::ExceptionError'
  what(): 
+++ PLUMED error
+++ at Tensor.h:549, function void PLMD::diagMatSym(const PLMD::TensorGeneric<n_, n_>&, PLMD::VectorGeneric<m_>&, PLMD::TensorGeneric<m_, n_>&) [with unsigned int n_ = 4u; unsigned int m_ = 1u; unsigned int n = 3u; unsigned int m = 3u]
+++ message follows +++
Error diagonalizing matrix
Matrix:
nan nan nan nan nan nan -nan -nan nan -nan nan -nan nan -nan -nan nan
Info: 4

Thanks for any suggestions,

   Istvan

Giovanni Bussi

Mar 26, 2019, 1:19:17 PM
to plumed...@googlegroups.com
Hi,

No, unfortunately there isn't, and I agree that this is pretty annoying.

I think we should consider using nested exceptions or, at least, appending the name of the Action before reporting the error.
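For illustration, the nested-exception idea can be sketched with standard C++ (std::throw_with_nested and std::rethrow_if_nested from <exception>). This is not PLUMED code: the functions ending in _hypothetical are stand-ins for a low-level routine such as diagMatSym() and for an Action's calculate(), and the label "rmsd1" is made up. Each layer adds its own context while keeping the original error as the nested cause, so the report reads from the outermost context down to the root error.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a low-level routine such as diagMatSym():
// it knows *what* failed, but not which Action it was called from.
void diagonalize_hypothetical() {
  throw std::runtime_error("Error diagonalizing matrix");
}

// Hypothetical stand-in for an Action's calculate(): it wraps anything that
// escapes in a new exception carrying its label, keeping the original
// exception as the nested cause.
void calculate_action_hypothetical(const std::string& label) {
  try {
    diagonalize_hypothetical();
  } catch (...) {
    std::throw_with_nested(std::runtime_error("in action " + label));
  }
}

// Walks the chain of nested exceptions, outermost first, one line per level,
// indenting each level by two more spaces.
std::string describe_nested(const std::exception& e, int depth = 0) {
  std::string out = std::string(depth * 2, ' ') + e.what() + "\n";
  try {
    std::rethrow_if_nested(e);  // does nothing if e has no nested cause
  } catch (const std::exception& inner) {
    out += describe_nested(inner, depth + 1);
  } catch (...) {
    out += std::string((depth + 1) * 2, ' ') + "(non-standard exception)\n";
  }
  return out;
}
```

Catching the exception thrown by calculate_action_hypothetical("rmsd1") and passing it to describe_nested() yields "in action rmsd1" followed by the indented original message.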

The latter (appending the Action name) is not difficult. In file src/core/PlumedMain.cpp, replace these two lines:

      if(p->checkNumericalDerivatives()) p->calculateNumericalDerivatives();
      else p->calculate();

with

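try {
  if(p->checkNumericalDerivatives()) p->calculateNumericalDerivatives();
  else p->calculate();
} catch(PLMD::Exception & e) {
  // append the label of the Action that failed to the message
  e<<"Error happened in action "<<p->getLabel()<<"\n";
  // rethrow the same (now annotated) exception object
  throw;
}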
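A self-contained sketch of why this edit works: catching by non-const reference and rethrowing with a bare throw; propagates the same exception object, so text appended in the handler survives to the final report. AppendableError and FakeAction are hypothetical stand-ins for PLMD::Exception (whose operator<< the snippet above relies on) and for an Action; they are not PLUMED classes.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for PLMD::Exception: an exception whose message
// can be extended with operator<< after it has been thrown.
class AppendableError : public std::exception {
 public:
  explicit AppendableError(const std::string& msg) : msg_(msg) {}

  template <typename T>
  AppendableError& operator<<(const T& x) {
    std::ostringstream oss;
    oss << x;
    msg_ += oss.str();
    return *this;
  }

  const char* what() const noexcept override { return msg_.c_str(); }

 private:
  std::string msg_;
};

// Hypothetical minimal "Action": a label plus a calculate() that may throw.
struct FakeAction {
  std::string label;
  bool fails;
  void calculate() const {
    if (fails) throw AppendableError("Error diagonalizing matrix\n");
  }
};

// Mirrors the PlumedMain.cpp edit: catch by reference, append the label,
// then rethrow the *same* object with a bare `throw;`.
void run_actions(const std::vector<FakeAction>& actions) {
  for (const FakeAction& p : actions) {
    try {
      p.calculate();
    } catch (AppendableError& e) {
      e << "Error happened in action " << p.label << "\n";
      throw;  // rethrows the modified original, not a copy
    }
  }
}
```

Writing throw e; instead would throw a copy; with a polymorphic exception hierarchy that copy can be sliced to the static type, which is why the bare throw; form is preferred.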

Let me know if it works.

Thanks!

Giovanni


--
You received this message because you are subscribed to the Google Groups "PLUMED users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plumed-users...@googlegroups.com.
To post to this group, send email to plumed...@googlegroups.com.
Visit this group at https://groups.google.com/group/plumed-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/plumed-users/5a90c21a-5812-4ddb-bd78-0282e718287c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ikol...@gmail.com

Mar 26, 2019, 1:25:04 PM
to PLUMED users
Thanks, Giovanni. This is a good start; I'll try it right away. Just wondering: the exception shows the function call with its parameters. Could it also show where the call came from, or would that require nested exceptions as you suggest?

Thanks,

   Istvan

Giovanni Bussi

Mar 26, 2019, 1:28:51 PM
to plumed...@googlegroups.com
To follow the stack trace, you just need to run
export PLUMED_STACK_TRACE=yes
before running PLUMED (no code change needed; just make sure 'plumed has execinfo' reports 'execinfo on').

However, this way you will only know the type of Action you were in (e.g. RMSD). If you have multiple RMSDs and want to know the instance (i.e. the label), the only way is to append the label to the message. This is what the code I suggested does, assuming the error originated during a calculate() call, which is likely the case.
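What a stack-trace facility does under the hood can be sketched with glibc's backtrace() and backtrace_symbols() from <execinfo.h>, which is presumably what the 'execinfo' configuration option refers to. This is an illustrative sketch, not PLUMED's implementation; capture_stack_trace is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>
#include <string>
#include <vector>

// Captures the current call stack with glibc's <execinfo.h>.
// Returns one string per frame, innermost first. Function names from the
// main executable typically appear only if it was linked with -rdynamic;
// otherwise some frames show only a module name and an address.
std::vector<std::string> capture_stack_trace(int max_frames = 64) {
  std::vector<void*> buffer(max_frames);
  int n = backtrace(buffer.data(), max_frames);
  std::vector<std::string> frames;
  char** symbols = backtrace_symbols(buffer.data(), n);
  if (symbols == nullptr) return frames;  // allocation failed
  for (int i = 0; i < n; ++i) frames.emplace_back(symbols[i]);
  std::free(symbols);  // a single malloc'd block owned by the caller
  return frames;
}
```

Calling capture_stack_trace() inside a catch block and printing the frames shows the chain of function calls, but, as explained above, not which Action label was involved.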

Giovanni


Istvan Kolossvary

Mar 26, 2019, 1:36:32 PM
to plumed...@googlegroups.com
This is very useful; I only have unique actions in this input. I am not familiar with the 'plumed has execinfo' option, though. 'plumed --help' does not list it, or anything similar. Is this something I have to set at build time?

Thanks,

   Istvan

Giovanni Bussi

Mar 26, 2019, 1:37:50 PM
to plumed...@googlegroups.com
Sorry... it's "plumed config has execinfo"

Giovanni

Istvan Kolossvary

Mar 26, 2019, 1:39:53 PM
to plumed...@googlegroups.com
Oh, it is 'on' :)

Thanks, I'll rerun the job and see if I can get closer to the source of the error.

   Istvan

ikol...@gmail.com

Apr 4, 2019, 3:44:07 PM
to PLUMED users
Hi, Giovanni,

Thanks again for your suggestion about the trace; it did help identify where the problems started. For some reason, every coordinate of every atom listed in the PATH reference file is NaN in the simulation, which is why all 16 elements of the roto-translational matrix used in RMSD are NaN as well. Now the question is how and where these NaNs originated. It is a chicken-and-egg problem: did the coordinates somehow get corrupted and blow up the simulation as a result, or did the simulation go haywire and produce NaN coordinates?

In any case, my question is whether PLUMED has a mechanism to throw an exception and locate the line in the code where the first NaN appears. I have used this technique successfully in the past (https://www.dursi.ca/post/stopping-your-program-at-the-first-nan.html), but it is not clear how I would apply it in PLUMED. I guess I could add the 'feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);' call to PlumedMain.cpp, but then how would I hook up the OpenMM-PLUMED job to a debugger? Do you have any suggestions?
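A minimal sketch of the two pieces discussed here, assuming Linux/glibc on x86-64 (feenableexcept is a GNU extension, and trapping may be unsupported on other architectures): enable_fp_traps() turns on the traps named in the message, and first_non_finite() is a hypothetical helper that finds the first NaN or infinity in a flat coordinate array, e.g. to check positions at the start of a step. Where to call these inside PLUMED or OpenMM is left open. Note that the floating-point environment is per thread, so enabling traps on one thread does not automatically cover threads that are already running, nor GPU kernels.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // feenableexcept() is a GNU extension (g++ usually defines this)
#endif
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fenv.h>
#include <vector>

// Turns on hardware traps so that the first invalid operation (e.g. 0/0,
// which would produce a NaN), division by zero, or overflow raises SIGFPE
// at the offending instruction instead of propagating silently.
// Returns the resulting trap mask, or -1 if trapping is not supported.
int enable_fp_traps() {
  if (feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW) == -1) return -1;
  return fegetexcept();
}

// Hypothetical helper: index of the first non-finite entry in a flat
// coordinate array (x0, y0, z0, x1, ...), or -1 if all values are finite.
long first_non_finite(const std::vector<double>& coords) {
  for (std::size_t i = 0; i < coords.size(); ++i)
    if (!std::isfinite(coords[i])) return static_cast<long>(i);
  return -1;
}
```

With traps enabled, the first offending operation raises SIGFPE, whose default action terminates the process and writes a core dump if the core-size limit allows it.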

Many thanks,

   Istvan

Snow Summer

Apr 4, 2019, 11:50:35 PM
to PLUMED users
Hi, Istvan,

I think you can add the line to any function that is called at run time. You don't need to attach the program to a debugger directly; what you need is a core dump. To generate one, first run these two commands:
sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t
ulimit -c unlimited
The first line tells the kernel to save core dump files into /tmp, and the second removes the limit on core dump file size (for the current shell, so launch the job from that same shell).
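After that, run OpenMM+PLUMED with the floating-point exception line added and debug information compiled in (the -g option when compiling), and wait for it to hit the NaN, raise the floating-point exception and stop. After the program stops, you can run
gdb <your-executable> -c /tmp/<your-core-dump-file>
to debug it (passing the executable lets gdb load the debug symbols; if OpenMM is driven from Python, this is the python interpreter), then type bt at the gdb prompt to get the backtrace.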
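If it is more convenient to enable core dumps from inside the program than from the shell, the effect of ulimit -c can be approximated with the POSIX getrlimit()/setrlimit() calls. A sketch, assuming Linux: enable_core_dumps() is a hypothetical name, and as a non-root process it can only raise the soft limit up to the hard limit (the same restriction ulimit -c unlimited is subject to). The kernel.core_pattern setting still has to be configured with sysctl as described.

```cpp
#include <cassert>
#include <sys/resource.h>

// Programmatic counterpart of `ulimit -c`: raises the soft core-file size
// limit of the current process to its hard limit ("unlimited" only if the
// hard limit is unlimited). Returns true on success.
bool enable_core_dumps() {
  rlimit rl{};
  if (getrlimit(RLIMIT_CORE, &rl) != 0) return false;
  rl.rlim_cur = rl.rlim_max;  // soft limit may not exceed the hard limit
  return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```

Calling this early (before enabling floating-point traps) ensures that a later SIGFPE can actually leave a core file behind.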

Best regards,
Haochuan Chen

Istvan Kolossvary

Apr 5, 2019, 8:24:27 AM
to plumed...@googlegroups.com
Thank you, Haochuan. This sounds very doable, I'll give it a try.

Best,

   Istvan
