problem converging when taking very small time steps


Jesse Carter

unread,
Jun 17, 2015, 10:28:01 AM6/17/15
to moose...@googlegroups.com
I have a transient Diffusion problem with constant time stepping (ConstantDT) that I have reduced to one variable in 1D for debugging purposes, so it's a linear problem (right?). When I attempt to take a very small time step (dt ≈ 1e-10), the linear and non-linear residuals start to come down, but when computing the 2nd or 3rd non-linear residual, -snes_converged_reason says the solve diverged during the (default) line search. Using -snes_linesearch_monitor, it appears to repeatedly shrink lambda, failing each time until it reaches the minimum lambda.

Upon changing the line search method (or turning it off), I either get the same result or end up in a situation where the linear problem converges but the non-linear residual, after the 2nd or 3rd iteration, stays the same and never comes down on successive iterations. Eventually the (default) max non-linear iteration count gets hit, which triggers a cut in dt, but since the problem already struggles with small time steps, smaller time steps don't help, and eventually dtmin is hit and the solver stops. The initial residual is not so small that I should be running into machine-precision problems.

For what it's worth, increasing my diffusion coefficient by, say, an order of magnitude allows me to take an order of magnitude smaller time step, so something seems to be tied to how much the solution wants to "move" in that time frame. For my IC I'm using a simple Gaussian centered on my domain, which has periodic BCs. I've probably left something out, so please let me know if more information is needed.

Any ideas how to proceed? Is there some fundamental thing about taking small time steps that I'm not aware of?

Thanks for your help.

    - Jesse

Dykhuis, Andrew F

unread,
Jun 17, 2015, 10:39:59 AM6/17/15
to moose...@googlegroups.com

Jesse,

 

There is a relationship among the time step size, the diffusion coefficient, and how far things will move (a length scale). Picture it this way: if you made the diffusion coefficient extremely large, you would get a lot of diffusion, so a small time step would capture the same amount of diffusion as a much smaller diffusion coefficient with a very large time step. An estimate of the characteristic diffusion length is l = sqrt(4*D*t), which gives you an idea of how these quantities are related.

To help with your problem, I would recommend using adaptive time stepping, or continuing to find the time steps and simulation parameters you need to get things running (what are you modeling? What are typical diffusion coefficients in that space? I would change dt and mesh size to fit the diffusion coefficient, since that's what is based on reality). I've run some 1D diffusion simulations, and constant time steps can sometimes cause problems, especially if resolving the initial profile is difficult. There's also a mesh-size component: you might make your elements smaller, but keep an eye on how small they actually get to make sure things are still physical.
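To make that relationship concrete, here is a minimal Python sketch of the l = sqrt(4*D*dt) estimate compared against a mesh spacing (the numbers are illustrative, not taken from Jesse's input file):

```python
import math

def diffusion_length(D, dt):
    """Characteristic distance the solution spreads in one step, l = sqrt(4*D*dt)."""
    return math.sqrt(4.0 * D * dt)

# Illustrative values only (not from the original input file).
D = 1.0        # diffusion coefficient
dx = 1.0e-2    # element size
for dt in (1.0e-8, 1.0e-10, 1.0e-12):
    l = diffusion_length(D, dt)
    print(f"dt = {dt:.0e}: l = {l:.2e}, l/dx = {l/dx:.2e}")
```

When l is many orders of magnitude smaller than the element size, a single step barely changes the discrete solution, which is the regime being described in this thread.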

 

I hope that helps, and I’m sure some other users can chime in on the nuts and bolts of what you mentioned in more detail, if necessary.

 

Andrew





Jesse Carter

unread,
Jun 17, 2015, 12:47:29 PM6/17/15
to moose...@googlegroups.com, dykh...@westinghouse.com
I agree the concept of characteristic length comes into play here. If I use a diffusion coefficient D=1, I can't go much lower than dt=3.4e-11. If I up D to 10, I can't go much lower than 3.4e-12. In fact, the product D*dt is the same in both situations, and it stayed the same when I tried D=0.1, 0.5, 1, 5, and 10. Increasing the mesh density by a factor of 10 changed the critical D*dt by less than a percent.

However, I feel I might not understand something in the implementation. For instance, for D=1, dt=3.41e-11 converges very quickly (as you would expect, since not much has changed), but at dt=3.40e-11 the line search diverges or the non-linear residual stops decreasing after a few iterations. I would have expected some middle ground where the solver has a hard time at some intermediate time step. But the situation is that it either converges in about 2 iterations or it doesn't converge at all when the time step changes from 3.41e-11 to 3.40e-11.

A co-worker and I were able to fairly easily implement the model in MATLAB, also using implicit Euler time stepping. We did it the "traditional" way, by computing the full Jacobian and inverting it to find the next update to the solution. Since it's a linear problem with such a small time step, the algorithm reduces the residual to something on the order of 1e-12 in just one step. The problem doesn't change much, sure, but it does converge.
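For reference, here is a minimal Python/NumPy sketch of that kind of check: 1D diffusion, periodic BCs, implicit Euler, one direct linear solve per step. This is an assumed reconstruction of the MATLAB experiment, not the actual script:

```python
import numpy as np

# Assumed reconstruction: 1D diffusion, periodic BCs, implicit Euler, direct solve.
n, length, D, dt = 100, 1.0, 1.0, 3.4e-11
dx = length / n
x = np.linspace(0.0, length, n, endpoint=False)
u = np.exp(-((x - 0.5 * length) ** 2) / 0.01)   # Gaussian initial condition

# Periodic second-difference (Laplacian) matrix.
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1))
lap[0, -1] = lap[-1, 0] = 1.0
lap /= dx ** 2

# Implicit Euler: (I - dt*D*lap) u_new = u_old, solved directly.
A = np.eye(n) - dt * D * lap
u_new = np.linalg.solve(A, u)

residual = A @ u_new - u
print("||residual|| =", np.linalg.norm(residual))   # tiny: the direct solve nails it in one step
```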

In MOOSE, it seems to be doing the linear problem fine, but whatever update it gets from that step doesn't seem to be advancing the non-linear (though it's linear here) solution much or at all after the first couple of iterations.

Andrea Jokisaari

unread,
Jun 17, 2015, 2:24:28 PM6/17/15
to moose...@googlegroups.com
I've been following this and I admit I don't quite understand why you require such a small dt - are your residuals huge/is the driving force for diffusion huge?  I think people might be able to give you more feedback if you provide more specific details of the physics problem you are trying to model. 

It's kind of asinine, but I'm wondering if you got your implementation correct.  Have you tried solving the problem using the finite difference preconditioner (using a very small problem)?  If your problem doesn't solve well, then you likely did something wrong in your implementation.  More info found here: http://mooseframework.org/wiki/MooseSystems/Preconditioners/

hth,
Andrea

Daniel Schwen

unread,
Jun 17, 2015, 3:00:23 PM6/17/15
to moose...@googlegroups.com

Jesse Carter

unread,
Jun 17, 2015, 3:03:04 PM6/17/15
to moose...@googlegroups.com
I had reduced my problem to the bare minimum to try to pinpoint where things were going wrong, and at that point it wasn't really physical any more. In this bare-bones implementation, the only kernels are the Diffusion and TimeDerivative kernels that are already in MOOSE (so I don't screw anything up).

What I'm really doing is modeling diffusion in metals. If you take a diffusion coefficient that is typical for my problem, something like 1e-9 [cm^2/s], then due to the critical D*dt I mentioned above, I can't take a time step smaller than about 0.03 seconds. I suppose I could reformulate the problem with different units (because really not much is going to happen in that timespan), but it just seemed odd to me that there was a "cliff" where, just a little past some critical value, the solver goes from quick convergence to none.
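A quick back-of-the-envelope check of that number, using the empirically observed critical product from the D = 0.1 to 10 experiments above (an observation from this thread, not a documented limit):

```python
# Observed above: convergence fails once D * dt drops below roughly 3.4e-11
# (an empirical observation from this thread, not a documented limit).
critical_D_dt = 3.4e-11

D_metal = 1.0e-9          # typical diffusion coefficient for the real problem, cm^2/s
dt_min = critical_D_dt / D_metal
print(f"smallest usable dt at D = {D_metal:g} cm^2/s: ~{dt_min:.2g} s")  # ~0.034 s
```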

I did try finite difference preconditioner on this problem and the default options give me the same results.

Jesse Carter

unread,
Jun 17, 2015, 3:08:38 PM6/17/15
to moose...@googlegroups.com
# These kernels have guaranteed correct Jacobians
whitelisted_kernels = ['Diffusion', 'TimeDerivative']


I reduced my problem to the bare minimum so I wouldn't have to debug!

Indeed the output was "no errors detected".

Andrea Jokisaari

unread,
Jun 17, 2015, 3:10:01 PM6/17/15
to moose...@googlegroups.com
Ooh those two pages should be linked together!

Andrea Jokisaari

unread,
Jun 17, 2015, 3:17:46 PM6/17/15
to moose...@googlegroups.com
I'd recommend using different units.  This doesn't answer your question, but I find the convergence behavior is much better when you're not dealing with massive ranges in order of magnitude (I model diffusion/precipitation processes). 

Andrea Jokisaari

unread,
Jun 17, 2015, 3:19:35 PM6/17/15
to moose...@googlegroups.com
I've linked the FDP section in the preconditioners page to the Jacobian debugging page.  Wikis are great !

Andrew....@csiro.au

unread,
Jun 17, 2015, 4:56:45 PM6/17/15
to moose...@googlegroups.com
(1) How about precision loss?  Can you roughly estimate your residual terms to check that, with such tiny timesteps, you're not getting precision loss and hence no decrease in the residual?  E.g.:

(u - u_old)/dt - diff_coeff*del_squared(u) = 0

Potentially the first term will be enormous with small dt.  Unless your diff_coeff is suitably large, the second term might be effectively zero when added to the first term (if it is < 1e-15 times the first term).  MATLAB might not see this precision loss, as it might carry around a zillion digits of precision.  Before you do any detailed analysis, you can just add a few Moose::out statements to your computeQpResidual.
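To illustrate the kind of precision loss being described here, a generic double-precision demonstration (the magnitudes are made up, not the actual residual values from this problem):

```python
import sys

# A double carries ~16 significant digits (machine epsilon ~2.2e-16), so a term
# much smaller than about 1e-16 of its partner disappears when the two are added.
print(sys.float_info.epsilon)     # ~2.22e-16

time_term = 1.0e6                 # (u - u_old)/dt blows up as dt -> 0
diff_term = 1.0e-12               # diffusion contribution, ~1e-18 of the first term

total = time_term + diff_term
print(total == time_term)         # True: the diffusion contribution is rounded away entirely
print(total - time_term)          # 0.0, not 1e-12
```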

(2) Are you running using NEWTON?  In that case a linear problem with the correct Jacobian should typically converge in one step.

a


Tonks, Michael R

unread,
Jun 17, 2015, 8:33:32 PM6/17/15
to moose...@googlegroups.com
I would also recommend changing your units of time to something smaller, like nanoseconds. I think the problem is just that you are dealing with numbers that are too small. Numerical algorithms generally don't work as well with very small numbers, and I always try to make the problem easy for the algorithms to solve.





--
*************************
Michael R. Tonks
Fuel Modeling and Simulation Department
Idaho National Laboratory
208-526-6319
michae...@inl.gov

Derek Gaston

unread,
Jun 18, 2015, 9:07:29 AM6/18/15
to moose...@googlegroups.com
Mike and Andrew have the right answer: precision loss.

A Krylov solver is more sensitive to precision loss because it depends on orthogonalizing multiple vectors against each other to come up with a new direction... precision loss will keep you from getting a good direction.  Jacobian-Free-Newton-Krylov (JFNK) is even more sensitive because it's using finite differences to get the action of the Jacobian on that same vector... and precision loss can mean that you can't do that precisely either.
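For context, the matrix-free Jacobian action that JFNK relies on is roughly the following finite difference. This is a schematic Python sketch, not MOOSE's or PETSc's actual implementation, which chooses the perturbation far more carefully:

```python
import numpy as np

def jfnk_jacobian_action(residual, u, v, eps=1.0e-7):
    """Approximate J(u) @ v by finite-differencing the residual, as JFNK does
    conceptually. Schematic only: PETSc selects the perturbation adaptively."""
    return (residual(u + eps * v) - residual(u)) / eps

# Toy residual just to make the sketch runnable.
def residual(u):
    return u ** 2 - 2.0

u = np.array([1.5])
v = np.array([1.0])
print(jfnk_jacobian_action(residual, u, v))   # ~[3.0], i.e. dR/du = 2*u at u = 1.5
```

If the residual evaluations have already lost precision (one term swamping the other), the numerator of that difference is mostly round-off, and the Krylov directions built from it become unreliable, which is the extra sensitivity described above.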

Reformulating so your timestep is around ~1 second will definitely help.

If you want to try to match up with MATLAB, then make sure your Jacobian entries are all perfect, use SMP with "full = true", and put this stuff in your Executioner block:

[Executioner]
  ...
  solve_type = Newton  # Use "regular" Newton's method (your Jacobian MUST be correct)
  petsc_options_iname = '-pc_type'  # LU is essentially a "direct solve"
  petsc_options_value = 'lu'
  line_search = none # Don't allow line search to cut back the steps
[../]

You should be able to get the same behavior as your handcoded Matlab stuff.

Note: This set of options is for debugging... it doesn't make any sense to try this in a real problem with complicated Jacobians and big mesh.

Derek


Jesse Carter

unread,
Jun 18, 2015, 10:50:20 PM6/18/15
to moose...@googlegroups.com
It's kind of interesting exactly why I'm asking about such small time steps...

I was running my model with lots of time steps to check the sensitivity. I had a case where I was running with end_time=10,000 and dt=1.

Sometime around timestep 2000, some sort of round-off error creeps in and the output starts looking like this:
t=2000
t=2001
t=2002
t=2003
t=2003.999999999999
t=2004.999999999999
and so on...

So when it gets to the end, it is at t=9999.9999998 or something and to get to t=10,000, MOOSE tries to take a timestep like 1e-10 or so and that brings on the convergence errors. Has anybody else seen this behavior?

Heinonen, Olle G.

unread,
Jun 18, 2015, 11:01:58 PM6/18/15
to moose...@googlegroups.com
This is not an uncommon phenomenon - it depends on how the code converts an index (some kind of integer) to the timestep (a floating point number) and how the time increment is calculated (what kind of conversions). Usually I set an exit condition such as t >= 10000 (in this case) so as not to force the last timestep to be ridiculously small.
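As a generic illustration of that kind of drift, here is a sketch using dt = 0.1, which is not exactly representable in binary (repeatedly adding Jesse's dt = 1 is exact in floating point, so his drift must come from how the increment is computed internally):

```python
dt, end_time = 0.1, 10.0

t = 0.0
for _ in range(100):
    t += dt
# On IEEE-754 doubles the accumulated t does not land exactly on end_time; it misses
# by something on the order of 1e-14, so landing exactly on end_time would require
# either a tiny "catch-up" step or accepting the mismatch.
print(repr(t), end_time - t)

# The workaround suggested here: treat the run as finished once t is within a tolerance.
tol = 1e-8
print(t >= end_time - tol)   # True: no ridiculously small final step is needed
```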

-Olle

Olle Heinonen, Materials Scientist

Argonne National Laboratory, bldg 200, 9700 South Cass Ave. Lemont, IL 60439

hein...@anl.gov, Tel +1 630 252 4877



Andrew....@csiro.au

unread,
Jun 18, 2015, 11:10:57 PM6/18/15
to moose...@googlegroups.com
You are right, Olle; however, I think this particular example merits further investigation.  When a user specifies dt=1 exactly, they expect the timesteps to be t = (0, 1, 2, 3, …) exactly.  If a user had specified dt=1.23456789012345, they'd probably understand if the last digits got altered, or got ignored if t itself was too large, for example.

Jesse, MOOSE provides lots of TimeSteppers and so on which can control this sort of thing.  Have a look at the doco (e.g. moose-opt --dump) and have fun experimenting, or ask the list.

a


Derek Gaston

unread,
Jun 22, 2015, 10:45:01 AM6/22/15
to moose...@googlegroups.com
This is definitely strange behavior.  I have no idea why you would start to get weird increments like that.

What TimeStepper are you using?

Also: There is an option called "timestep_tolerance" that you can use to keep from taking tiny time steps.  By default it is set to 1e-14.  You can set it in your Executioner block to whatever you want (like 1e-8 or 1e-9) and what will happen is if you fall within that tolerance of the final timestep it will consider the simulation to be "done".

Derek

Jesse Carter

unread,
Jun 22, 2015, 10:50:21 AM6/22/15
to moose...@googlegroups.com
Using the default, which I assume is ConstantDT.

Also, I saw in the documentation that the "timestep_tolerance" parameter was for matching up the current time with sync times. I didn't explicitly set any sync times in my input file, so does MOOSE assume either end_time or num_steps*dt is a sync time?

Derek Gaston

unread,
Jun 22, 2015, 10:54:27 AM6/22/15
to moose...@googlegroups.com
Which documentation?  On the syntax page ( http://mooseframework.com/docs/syntax/moose/ ) it says: "the tolerance setting for final timestep size and sync times".  If it says something different somewhere else... then we need to fix that.

The final time is not treated as a sync time, per se.  You can actually see the logic right here: https://github.com/idaholab/moose/blob/devel/framework/src/executioners/Transient.C#L591

That's not the greatest line of code... but you can see what it's doing easily enough :-)

Derek


Jesse Carter

unread,
Jun 22, 2015, 1:20:04 PM6/22/15
to moose...@googlegroups.com
That's the documentation string I was referring to. The way I read it, I had thought timestep_tolerance was used for a comparison between end_time and the sync times. Now I see the tolerance applies to comparisons between the current time and each of those individually. Thanks for helping clarify that.

Jesse Carter

unread,
Jun 22, 2015, 1:36:41 PM6/22/15
to moose...@googlegroups.com
I'd like to test this, so is there a simple way to compute and print the individual contributions of each kernel to the overall residual? Would it be a matter of making a postprocessor for each kernel that inherits from ElementIntegralVariablePostprocessor and overrides computeQpResidual? Test functions and all? Or could you just get away with e.g. _u_dot[_qp] for TimeDerivative or similarly _grad_u[_qp] for Diffusion?

Andrew....@csiro.au

unread,
Jun 23, 2015, 1:04:00 AM6/23/15
to moose...@googlegroups.com

Hey Jesse,

 

Look at the “save_in” option for the Kernels.  It looks like it’ll do what you want.

 

a

 

 

Ph: +61 7 3327 4497.  Fax: +61 7 3327 4666
Queensland Centre for Advanced Technologies
PO Box 883, Kenmore, Qld, 4069 
 

Jesse Carter

unread,
Jun 23, 2015, 10:59:38 AM6/23/15
to moose...@googlegroups.com
Of course there's already a command for that! Thanks Andrew.

Now I need to figure out how to use it. Preliminary attempts have failed, so I thought I'd ask. In theory, the non-linear residual that is written to the screen has components from each kernel, right? So, again in theory, if one applies the right Postprocessor to the (Nodal?) AuxVariables that hold the "save_in" values, one would get values that, when combined, give the non-linear residual. Assuming this thinking is correct, what would those Postprocessor operations be? A simple sum across all nodes (NodalSum)? Or is the residual computed in an absolute-value sense, so that a NodalL2Norm is needed? After messing around with this a bit, the answer seems to be "something else".

Again, I'm just doing a simple transient 1D diffusion problem, so there is only a TimeDerivative kernel and a Diffusion kernel. I'd like to see whether, under certain conditions (small timestep), the contributions of the two kernels to the residual differ by a factor in the neighborhood of 1e15, as Andrew suggested above (i.e., why does dt=3.41e-11 converge but dt=3.40e-11 not?). Also, I'm using periodic BCs, which I don't think add anything to the residual the way an IntegratedBC would.

But then again, my thinking about all this could be way off so feel free to correct me.

Thanks for sticking with me!

   - Jesse

Derek Gaston

unread,
Jun 23, 2015, 11:42:13 AM6/23/15
to moose...@googlegroups.com
I think that just NodalSum over the aux field produced by using "save_in" would get it done for Linear Lagrange elements.

Are you saying that's not working?

Derek

Jesse Carter

unread,
Jun 23, 2015, 2:16:30 PM6/23/15
to moose...@googlegroups.com
The AuxVariables are Linear Lagrange and I'm using a NodalSum postprocessor. In one particular case (small timestep), the TimeDerivative kernel NodalSum is -6.282e-7 and the Diffusion kernel NodalSum is -7.861e-16. The last nonlinear residual is 2.988e-7, which is close to the magnitude of the sum of the postprocessors (though the wrong sign, and obviously the Diffusion kernel doesn't have much of a contribution), but it doesn't line up exactly. What else contributes?

Derek Gaston

unread,
Jun 23, 2015, 2:24:49 PM6/23/15
to moose...@googlegroups.com
Oh - I see what you're saying.  You're trying to match up with what comes out of the nonlinear residual?

What you want to use is NodalL2Norm.  Applied to just one of the "save_in" fields it will give you the "residual norm" for just that term.

To match the output for the nonlinear residual you would need to apply it to a "save_in" field that is the sum of the two Kernels' residuals.  The save_in parameter can take multiple names and save off pieces to multiple auxiliary variables, like so:

save_in = 'kernel1 total'

That would save that Kernel's residual to BOTH auxiliary variables named "kernel1" and "total"... so if you tell both of your Kernels to save to individual fields AND both to save to something called "total", then you can run NodalL2Norm on the "total" auxiliary variable and it should produce the same numbers as what's coming out of the nonlinear residual.

The L2 Norm BTW is the square root of the sum of the squares: http://mathworld.wolfram.com/L2-Norm.html
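A small numerical sketch of why a signed NodalSum (or summing the individual norms) won't reproduce the printed residual, while the L2 norm of the combined per-node residual does. The nodal values are made up, purely for illustration:

```python
import numpy as np

# Made-up per-node residual contributions from two kernels (illustrative only).
r_time = np.array([ 1.0e-7, -2.0e-7,  3.0e-7])
r_diff = np.array([-1.1e-7,  2.2e-7, -3.2e-7])

r_total = r_time + r_diff                 # what a shared "total" save_in field would hold

print(np.linalg.norm(r_total))            # matches the printed nonlinear residual norm
print(r_total.sum())                      # NodalSum: signed, cancels, generally different
print(np.linalg.norm(r_time) + np.linalg.norm(r_diff))  # sum of norms != norm of the sum
```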

Derek

Jesse Carter

unread,
Jun 24, 2015, 10:58:10 AM6/24/15
to moose...@googlegroups.com
Ah that makes sense. I saw something to that effect in one of the MOOSE tests (tests/misc/save_in).

So I implemented this, and it is interesting that for my problem (at least for the one case I tested), the L2 norms of the contributions to the residual from the TimeDerivative and Diffusion kernels are essentially the same (and close to the 0th nonlinear residual). In fact, when viewing the "save_in" AuxVariables directly as a function of position, they appear to be equal but opposite. Is that intentional or just a coincidence?

Derek Gaston

unread,
Jun 24, 2015, 3:23:25 PM6/24/15
to moose...@googlegroups.com
If you view only the final converged residuals they will be exactly equal and opposite. If you think about it, they have to be (there are only two terms and they have to balance each other to make the residual zero).
You're getting closer to the "true" way of seeing the residual... as a balance statement. Without the time derivative the diffusion term would be free... and the species would diffuse until the second derivative is zero (i.e. flat)... Instantly. Without the diffusion term the time derivative wouldn't allow the solution to evolve at all. Together they balance each other. The time derivative "holds back" the diffusion. A larger time derivative would therefore hold it back more.... a larger coefficient on the diffusion term would cause more diffusion... etc.
It's also possible to see all of the intermediate residuals (so you can watch them evolve over time)... but I can't remember how... Maybe Andrew Slaughter can fill us in....
Derek

Cody Permann

unread,
Jun 24, 2015, 3:28:18 PM6/24/15
to moose...@googlegroups.com
Since Andrew is in the middle of MOOSE training I'll jump in with the answer (I hope). You can simply set up an Outputter to create files at each linear or nonlinear step by setting the "execute_on" parameter accordingly (i.e. linear, nonlinear).


Slaughter, Andrew E

unread,
Jun 24, 2015, 4:19:08 PM6/24/15
to moose...@googlegroups.com

Jesse Carter

unread,
Jun 25, 2015, 12:56:42 PM6/25/15
to moose...@googlegroups.com
Thanks Derek for that explanation. That makes sense. Earlier when I was having convergence issues with very small timesteps, it was suggested that perhaps, since the TimeDerivative term would have such a small number in the denominator, that term in the residual would "overpower" contributions from the diffusion term, but now I can see that the contributions are (almost) equal.

So I still have the issue where, after the first iteration or so, the non-linear residual does not come down any further when using very small time steps. The linear iterations converge to a very small residual (|R| ≈ 1e-22). My interpretation is that whatever update (du) the linear solve passes to the outer loop does not change the solution much. Are the linear iterations essentially giving back an update that is so small relative to the solution that u+du = u? Can we see what that du looks like, or is that all happening in PETSc? Furthermore, I am doing Newton solves with no line search. If I turn on the line search, I either get the same behavior, or the line search quits after cutting lambda down to the minimum tolerance and just comes back as "diverged".
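A quick illustration of the u + du = u scenario in double precision (generic numbers, not values from the actual solve):

```python
import numpy as np

u = 1.0
eps = np.finfo(float).eps          # ~2.2e-16: smallest relative change a double can register
print(u + 0.6 * eps * u == u)      # False: an update just above eps/2 of u still registers
print(u + 0.4 * eps * u == u)      # True: anything below ~eps/2 of u is rounded away, so u + du == u
```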

For now I can get around these issues by changing units or setting a nonlinear absolute tolerance and maybe adding some damping, but I'd still like to know (for educational purposes) why I go from converging in one iteration to not converging at all when I change my timestep by the slightest amount. Again, I'm doing Newton solves on one CPU, so I don't understand how precision loss would be a factor.

Thanks for sticking with me through this.

   - Jesse

Alexander Lindsay

unread,
Jun 26, 2015, 2:49:24 PM6/26/15
to moose...@googlegroups.com
Jesse, were you able to get postprocessor output at every non-linear iteration? I'm trying to do something similar, and my postprocessor values derived from my save_in kernels only reflect the 0th non-linear residual. I'm not getting any other postprocessor values for other non-linear iterations. I'm wondering whether I need to output the postprocessors to CSV
...

Alexander Lindsay

unread,
Jun 26, 2015, 3:38:34 PM6/26/15
to moose...@googlegroups.com
If I set execute_on and output_on to nonlinear, here's a sample of what I get in the CSV output. t0 = 0, t1 = 2e-11, t2 = 6e-11, t3 = 1.4e-10... The "intermediate times" (between t0 and t1: 2e-14, 4e-14, ...) correspond to the non-linear iterations. The residuals clearly aren't changing :-(

Derek Gaston

unread,
Jun 30, 2015, 8:47:31 AM6/30/15
to moose...@googlegroups.com
What do you have your Postprocessor's "execute_on" set to?  You need to make sure it at least includes "nonlinear".

Derek

Dmitry Karpeyev

unread,
Jun 30, 2015, 9:02:42 AM6/30/15
to moose...@googlegroups.com
Sorry for joining this thread late.
If nonlinear convergence is still an issue, I'm with Daniel Schwen: the first thing to do is make sure the Jacobian is correct, since it appears that the search direction the linear solve generates doesn't result in the reduction of the residual.

I would recommend using -snes_check_jacobian and, possibly, -snes_check_jacobian_display.
Use this on the smallest problem size that exhibits the problem.

Dmitry.

Derek Gaston

unread,
Jun 30, 2015, 9:14:28 AM6/30/15
to moose...@googlegroups.com
On Thu, Jun 25, 2015 at 12:56 PM Jesse Carter <jesse....@gmail.com> wrote:
Thanks Derek for that explanation. That makes sense. Earlier when I was having convergence issues with very small timesteps, it was suggested that perhaps, since the TimeDerivative term would have such a small number in the denominator, that term in the residual would "overpower" contributions from the diffusion term, but now I can see that the contributions are (almost) equal.

I just noticed this statement.  Realize that one term can "overpower" the other one.  The contributions are only "equal" after the solve has converged.  At the very beginning of a solve it could be the case that one is many orders of magnitude greater than the other... which can cause problems.

Alexander Lindsay

unread,
Jun 30, 2015, 10:58:12 AM6/30/15
to moose...@googlegroups.com
It is/was set to "nonlinear"