After spending some time searching and testing with my MODFLOW-2005 simulation, I think I can shed some light on this question.
Here is my understanding and my results:
MODFLOW has to solve a system of equations whose size, large or small, depends on the spatial discretization. In my case the grid is about 2000*1500*5, and this could be a problem.
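To put that grid size in numbers, here is a small back-of-the-envelope sketch. The function name and the 8-byte value size are my own assumptions for illustration; MODFLOW-2005 mixes single- and double-precision arrays, so treat the memory figure as a rough order of magnitude only.

```python
def grid_size(n1, n2, nlay, bytes_per_value=8):
    """Return (number of cells, MB needed for one full-grid array).

    bytes_per_value=8 assumes double precision; this is an assumption,
    not something taken from the MODFLOW source.
    """
    cells = n1 * n2 * nlay
    mb_per_array = cells * bytes_per_value / 1e6
    return cells, mb_per_array


cells, mb = grid_size(2000, 1500, 5)
print(f"{cells:,} cells, about {mb:.0f} MB per full-grid array")
```

With one head unknown per cell, that is on the order of 15 million equations per solve, and every extra full-grid array adds to the memory footprint.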
Different solvers use different mathematical methods to solve these equations. I mainly use PCG and GMG, so my comments may not apply to other solvers.
I haven't completely examined the source code, so I assume there are two scenarios in which the failure could be triggered:
One is that the overall discrepancy exceeds about 1%.
The other is that, even though the overall discrepancy is around 1%, the outer or inner iterations have not actually met the convergence criteria.
To avoid these two types of failure, the solver must converge within the allowed number of outer and inner iterations, and the overall discrepancy needs to be less than 1%.
For the rewetting part, it seems that to avoid oscillation in the dry-wet conversion, the number of inner iterations should not be too high. The wetting threshold could also be increased a little.
The number of outer iterations may approach 100 or even higher.
In my case, I feel the iteration settings are coupled with RCLOSE/HCLOSE and also with the grid size (the matrix A for the PCG solver).
The smaller you set RCLOSE/HCLOSE, the more iterations you may need to reach convergence, and the more computational power and time you might need. (My simulation on a Windows HPC with 100 GB of physical memory ended with insufficient virtual memory, and I had to use a Linux cluster to run the model.) Of course, you want a small tolerance so that the discrepancy is small enough for your application.
Therefore, you have to do some testing to find settings where the tolerances aren't too small and the iteration limits aren't too large, but the solver can still converge.
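To make that kind of testing less tedious, something like the sketch below could generate the two data lines of a PCG input file for a few trial settings, going from loose to tight. The helper name and all numeric values are only illustrative assumptions, not recommendations; the variable order follows my reading of the MODFLOW-2005 PCG input instructions (MXITER ITER1 NPCOND, then HCLOSE RCLOSE RELAX NBPOL IPRPCG MUTPCG DAMP), so check it against the documentation for your version. Remember that RCLOSE is in your model's volume/time units.

```python
def pcg_lines(mxiter, iter1, hclose, rclose, relax=1.0, damp=1.0,
              npcond=1, nbpol=0, iprpcg=1, mutpcg=0):
    """Format the two free-format data lines of a PCG input file.

    Hypothetical helper for illustration; it only builds text and
    does not run or validate anything.
    """
    line1 = f"{mxiter} {iter1} {npcond}"
    line2 = (f"{hclose:g} {rclose:g} {relax:g} "
             f"{nbpol} {iprpcg} {mutpcg} {damp:g}")
    return [line1, line2]


# Illustrative trial settings only, ordered from loosest to tightest.
trials = [
    dict(mxiter=100, iter1=30, hclose=1e-2, rclose=1e-1),
    dict(mxiter=100, iter1=30, hclose=1e-3, rclose=1e-2),
    dict(mxiter=200, iter1=20, hclose=1e-4, rclose=1e-3, damp=0.8),
]

for settings in trials:
    print("\n".join(pcg_lines(**settings)))
    print()
```

Starting loose and tightening one step at a time makes it easier to see at which tolerance the solver stops converging within the iteration limits.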
Other parameters, including DAMP and RELAX, may also help speed up convergence. I wish the HPC were more convenient and powerful so that I could do more tests. (Currently, I need one hour to run one stress period.)
It is also good practice to check the in-out results in the LIST file, which point you to the part that might be wrong. (High hydraulic conductivity and related parameters can also lead to problems if not set up appropriately.)
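Checking the LIST file by hand gets tiring for long runs, so here is a minimal sketch that pulls out the budget discrepancies. It assumes each volumetric budget prints one line with two "PERCENT DISCREPANCY =" values, cumulative first and current time step second, which is how my LIST files look; the function names and the sample text are made up for illustration.

```python
import re

# Matches the number after "PERCENT DISCREPANCY =".
_DISCREPANCY = re.compile(
    r"PERCENT DISCREPANCY\s*=\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)


def percent_discrepancies(list_text):
    """Return (cumulative, time_step) pairs found in LIST file text."""
    pairs = []
    for line in list_text.splitlines():
        values = [float(v) for v in _DISCREPANCY.findall(line)]
        if len(values) == 2:  # assumed layout: both values on one line
            pairs.append((values[0], values[1]))
    return pairs


def flag_bad_steps(pairs, limit=1.0):
    """Return indices of budgets whose time-step discrepancy exceeds limit (%)."""
    return [i for i, (_, step) in enumerate(pairs) if abs(step) > limit]


# Made-up sample text in the assumed layout.
sample = """
  PERCENT DISCREPANCY =           0.02     PERCENT DISCREPANCY =           0.02
  PERCENT DISCREPANCY =          -1.57     PERCENT DISCREPANCY =          -3.10
"""
pairs = percent_discrepancies(sample)
print(pairs)
print(flag_bad_steps(pairs))
```

For a real run you would read the LIST file into a string first and then look at the budget terms of the flagged time steps to see which package is out of balance.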
Thanks also to @An Ho Antonio Taylor and @Fabian Nick for their suggestions.