Hi All,
There are a lot of ways in which the performance and stability of the 1D solver could be improved. The current implementation basically follows the solution algorithm of the original PREMIX and OPPDIF codes, which are about 15 years old at this point. These codes work well for mechanisms with relatively small numbers of species, but don't scale well to the much larger mechanisms that many people are now interested in. Factorizing the banded Jacobian takes O(N*K^3) time, where N is the number of grid points and K is the number of species. As a point of comparison, I did a bit of profiling: for the mechanism that Lee posted (316 species), over 50% of the run time is spent on this factorization, compared to only 19% when solving with GRI-Mech 3.0 (53 species). Also, as noted here, the memory requirement scales as O(N*K^2), which can be a problem as well.
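To make that scaling concrete, here's a rough back-of-the-envelope estimate of the banded Jacobian storage in double precision. The variable count per point and the bandwidth below are assumptions picked for illustration, not numbers pulled from the Cantera source:

    # Rough estimate of banded-Jacobian storage for the 1D solver.
    # Assumes about K + 4 solution components per grid point and a
    # half-bandwidth of roughly two grid points -- illustrative values only.
    def banded_jacobian_mb(n_points, n_species):
        n_vars = n_species + 4          # species plus a few flow variables (assumed)
        kl = ku = 2 * n_vars            # lower/upper half-bandwidth (assumed)
        rows = 2 * kl + ku + 1          # LAPACK banded storage needed for LU
        cols = n_points * n_vars
        return rows * cols * 8 / 1e6    # doubles -> megabytes

    print(banded_jacobian_mb(200, 53))    # GRI-Mech 3.0 on ~200 points
    print(banded_jacobian_mb(200, 316))   # the 316-species mechanism

With these (assumed) numbers, the 316-species case needs roughly 30 times the memory of the GRI-Mech case, which is just the O(K^2) scaling showing up.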
Using an optimized BLAS/LAPACK implementation will certainly help some, but that only reduces the constant factor, when what you'd really like is to reduce the exponent on the species term, which means getting rid of the dense/banded factorization altogether.
The sparse preconditioned Krylov methods like GMRES are great, but you still need to form a preconditioner, which effectively brings you back to the problem of building and factorizing the Jacobian. These methods will tolerate some approximations in the Jacobian that you can't get away with when using a direct solver, which is why it's possible to consider an inexact factorization that exploits the sparsity of the Jacobian. In my experience, these can work well, but they're not exactly easy to implement, and it can be difficult to know how accurate the Jacobian needs to be to get acceptable solver performance. The paper Nick linked to is a pretty good summary of this approach. One additional concern with trying to use a sparse solver in the 1D problem is that the transport-related terms in the Jacobian may generate a lot of additional fill-in that reduces the benefit of using the sparse solver.
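As a toy illustration of that idea (using SciPy's sparse solvers on a stand-in matrix, nothing Cantera-specific), an incomplete LU factorization of an approximate Jacobian can serve as the GMRES preconditioner, dropping small entries that an exact factorization couldn't:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import gmres, spilu, LinearOperator

    # Stand-in for a sparse Jacobian: a random sparsity pattern plus a strong
    # diagonal.  In the real problem this would be the approximate Jacobian
    # at one Newton iteration.
    n = 500
    J = sp.random(n, n, density=0.02, format='csc') + 10.0 * sp.identity(n, format='csc')
    b = np.ones(n)

    # Inexact (incomplete) LU factorization, used only as a preconditioner.
    ilu = spilu(J, drop_tol=1e-4)
    M = LinearOperator((n, n), matvec=ilu.solve)

    x, info = gmres(J, b, M=M)
    print('GMRES info =', info)   # 0 means converged

The quality of the preconditioner (here controlled by drop_tol) is exactly the "how accurate does the Jacobian need to be" question, and there isn't a universal answer.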
I don't think that the first-order nature of the transient solver should be a source of convergence trouble, given that the stability regions of the higher-order BDF formulas are smaller than that of the first-order (backward Euler) method. That said, it might be interesting to see whether using SUNDIALS to provide the higher-order methods would improve the overall rate of convergence. Of course, the convoluted relationship among Sim1D/OneDim/MultiNewton/etc. would make this a nontrivial undertaking.
Using higher-order spatial derivatives is an interesting idea. One problem is that this would increase the bandwidth of the Jacobian, assuming you're working with the existing linear solver: a fourth-order central difference, for example, couples each point to its two nearest neighbors on either side, roughly doubling the bandwidth and with it the factorization cost.
Of course, another option is to step away from the fully implicit solver entirely, which is what I decided to do with my time-dependent solver, Ember. By going to an operator-split method, you can implement different solvers for each term, and the individual problems are much simpler. If you stick with the direct solver for the reaction terms, you only need O(K^2) memory for the Jacobian (no dependence on N), since you're only solving one point at a time. And if you want to use a sparse solver, you are solving almost exactly the problem described in the McNenly et al. paper. I think that implementing higher-order spatial derivatives would be much easier in an operator-split code as well, since there's better separation of each of the terms in the original equations, rather than the monolithic structure you usually end up with in a fully implicit code. Additionally, operator splitting provides a very natural opportunity for implementing parallelism.
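To sketch what I mean (this is a minimal Strang-type splitting, not Ember's actual algorithm), each step alternates between a per-point chemistry integration and a transport solve over the whole grid:

    # Minimal Strang-splitting sketch -- illustrative only.
    # y has shape (n_points, n_vars); advance_chemistry integrates the stiff
    # reaction terms at a single point, and advance_transport handles the
    # convection/diffusion terms over the whole grid (e.g. a tridiagonal solve).
    def strang_step(y, dt, advance_chemistry, advance_transport):
        for j in range(y.shape[0]):
            y[j] = advance_chemistry(y[j], dt / 2)   # each point is independent
        y = advance_transport(y, dt)
        for j in range(y.shape[0]):
            y[j] = advance_chemistry(y[j], dt / 2)
        return y

Since each chemistry substep only sees one grid point, its Jacobian is only about K x K, which is where the O(K^2) memory claim comes from, and the per-point loops are trivially parallelizable.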
Regards,
Ray