Modifying shape function data in MatrixFree

David

unread,

Dec 29, 2020, 6:41:35 AM12/29/20

to deal.II User Group

Hi all,

I'm currently trying to implement a vectorized variant of the residual assembly as it is done in step-44/one of the corresponding code-gallery examples using FEEvaluation objects in combination with matrix-free.

I was not able to find an appropriate solution for the given code line and the subsequent application for the assembly process: the major problem is that the shape function gradients are defined with regard to the current configuration, whereas my matrix-free object holds a mapping, which refers to the initial configuration. The reference configuration mapping of my matrix-free object is desired as the final integration is actually performed in the reference frame.

A valid option is probably (I have not tried it) to use a matrix-free object with a different mapping (MappingQEulerian) which directly evaluates the shape gradients in the deformed configuration and use another referential matrix-free object for the integration part.

The main reason to avoid this approach is that it would require a reinitialization of the 'deformed' matrix-free object in each nonlinear step and the mapping needs to be recomputed in each reinizialization. On the other hand, the expensive reinitialization is not really required as the mapping between the referential and the current configuration is known due to the (inverse) deformation gradient.

Therefore, I'm looking for an opportunity to access the shape gradients and perform a push forward similarly to the approach in step-44 in order to evaluate the desired quantities.

Until now I was not able to find an obvious solution for this computations using the FEEvaluation class. Maybe anyone else has a better idea for the described problem.

Thanks in advance,

David

unread,

Dec 29, 2020, 9:59:54 AM12/29/20

to deal.II User Group

Maybe as an edit: what I currently do looks the following way:

```

// Get gradient in reference frame

const Tensor<2, dim, VectorizedArrayType> grad_u = phi.get_gradient(q_point);

// Compute deformation gradient

const Tensor<2, dim, VectorizedArrayType> F = Physics::Elasticity::Kinematics::F(grad_u);
const SymmetricTensor<2, dim, VectorizedArrayType> b = Physics::Elasticity::Kinematics::b(F);

const VectorizedArrayType det_F = determinant(F);

// Invert F

const Tensor<2, dim, VectorizedArrayType> F_inv = invert(F);
const SymmetricTensor<2, dim, VectorizedArrayType> b_bar = Physics::Elasticity::Kinematics::b( Physics::Elasticity::Kinematics::F_iso(F));

// Get tau from material description
SymmetricTensor<2, dim, VectorizedArrayType> tau; material->get_tau(tau, det_F, b_bar, b);

// Gradient itslef is included in the integrate call, apply push forward with F^{-1}

const Tensor<2, dim, VectorizedArrayType> symm_grad_Nx = symmetrize(F_inv);

// Compute the result

const Tensor<2, dim, VectorizedArrayType> result = symm_grad_Nx * Tensor<2, dim, VectorizedArrayType>(tau);

// Apply an additional push forward with F^{-T}

phi.submit_gradient(-result * transpose(F_inv), q_point); // end loop over quadrature points

// For each element

phi.integrate(false, true);

```

It works, but I don't get the quadratic convergence order of the Newton scheme anymore. Hence, I assume this is not sufficient.

Jean-Paul Pelteret

unread,

Dec 29, 2020, 2:06:17 PM12/29/20

to dea...@googlegroups.com

Hi David,

So as I'm sure that you know, if you want to assemble the linear system for quasi-static non-linear (hyper)elasticity with a referential DoFHandler (using no Eulerian mapping to transform the shape functions to the spatial setting) AND without doing a similar transformation by hand (as step-44 does) then you are pretty much limited to using one of two parameterisations. Either you have to choose the (two-point) Kirchoff stress (P) and deformation gradient (F) as the work conjugate pairs, or the (fully-referential) Piola-Kirchhoff stress (S) and the Green-Lagrange strain tensor (E = 0.5(C-I) with C being the right Cauchy-Green tensor).

Thinking about just the residual term: to the best of my knowledge (and from what I recall from the work that we did here), the variation dC = symm(F^{T}.dF) = symm(F^{T} d(Grad(u))) -- with u representing the displacement -- cannot be framed a way that the matrix-free framework supports: the test function cannot be pre-multiplied by a field variable. One would have to somehow rephrase the whole problem such that you end up with just the test function d(Grad(u)) as a pre-multiplication to your stress variable, and in the end you come up with the exactly the other parameterisation because P = F.S and dF = d(I + Grad(u)) == d(Grad(u)). So you may as well start off by parameterising the problem in terms of F and P, and naturally you have to adjust the linearisation as well. The linearisation of the two-point tensor P contains both the material and geometric tangent (you can identify each by linearising the decomposition P = F.S to get these two quantities). I get the sense from what I see in the code that you've shared you're trying to accomplish exactly this (although I'm not sure its quite right, since tau = P.F^{T} --> P = tau.F^{-T}, and you seem to have some extra manipulation involving what you call symm_grad_Nx -- this looks a bit suspect to me).

I have some code to share that will do the transformation from the stress and tangent quantities already used in step-44 to those that you need for this F-P parameterisation. You can put these in the PointHistory class

  template <int dim>
  class PointHistory
  {
...

    // Fully referential configuration
    SymmetricTensor<2, dim>
    get_S() const
    {
      return Physics::Transformations::Contravariant::pull_back(get_tau(), F);
    }
    SymmetricTensor<4, dim>
    get_H() const
    {
      return Physics::Transformations::Contravariant::pull_back(get_Jc(), F);
    }
    // Mixed / two-point configuration
    Tensor<2, dim>
    get_P() const
    {
      return get_F() * Tensor<2, dim>(get_S());
    }
    Tensor<4, dim>
    get_HH() const
    {
      // Reference: Simo1984a eq 4.4
      //   Simo, J. C. and Marsden, J.
      //   On the rotated stress tensor and the material version of the {Doyle-Ericksen} formula
      //   DOI: 10.1007/BF00281556
      const SymmetricTensor<2, dim> S = get_S();
      const SymmetricTensor<4, dim> H = get_H();
      // Push forward index 2 of material stress contribution
      // This index operation relies on the symmetry of the
      // last two indices, so therefore (in case it makes a
      // difference) we transform it first.
      Tensor<4, dim> tmp1;
      for (unsigned int I = 0; I < dim; ++I)
        for (unsigned int J = 0; J < dim; ++J)
          for (unsigned int k = 0; k < dim; ++k)
            for (unsigned int L = 0; L < dim; ++L)
              for (unsigned int K = 0; K < dim; ++K)
                tmp1[I][J][k][L] += get_F()[k][K] * H[I][J][K][L];
      Tensor<4, dim>       HH_mixed;
      const Tensor<2, dim> I_ns = Physics::Elasticity::StandardTensors<dim>::I;
      for (unsigned int i = 0; i < dim; ++i)
        for (unsigned int J = 0; J < dim; ++J)
          for (unsigned int k = 0; k < dim; ++k)
            for (unsigned int L = 0; L < dim; ++L)
              {
                // Push forward index 0 of material stress contribution
                for (unsigned int I = 0; I < dim; ++I)
                  HH_mixed[i][J][k][L] += get_F()[i][I] * tmp1[I][J][k][L];
                // Add geometric stress contribution
                HH_mixed[i][J][k][L] += I_ns[i][k] * S[J][L];
              }
      return HH_mixed;
    }
}

Naturally, the above could be simplified a bit for this fixed parameterisation.

With this, the (matrix-based) assembly would looks something like this for the RHS

// Excuse the messiness -- coding in an email client is not a good idea!for (unsigned int q_point = 0; q_point < this->n_q_points; ++q_point)
{
  const Tensor<2,dim> P = lqph[q_point]->get_P();

  for (unsigned int i = 0; i < this->dofs_per_cell; ++i)
  {     const unsigned int i_group = this->fe.system_to_base_index(i).first.first;
     const Tensor<2,dim> dF_I = fe_values_ref[this->u_fe].gradient(i, q_point);

     if (i_group == this->u_dof)
       data.cell_rhs(i) -= double_contract(dF_I, P) * JxW;
      ... // See the rest of step-44

and the salient part of the tangent matrix assembly would be something like

for (unsigned int q_point = 0; q_point < this->n_q_points; ++q_point)
{  // Linearisation of P with respect to F; 
  // contains both material and geometric tangent contributions
  const Tensor<4,dim> HH = lqph[q_point]->get_HH();

  for (unsigned int i = 0; i < this->dofs_per_cell; ++i)
  {    const unsigned int i_group = this->fe.system_to_base_index(i).first.first;
    const Tensor<2,dim> dF_I = fe_values_ref[this->u_fe].gradient(i, q_point);
    for (unsigned int j = 0; j <= i; ++j)
    {
      const unsigned int j_group = this->fe.system_to_base_index(j).first.first;
const Tensor<2,dim> dF_J = fe_values_ref[this->u_fe].gradient(j, q_point);

      if ((i_group == j_group) && (i_group == this->u_dof))
      {
        cell_matrix(i, j) += scalar_product(dF_I, HH, dF_J) * JxW;
      }
      ... // See the rest of step-44

So IIRC the call to phi.submit_gradient() that is bound to the undeformed/non-mapped DoFHandler is effectively the same as testing against what I've called dF_I in the above code. I think that the transformations from tau->P and Jc->HH should make it simple enough to adjust what remains.

I hope that this helps you implement what you're trying to do.
Jean-Paul

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/7b3c1488-cf75-4526-8442-d1045dc73398n%40googlegroups.com.

David

unread,

Dec 30, 2020, 9:07:47 AM12/30/20

to deal.II User Group

Hi Jean-Paul,

first of all, thank you very much for your comprehensive answer. It's a little bit unfortunate that there is no way for the deformed test gradient evaluation using matrix-free. However, your code snippets and the explanation are traceable and the required changes for the tangent operator seem acceptable (assuming the indexing in the get_HH() function does the right thing, otherwise it's probably not acceptable). As always, I was a little bit too ambitious with my time schedule for this task. I will document your suggestions in an issue and try to implement the rephrased problem in the medium run. In the end, the change is entirely performance motivated, since the residual assembly (without matrx-free) is comparatively expensive considering that it is only performed once per nonlinear iteration.

As an interesting side note: If I remove my wrong 'symm_grad_Nx' contribution from my shared code snippet, the remaining code corresponds already to a rephrased formulation using the Kirchoff stress and deformation gradient (as you have already noticed). Using this residual in combination with the "tau-Jc" tangent from step-44 (one-field adopted) leads to a convergence of the nonlinear scheme. So, it seems the linearization is still "correct enough" for the modified (two-point) frame of the residual. I will -of course- not rely on this half baked solution.

Best regards,
David

Jean-Paul Pelteret

unread,

Dec 30, 2020, 3:27:30 PM12/30/20

to dea...@googlegroups.com

HI David,

You're welcome! I'm glad that you found the explanation useful :-)

the required changes for the tangent operator seem acceptable (assuming the indexing in the get_HH() function does the right thing, otherwise it's probably not acceptable)

The indexing is correct - I've taken that part of the code from some other codes that I've fully verified. But I thought it would still be a useful exercise to re-implement the assembly routine for the other two parameterisations. Maybe this would further convince you of its correctness.

I've attached a quickly modified version of step-44 that does this (see the functions assemble_system_one_cell_tau(), assemble_system_one_cell_S() and assemble_system_one_cell_P()), and uses the aforementioned push/pull operations to transform the stresses and material tangents. (I didn't feel like reimplementing the constitutive laws for the different parameterisations -- I leave that as an exercise to you, if you feel so inclined.) I've also attached some result logs for the default configuration for step-44 plus one additional global refinement step. You'll see that they all exhibit quadratic convergence, with only a slight difference in the convergence history that can probably be attributed to accumulated round-off errors. Nevertheless, for all implementations the absolute residual at the end of each timestep can be pushed down to near machine precision with only a few Newton iterations.

Comparing the P-formulation to the rest, you can see that its actually quite straight forward to implement, so hopefully converting it to matrix free wouldn't pose too much of a challenge. Note that in both the P- and S-formulations, all of the linearisation terms coupled to the displacement also change slightly. The coupling terms derive from the definition of the hydrostatic stress, which is expressed differently for each parameterisation.

As an interesting side note: If I remove my wrong 'symm_grad_Nx' contribution from my shared code snippet, the remaining code corresponds already to a rephrased formulation using the Kirchoff stress and deformation gradient (as you have already noticed). Using this residual in combination with the "tau-Jc" tangent from step-44 (one-field adopted) leads to a convergence of the nonlinear scheme. So, it seems the linearization is still "correct enough" for the modified (two-point) frame of the residual. I will -of course- not rely on this half baked solution.

That's great! To me, this implies that you fixed a bug. Ultimately the RHS that is computed via any of these parameterisations should, in principle, lead to exactly the same result. They are equivalent, and it is sometimes simply convenience (in terms of material laws, numerical efficiency,...) that dictates which you might choose. They are all expressing the weak form of the balance of linear momentum (i.e. the different power conjugate pairs collectively quantify the same mechanical power), and can happily be transformed from one to the other by exploiting the relationships between the various stresses and strains. This implies that the exact linearisation, even if its expressed in terms of some other stress-strain measures, will in fact be the linearisation of any one of these three expressions of the residual associated with displacement DoFs. The linearisation can be similarly computed for one parameterisation and transformed into the others. Its a nice exercise to go though, if you have the time and patience to do so.

Best,
Jean-Paul

To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/1062eaa4-0e32-43f8-8fc4-70dfe2638a81n%40googlegroups.com.

step-44.cc

step_44_param_tau.txt

parameters.prm

step_44_param_P.txt

step_44_param_S.txt

David

unread,

Dec 31, 2020, 6:09:37 AM12/31/20

to deal.II User Group

Hi Jean-Paul,

> The indexing is correct - I've taken that part of the code from some other codes that I've fully verified. But I thought it would still be a useful exercise to re-implement the assembly routine for the other two parameterisations. Maybe this would further convince you of its correctness.

Convinced ;)

> I've attached a quickly modified version of step-44 that does this (see the functions assemble_system_one_cell_tau(), assemble_system_one_cell_S() and assemble_system_one_cell_P()), and uses the aforementioned push/pull operations to transform the stresses and material tangents. (I didn't feel like reimplementing the constitutive laws for the different parameterisations -- I leave that as an exercise to you, if you feel so inclined.) I've also attached some result logs for the default configuration for step-44 plus one additional global refinement step.

ah very nice, thanks again.

> That's great! To me, this implies that you fixed a bug. Ultimately the RHS that is computed via any of these parameterisations should, in principle, lead to exactly the same result. They are equivalent, and it is sometimes simply convenience (in terms of material laws, numerical efficiency,...) that dictates which you might choose. They are all expressing the weak form of the balance of linear momentum (i.e. the different power conjugate pairs collectively quantify the same mechanical power), and can happily be transformed from one to the other by exploiting the relationships between the various stresses and strains. This implies that the exact linearisation, even if its expressed in terms of some other stress-strain measures, will in fact be the linearisation of any one of these three expressions of the residual associated with displacement DoFs. The linearisation can be similarly computed for one parameterisation and transformed into the others. Its a nice exercise to go though, if you have the time and patience to do so.

Ah ok, I was not aware of that! In your first answer, you wrote "So you may as well start off by parameterising the problem in terms of F and P, and naturally you have to adjust the linearisation as well." I thought I need to adjust the linearization in any case, when I start to rephrase the residual assembly, but I guess you talked about an entirely rephrased problem (i.e., residual and linearization), which requires an adaption of the linearization. I checked now the convergence behavior of my two-point residual assembly in combination with the spatial linearization and I observe indeed quadratic convergence (though there are some deviations depending on the particular iteration). Rephrasing the linearization is therefore as an exercise left for the next pandemic.

One last point I don't understand right now is the magnitude of the machine precision you talk about:

> You'll see that they all exhibit quadratic convergence, with only a slight difference in the convergence history that can probably be attributed to accumulated round-off errors. Nevertheless, for all implementations the absolute residual at the end of each timestep can be pushed down to near machine precision with only a few Newton iterations.

The absolute errors at the end of each iteration are of the order \mathcal{O}(10^{-10}). Also, if I compare my two-point residual assembly using matrix-free with the 'usual' spatial residual assembly and compute the l2 norm of the difference, i.e. l2_norm(r_mf - r-_tau), I obtain a difference of the order \mathcal{O}(10^{-10}). But from what I learnt, double precision should be at least accurate up to ~10^{-15}. There are five orders of magnitude in between. I probably miss something here in between. Any idea?

Regards,
David

luca.heltai

unread,

Dec 31, 2020, 7:09:24 AM12/31/20

to Deal.II Users

> On 31 Dec 2020, at 12:09, 'David' via deal.II User Group <dea...@googlegroups.com> wrote:
>
>
> The absolute errors at the end of each iteration are of the order \mathcal{O}(10^{-10}). Also, if I compare my two-point residual assembly using matrix-free with the 'usual' spatial residual assembly and compute the l2 norm of the difference, i.e. l2_norm(r_mf - r-_tau), I obtain a difference of the order \mathcal{O}(10^{-10}). But from what I learnt, double precision should be at least accurate up to ~10^{-15}. There are five orders of magnitude in between. I probably miss something here in between. Any idea?

Well, you have to keep in mind that there are two more ingredients to keep into account:

1. the Jacobian condition number (easily in the order of 10^3 — 10^6)
2. the number of operations that lead to the residuals (i.e., how many sums your are eventually doing)

since errors accumulate, 10^{-10} is actually close to machine precision if you have around 10^4/10^5 dofs.

Moreover, you are computing the l2_norm, i.e., sum over all dofs of the *square* of the difference, and then you are taking the square root.

If you make 1e-15 error et each step in the square difference, propagate that for ndofs sums, simply by taking the square root of the resulting error, you’d get an expected error close to 1^{-8}, so you’re results are essentially machine precision.

L.

Jean-Paul Pelteret

unread,

Dec 31, 2020, 7:45:01 AM12/31/20

to dea...@googlegroups.com

Hi David,

One last point I don't understand right now is the magnitude of the machine precision you talk about:

> You'll see that they all exhibit quadratic convergence, with only a slight difference in the convergence history that can probably be attributed to accumulated round-off errors. Nevertheless, for all implementations the absolute residual at the end of each timestep can be pushed down to near machine precision with only a few Newton iterations.

I was just indicating that if you take sufficient Newton steps then the out-of-balance forces are of the order of machine precision, and its unlikely that the solution could be improved by any appreciable margin. Here's what happens when 4 Newton steps are used in the first timestep with the "tau" parameterisation:

Timestep 1 @ 0.1s
*** Using tau=tau(b) parameterisation ***
...
4 --- ASM_SYS CONVERGED!
...
Absolute errors:
Displacement:    1.447e-09
Force:         1.018e-09

In this case, we've hit the early-exit criteria for the Newton scheme after 4 iterations. But for the "P" parameterisation, we miss this criterion by a small margin and take one more step.

Timestep 1 @ 0.1s
*** Using P=P(F) parameterisation ***
...
5 --- ASM_SYS CONVERGED!
...
Absolute errors:
Displacement:    1.390e-12
Force:         8.075e-15

After that, the force residual (i.e. the norm of the "u" component of the RHS vector) is much smaller. But the actual quality of the solution, or the resulting stress field, probably haven't been improved by any significant margin (at least, not anything that impacts the scenario being simulated).

The absolute errors at the end of each iteration are of the order \mathcal{O}(10^{-10}). Also, if I compare my two-point residual assembly using matrix-free with the 'usual' spatial residual assembly and compute the l2 norm of the difference, i.e. l2_norm(r_mf - r-_tau), I obtain a difference of the order \mathcal{O}(10^{-10}). But from what I learnt, double precision should be at least accurate up to ~10^{-15}. There are five orders of magnitude in between. I probably miss something here in between. Any idea?

I understand that your description implies that you're comparing the residuals for a timestep where both schemes have taken the same number of Newton iterations. But nevertheless, I'm not quite sure that there's any value in comparing these two quantities. They are both discrete residuals for the approximate solution, and should both converge to zero norm. So neither one is actually the right reference value -- zero is. I'd explain this difference by the fact that both schemes use totally different code paths, so this is not an apples-to-apples comparison. The evolution of the solution for the MF and MB solvers might be such that the DoFs where the (very small) residuals remain at the end of each timestep differ. I'm not sure that one can say that this is actually means anything -- does it really matter that the displacement at one vertex differs by minuscule amount? Probably not. But even a small shift in the solution changes "where" the error in the residual resides, and your comparison is measuring that.

Best,
Jean-Paul

To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/e0ccf5a8-636d-4f13-95aa-16810da97f0dn%40googlegroups.com.

David

unread,

Jan 1, 2021, 7:27:53 AM1/1/21

to deal.II User Group

> I understand that your description implies that you're comparing the residuals for a timestep where both schemes have taken the same number of Newton iterations. But nevertheless, I'm not quite sure that there's any value in comparing these two quantities. They are both discrete residuals for the approximate solution, and should both converge to zero norm. So neither one is actually the right reference value -- zero is. I'd explain this difference by the fact that both schemes use totally different code paths, so this is not an apples-to-apples comparison. The evolution of the solution for the MF and MB solvers might be such that the DoFs where the (very small) residuals remain at the end of each timestep differ. I'm not sure that one can say that this is actually means anything -- does it really matter that the displacement at one vertex differs by minuscule amount? Probably not. But even a small shift in the solution changes "where" the error in the residual resides, and your comparison is measuring that.

If I disable the early exit criterion completely and compare the results of the matrix-free assembly options (spatial and mixed formulation) with the 'usual' FEValues method I still have a difference (see the attached log file, the general setup might suffer from locking effects due to the one-field formulation and linear shape functions). In the end, the error is always higher in case the matrix-free module is used. I totally agree with you when you say the differences don't have any relevant impact on the final result and I guess Luca's comment is relevant at this point, since the computations are performed differently in case of matrix-free. I was initially just a bit confused by this fact.

I don't know if you/how to resolve the threads here, but I think the topic is resolved.. Thank you very much Jean-Paul and Luca for the helpful comments and the shared code.