I recently tried gwt on a mf6 model and mfusg-transport on a usg model. Both models have a complex structure: 13 layers, each with a different extent (number of nodes). They are different models, but they simulate roughly the same area. Both groundwater flow models converge easily and are quite stable.
To keep things simple, in both transport models I only simulated advection and dispersion of a single contaminant species. It was applied over some mine void pits as CNC in mf6 and as PCB in mfusg-transport, for all stress periods (99 for the mfusg model and 162 for the mf6 model). ATS was used for both models. The transport part of the mfusg model runs very fast (less than 1 second per time step) and I always get the expected result: no negative concentrations and a very reasonable distribution. The GWT model, however, had many hiccups. It failed to converge at sp 27 after adjusting the time step length for that sp many times. I used very similar ims settings for the gwf and gwt models, but reduced the value of dvclose by one order of magnitude in the gwt model. I then tried modifying the ims package, but everything got worse, and some runs got stuck as early as sp 1. I then checked the concentration file and found that the values start to go wild once the model has to reduce the initial time step to converge. The concentrations range from -614613.3034585962 to 652918.5279728356, which is clearly not right.
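A check of the concentration range like the one described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `summarize_concentration` and the no-data threshold of 1e29 are hypothetical, and the input values are synthetic apart from the two extremes quoted above. In practice the per-time-step values would come from the model's concentration output file.

```python
def summarize_concentration(values, nodata_threshold=1e29):
    """Return (min, max, n_negative) for one time step.

    `values` is a flat sequence of cell concentrations. Cells whose
    absolute value is at or above `nodata_threshold` are treated as
    inactive and skipped (assumed threshold, adjust to your model).
    Returns None if no active cells remain.
    """
    active = [float(v) for v in values if abs(v) < nodata_threshold]
    if not active:
        return None
    n_negative = sum(1 for v in active if v < 0.0)
    return min(active), max(active), n_negative


# Synthetic example: one well-behaved step and one oscillating step.
steps = {
    1: [0.0, 0.5, 1.0, 1e30],
    2: [-614613.3034585962, 0.2, 652918.5279728356, 1e30],
}

for step, conc in steps.items():
    cmin, cmax, n_neg = summarize_concentration(conc)
    print(f"step {step}: min={cmin:g} max={cmax:g} negative cells={n_neg}")
```

Running this per saved time step makes it easy to see at which step negative or very large values first appear.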
I don't have much experience in transport modeling, so I hope someone who has run into something similar can help a little.