I think that your results are more or less explainable.
Using quaternions without local parameterizations:
a. Results in a higher-dimensional tangent space than with axis-angle (4 vs 3 parameters per rotation) => larger linear solver problems => linear solver time increases [to 0.28 from 0.23]
b. Computing Jacobians involves (in the bundle adjustment example you're using) automatic differentiation over larger parameter blocks, so Jacobian evaluation times are slightly higher than with axis-angle [0.314 vs 0.308]
c. Computing residuals becomes cheaper (no trigonometry) [0.042 vs 0.049]
Using quaternions with local parameterizations:
a. Results in the same tangent space dimensionality as with axis-angle => similar size linear solver problems => linear solver time is almost equal [0.227 vs 0.229]
b. Computing Jacobians involves automatic differentiation plus additional operations to map the Jacobian from parameter space to tangent space => Jacobian evaluation times are the highest
c. Computing residuals becomes cheaper (no trigonometry involved) [0.042 vs 0.049]
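To illustrate the "no trigonometry" point, here is a minimal standalone sketch (plain C++, no Ceres) of rotating a point with a unit quaternion versus with an axis-angle vector. The function names and the (w, x, y, z) layout are my own choices for illustration, not taken from your code; the quaternion version assumes the input already has unit norm.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;  // (w, x, y, z); assumed to have unit norm

Vec3 Cross(const Vec3& a, const Vec3& b) {
  return {a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]};
}

// Unit quaternion: p' = p + 2 * (w * (v x p) + v x (v x p)).
// Only multiplications and additions are needed.
Vec3 RotateByUnitQuaternion(const Quat& q, const Vec3& p) {
  // Sanity check of the unit-norm assumption (debug builds only).
  assert(std::fabs(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3] -
                   1.0) < 1e-9);
  const Vec3 v = {q[1], q[2], q[3]};
  const Vec3 t = Cross(v, p);
  const Vec3 u = Cross(v, t);
  return {p[0] + 2.0 * (q[0] * t[0] + u[0]),
          p[1] + 2.0 * (q[0] * t[1] + u[1]),
          p[2] + 2.0 * (q[0] * t[2] + u[2])};
}

// Axis-angle (Rodrigues formula): needs a square root, a sine and a cosine.
Vec3 RotateByAxisAngle(const Vec3& w, const Vec3& p) {
  const double theta2 = w[0] * w[0] + w[1] * w[1] + w[2] * w[2];
  if (theta2 < 1e-24) {
    // Near-zero rotation: first-order approximation p + w x p.
    const Vec3 c = Cross(w, p);
    return {p[0] + c[0], p[1] + c[1], p[2] + c[2]};
  }
  const double theta = std::sqrt(theta2);
  const double s = std::sin(theta);
  const double c = std::cos(theta);
  const Vec3 k = {w[0] / theta, w[1] / theta, w[2] / theta};
  const Vec3 kxp = Cross(k, p);
  const double f = (k[0] * p[0] + k[1] * p[1] + k[2] * p[2]) * (1.0 - c);
  return {p[0] * c + kxp[0] * s + k[0] * f,
          p[1] * c + kxp[1] * s + k[1] * f,
          p[2] * c + kxp[2] * s + k[2] * f};
}
```

Both functions give the same rotated point for matching inputs; the difference is only in the operations each evaluation has to pay for.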
Note that only quaternions with local parameterizations guarantee, at every point of the parameter space, that a non-zero perturbation actually changes the rotation.
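A small standalone sketch of what goes wrong without the local parameterization, assuming the cost function normalizes the quaternion before rotating (I believe the quaternion variant of the bundle adjustment example does this, but treat that as an assumption). Moving the 4 parameters along q itself only rescales the quaternion, so the rotation does not change, whereas a step orthogonal to q does change it. All names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;  // (w, x, y, z); any non-zero norm

// Rotates p by q / |q|, i.e. the quaternion is normalized inside the cost.
Vec3 RotateByNormalizedQuaternion(const Quat& q, const Vec3& p) {
  const double n =
      std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
  assert(n > 0.0);
  const double w = q[0] / n, x = q[1] / n, y = q[2] / n, z = q[3] / n;
  // t = v x p, u = v x t, p' = p + 2 * (w * t + u), with v = (x, y, z).
  const Vec3 t = {y * p[2] - z * p[1], z * p[0] - x * p[2],
                  x * p[1] - y * p[0]};
  const Vec3 u = {y * t[2] - z * t[1], z * t[0] - x * t[2],
                  x * t[1] - y * t[0]};
  return {p[0] + 2.0 * (w * t[0] + u[0]),
          p[1] + 2.0 * (w * t[1] + u[1]),
          p[2] + 2.0 * (w * t[2] + u[2])};
}

// Largest absolute coordinate change of the rotated point when the
// parameters move from q to q + dq.
double RotationChange(const Quat& q, const Quat& dq, const Vec3& p) {
  Quat moved;
  for (int i = 0; i < 4; ++i) moved[i] = q[i] + dq[i];
  const Vec3 a = RotateByNormalizedQuaternion(q, p);
  const Vec3 b = RotateByNormalizedQuaternion(moved, p);
  double change = 0.0;
  for (int i = 0; i < 3; ++i) change = std::fmax(change, std::fabs(a[i] - b[i]));
  return change;
}
```

With q = (0.8, 0.6, 0, 0), a step dq = 0.1 * q leaves the rotated point unchanged up to rounding, while a step orthogonal to q such as (-0.06, 0.08, 0, 0) moves it. The local parameterization restricts updates to the second kind.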
If your performance is limited by residual/Jacobian evaluation, you might be interested in pre-computing the camera-specific parts of the Jacobian (and the world-to-camera transformations as matrices) in an evaluation callback, and computing only the point-specific parts in the cost function evaluation routine. This pays off especially when the per-point computations are rather simple, for example in the case of calibrated cameras.
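A rough standalone sketch of that split for a calibrated camera (plain C++, no Ceres; the struct and function names are hypothetical). PrepareCamera is the part you would run once per camera from the evaluation callback, and ProjectPoint is the cheap per-observation part. Only the Jacobian with respect to the 3D point is shown; the camera-side Jacobian blocks and the wiring into the solver are omitted.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;  // (w, x, y, z); unit norm assumed
using Mat3 = std::array<double, 9>;  // row-major 3x3

// Hypothetical per-camera cache, refreshed once per new evaluation point.
struct CameraCache {
  Mat3 R;  // world-to-camera rotation
  Vec3 t;  // world-to-camera translation
};

// Camera-specific work: convert the unit quaternion to a matrix once.
CameraCache PrepareCamera(const Quat& q, const Vec3& t) {
  const double w = q[0], x = q[1], y = q[2], z = q[3];
  const Mat3 R = {
      1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
      2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
      2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)};
  return CameraCache{R, t};
}

struct PointEval {
  std::array<double, 2> uv;        // normalized image coordinates
  std::array<double, 6> jacobian;  // row-major 2x3, d(uv) / d(X)
};

// Point-specific work: one matrix-vector product, one division, and a
// 2x3 Jacobian built directly from the cached rotation.
PointEval ProjectPoint(const CameraCache& c, const Vec3& X) {
  Vec3 pc;  // point in camera coordinates: R * X + t
  for (int r = 0; r < 3; ++r) {
    pc[r] = c.R[3 * r] * X[0] + c.R[3 * r + 1] * X[1] +
            c.R[3 * r + 2] * X[2] + c.t[r];
  }
  assert(pc[2] != 0.0);
  const double inv_z = 1.0 / pc[2];
  PointEval out;
  out.uv[0] = pc[0] * inv_z;
  out.uv[1] = pc[1] * inv_z;
  // d(uv)/d(X) = d(uv)/d(pc) * R, with d(uv)/d(pc) = [1/z 0 -u/z; 0 1/z -v/z].
  for (int col = 0; col < 3; ++col) {
    out.jacobian[col] = (c.R[col] - out.uv[0] * c.R[6 + col]) * inv_z;
    out.jacobian[3 + col] = (c.R[3 + col] - out.uv[1] * c.R[6 + col]) * inv_z;
  }
  return out;
}
```

The residual is then out.uv minus the observed (normalized) image point, and the cached matrix is shared by all observations of the same camera.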