Hi Stefan,
I've had this confusion as well. I'll try to answer your questions below.
1. Regarding the conventionsAs you said, GTSAM is built upon Lie group theory. That requires to assume one convention to apply some operations  left or right handed.
GTSAM uses righthand notation: corrections are applied on the right hand, as well as right jacobians are chosen in general. Probability distributions of elements of the group are also defined with "right notation".
With a concrete example, let's say you have a Pose3 (i.e, a rigidbody matrix, an object from SE(3)) that represents the pose of the body in time 1 in the world frame:
w_T_w_b1
Additionally, you have another estimate at time 2:
w_T_w_b2
You additionally have a measurement of the relative pose between 1 and 2 (like, an odometry measurement). Let's be ambiguous for now and call it
dT
With the previous information, the following expression holds.
w_T_w_b2 = w_T_w_b1 * dTIf we make an analogy with a common Kalman filter scheme, it's like a process model:
x(t+1) = x(t) + u(t)
However, the previous equations are deterministic  we're not considering noise nor any distribution. We want things to be Gaussian, so in the Kalman filter case we say that the control u(t) has some white noise
n (i.e, zero mean Gaussian noise, with covariance
sigma):
x(t+1) = x(t) + u(t) +
nSo if you take the previous equation, an leave the noise alone, you can say that the difference distributes as a zero mean Gaussian with covariance
sigma:
x(t+1)  x(t) + u(t) = n
So what you end up having on the left hand side, is basically a vector version of the between factor. You can directly use it in GTSAM using the covariance
sigma as the factor covariance.Now the thing is to make the same when you have poses. We have this equation:
w_T_w_b2 = w_T_w_b1 * dTWe say that the only source of noise is the measurement dT. So we "add" a zero mean Gaussian noise:
w_T_w_b2 = w_T_w_b1 * dT * Exp(N)The new term, Exp(N) is the exponential map of a variable N that distributes as a zero mean Gaussian, but on the
tangent space of the group. It means that it has 6 degrees of freedom (3 position + 3 orientation), and has a covariance matrix Sigma6 (6x6).
Now this new term is the only source of uncertainty. Let's isolate as before to obtain our between factor and analyze what happens:
(dT)^1 * (w_T_w_b1)^1 * w_T_w_b2 = Exp(N)
The inversion (
w_T_w_b1)^1 results in
b1_T_b1_w (we flip the frames)
=> (dT)^1 * b1_T_b1_w * w_T_w_b2 = Exp(N)
The blue term basically defines the difference between frames 1 and 2, defined in frame 1
To make this consistent, (dT)^1 should match the frames, so it should be
dT^1 = b2_dT_b2_b1Hence, dT must be
dT = b1_dT_b1_b2dT describes a transformation defined in b1 that "takes you" to b2  what an odometry measurement is supposed to do.
The whole error (between factor) is finally:
(b1_dT_b1_b2)^1 * b1_T_b1_w * w_T_w_b2 = Exp(N)Which is defined "in b2 to b2". So it must be Exp(N) the error on the right hand side, hence its covariance Sigma. That's why we said the error is defined
on the measurement frame, i.e, b2 in this example.
It is quite critical to keep track of the frames of the transformations and measurements involved when the objects are rigidbody transformations, rotation matrices or quaternions. However, this doesn't happen only in factor graphs, but Kalman filters too, because it depends how you apply the corrections and the noise.
The previous example implicitly adopted a right hand notation when defining:
w_T_w_b2 = w_T_w_b1 * dT * Exp(N)
We could have also expressed:
w_T_w_b2 = w_T_w_b1 * Exp(N) * dT
But that would have implied "noise defined in frame b1" which could make things more involved. GTSAM always assumes that noises are applied from the right hand side.
As a last remark on this side, the expression
dT * Exp(N) is also interpreted as a Gaussian with mean dT and covariance Sigma. This is also a right hand notation to express distributions on SE(3).
2. Regarding how the noise must be defined for each factor, when I'm confused I always check the Logarithm map implementation of each object. Most of the factors that require working with Lie Groups usually have a Logarithm map that converts the matrix difference into a vector difference by mapping the difference to the tangent space.
As we said before, since the tangent space is has the same dimension of the minimum representation of the object (rotations 3, poses 6, etc), the covariance should match their dimension, and the Logarithm map resulting vector should follow the order in which the covariance matrix must be defined.
In particular, what you said is right

Pose2 is (x, x, theta)
Pose 3 is (alpha, beta, gamma, x, y, z) 3. Lastly, I think that handling the covariances is always tricky because we need to be careful with the conventions. While GTSAM uses right hand, others use left hand. For instance,
this pretty nice paper shows you how to compute the covariance of a difference of distributions, composition, inverses and so on, but it uses a
left hand notation. So you must adapt the equations to be consistent with GTSAM for example.
I hope it helps!
Matias