The camera_poses.yaml file has the pose of each camera with respect to the body to which it is attached, in your case "rig". The transform is from the body to the camera, and is given as a position and a rotation vector (unit-length rotation axis multiplied by the rotation angle in radians, aka log(Rotation)). Then there is R, which is the inverse of the covariance matrix, i.e. it specifies the certainty with which you know the pose.
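To make the rotation-vector convention concrete, here is a small sketch (the helper name `rotvec_to_matrix` is mine, not part of TagSLAM) that turns an axis-times-angle vector into a rotation matrix via Rodrigues' formula:

```python
import numpy as np

def rotvec_to_matrix(rvec):
    """Convert a rotation vector (unit axis * angle in radians,
    i.e. log(Rotation)) into a 3x3 rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)       # the rotation angle is the vector's length
    if theta < 1e-12:
        return np.eye(3)               # zero vector means no rotation
    k = rvec / theta                   # unit-length rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # cross-product (skew-symmetric) matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Example: a rotation of 90 degrees about z maps the x-axis onto the y-axis.
R = rotvec_to_matrix([0.0, 0.0, np.pi / 2])
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # -> [0. 1. 0.]
```

So the vector `[0, 0, 1.5708]` in the yaml would mean "rotate ~90 degrees about the body's z-axis".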
For the first camera you can pick pretty much any pose with respect to the body, conventionally zero rotation and zero translation. But bear in mind that this transform matters if you also use odometry measurements, since they will also refer to the rig body. The R matrix you are using right now specifies a very tight prior, which is OK.
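As a sketch, a first-camera entry with the conventional identity pose might look roughly like this. Treat the field names and the dimensions of R as assumptions on my part; check them against the camera_poses.yaml you already have, and keep your existing R values:

```yaml
# Hypothetical entry -- verify field names against your existing file.
cam0:
  pose:
    position:        # translation from "rig" body to camera [m]
      x: 0.0
      y: 0.0
      z: 0.0
    rotation:        # rotation vector: unit axis * angle [rad]
      x: 0.0
      y: 0.0
      z: 0.0
    # R is the inverse covariance: large diagonal values = tight prior,
    # i.e. "I am very sure this pose is correct".
```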
The easiest approach for the second camera is to not specify any pose at all: sprinkle a large number of tags around and take a recording. At the end of the run (once you do the "dump" service call), TagSLAM will produce a camera_poses.yaml in ~/.ros/. It will have an estimate of the pose of both cameras. The pose of the first one should be identical to your input; the second camera should have the extrinsics that you are looking for. Also copy the R matrix over, because that gives TagSLAM a realistic estimate of the error in the extrinsic calibration.
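The dumped file should mirror the input format, so you can copy the second camera's block straight into your config. Roughly like this, with all numbers hypothetical:

```yaml
# ~/.ros/camera_poses.yaml after the dump (all values hypothetical)
cam0:                # should come back essentially identical to your input
  pose:
    position: { x: 0.0, y: 0.0, z: 0.0 }
    rotation: { x: 0.0, y: 0.0, z: 0.0 }
cam1:                # the estimated extrinsics you are after
  pose:
    position: { x: 0.101, y: -0.002, z: 0.004 }
    rotation: { x: 0.011, y: -1.571, z: 0.005 }
    # copy the R block over as well -- it encodes how certain
    # TagSLAM is about this calibration
```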