Hi, Michael. I don't understand the function "_in_front_of_both_cameras" in Chapter 4.

proletaria...@gmail.com

Dec 1, 2016, 7:07:45 AM
to OpenCV with Python Blueprints
def _in_front_of_both_cameras(self, first_points, second_points, rot, trans):
    rot_inv = rot
    for first, second in zip(first_points, second_points):
        first_z = np.dot(rot[0, :] - second[0]*rot[2, :], trans) / np.dot(rot[0, :] - second[0]*rot[2, :], second)
        first_3d_point = np.array([first[0] * first_z, second[0] * first_z, first_z])
        second_3d_point = np.dot(rot.T, first_3d_point) - np.dot(rot.T, trans)
        if first_3d_point[2] < 0 or second_3d_point[2] < 0:
            return False
    return True

Could you please tell me how the method used in this function is derived? Thanks!!

Michael Beyeler

Dec 1, 2016, 4:40:46 PM
to OpenCV with Python Blueprints
Hi,

I don't know if you have the book, but the process is explained on pages 91 - 93.

The function is used when we decompose the essential matrix into its translational and rotational components, `trans` and `rot`, respectively.
However, this decomposition (based on SVD) actually has four possible solutions (using either +/- `u_3` with either matrix `W` or `Wt`), only one of which is the valid camera matrix. The four solutions are due to a rotational ambiguity (the bas-relief ambiguity) and a discrete mirror ambiguity (the Necker reversal). To find the valid solution, we iterate through all four candidates and choose the one that geometrically corresponds to a reconstructed point lying in front of both cameras (scenario (a) in the figure below):

[figure: the four possible camera configurations; only in scenario (a) does the reconstructed point lie in front of both cameras]

In order to do this, we first need to convert the 2D image coordinates of all keypoints to (normalized) homogeneous coordinates. This is where the function _in_front_of_both_cameras() comes in.
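To make these steps concrete, here is a minimal sketch (not the book's exact code) of the whole selection procedure: normalize the matched keypoints with the inverse intrinsic matrix, build the four candidate (R, t) pairs from the SVD of the essential matrix, and keep the pair for which the reconstructed points end up in front of both cameras. The intrinsic matrix K, the pixel coordinates pts1/pts2, and the essential matrix E below are made-up placeholders; in the book they come from the calibration, feature-matching, and essential-matrix steps, respectively:

import numpy as np

def in_front_of_both_cameras(first_points, second_points, rot, trans):
    # free-function copy of the method quoted above (the `self` argument dropped)
    for first, second in zip(first_points, second_points):
        first_z = np.dot(rot[0, :] - second[0] * rot[2, :], trans) / \
                  np.dot(rot[0, :] - second[0] * rot[2, :], second)
        first_3d_point = np.array([first[0] * first_z, second[0] * first_z, first_z])
        second_3d_point = np.dot(rot.T, first_3d_point) - np.dot(rot.T, trans)
        if first_3d_point[2] < 0 or second_3d_point[2] < 0:
            return False
    return True

# --- made-up placeholders (in the book these come from earlier steps) ---
K = np.array([[800.,   0., 320.],        # hypothetical camera intrinsics
              [  0., 800., 240.],
              [  0.,   0.,   1.]])
pts1 = [(310., 250.), (100., 200.), (400., 120.)]   # pixel coords, image 1
pts2 = [(305., 252.), ( 98., 205.), (397., 118.)]   # pixel coords, image 2
E = np.eye(3)                            # stand-in for the essential matrix

# 1) lift each pixel (u, v) to homogeneous (u, v, 1) and normalize with K^-1
K_inv = np.linalg.inv(K)
first_points = [K_inv.dot([u, v, 1.]) for (u, v) in pts1]
second_points = [K_inv.dot([u, v, 1.]) for (u, v) in pts2]

# 2) the four candidate (R, t) pairs from the SVD of E
U, _, Vt = np.linalg.svd(E)
if np.linalg.det(U.dot(Vt)) < 0:         # make sure the rotations are proper
    Vt = -Vt
W = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
u3 = U[:, 2]
candidates = [(U.dot(W).dot(Vt), u3), (U.dot(W).dot(Vt), -u3),
              (U.dot(W.T).dot(Vt), u3), (U.dot(W.T).dot(Vt), -u3)]

# 3) keep the candidate that puts the points in front of both cameras
rot, trans = None, None
for R, t in candidates:
    if in_front_of_both_cameras(first_points, second_points, R, t):
        rot, trans = R, t
        break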
Testing whether any given 3D point, derived from a point correspondence in both images, is in front of both cameras for one of the four possible rotation-translation combinations is a bit tricky, because initially you only have the point's projection in each image but not the point's depth. Assume X and X' are the coordinates of a 3D point in the first and second camera's coordinate system, respectively, and (ũ, ṽ) and (ũ', ṽ') are the corresponding projections in normalized image coordinates in the first and second image, respectively. We can then use a rotation-translation pair to estimate the 3D point's depth in each camera's coordinate system:

z = (r1 - ũ'·r3) · t / ((r1 - ũ'·r3) · x̃),   with x̃ = (ũ, ṽ, 1)

where r1, r2, r3 are the rows of the rotation matrix R and t is the translation vector. This is what the following code is doing to get z (`first_z` in the code) and X (`first_3d_point` in the code):

first_z = np.dot(rot[0, :] - second[0]*rot[2, :], trans) / np.dot(rot[0, :] - second[0]*rot[2, :], second)
first_3d_point = np.array([first[0] * first_z, second[0] * first_z, first_z])
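
Written out with an intermediate name (just a restating of the two lines above for readability; `first` and `second` are the normalized homogeneous points (ũ, ṽ, 1) and (ũ', ṽ', 1)):

a = rot[0, :] - second[0] * rot[2, :]    # the vector r1 - ũ'·r3
first_z = np.dot(a, trans) / np.dot(a, second)
first_3d_point = np.array([first[0] * first_z, second[0] * first_z, first_z])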

The second point (X') can then be found by rotating and translating the first point into the second camera's coordinate system:
second_3d_point = np.dot(rot.T, first_3d_point) - np.dot(rot.T, trans)

The point lies in front of both cameras if both X and X' have a positive z coordinate, in which case the function returns True (otherwise False).
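
For what it's worth, here is a sketch of the standard derivation of a depth formula of this shape. It assumes the convention that the second camera frame is related to the first by X' = R(X - t); the book's own choice of convention for R and t may differ in the details. With X = z·x̃ and X' = z'·x̃':

\[
z'\,\tilde{x}' = R\,(z\,\tilde{x} - t), \qquad
\tilde{x} = (\tilde{u}, \tilde{v}, 1)^\top,\quad
\tilde{x}' = (\tilde{u}', \tilde{v}', 1)^\top
\]

Taking the first and third rows of this vector equation and eliminating $z'$:

\[
\bigl(z\,r_3^\top\tilde{x} - r_3^\top t\bigr)\,\tilde{u}' = z\,r_1^\top\tilde{x} - r_1^\top t
\quad\Longrightarrow\quad
z = \frac{(r_1 - \tilde{u}'\,r_3)^\top\,t}{(r_1 - \tilde{u}'\,r_3)^\top\,\tilde{x}}
\]

The 3D point in the first camera frame is then X = z·x̃, and applying the rigid transform between the two camera frames gives X'; the sign of the z component of each is what the function tests.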

The source of this transformation is here:

You can find more information on epipolar geometry in Hartley & Zisserman (2004). The relevant transformations are explained in Section 9.6, available here:

Best,
Michael

proletaria...@gmail.com

Dec 6, 2016, 3:39:05 AM
to OpenCV with Python Blueprints
Thanks, Michael. I have browsed through the chapter on epipolar geometry in Hartley & Zisserman, and I saw the figure illustrating the four scenarios, but the math behind them was not given. It's exactly the estimation of the "z" value, as you have shown above, that puzzles me.

Could you please explain how that expression is derived?

Sincerely,
Hawk