Question about how to download .MOV files with the 3D boxes overlaying them

Rachel Woody

Jan 3, 2021, 8:20:32 PM
to Objectron
Hi Objectron experts,

I am a research assistant in the Ivry lab at UC Berkeley. We would like to use the Objectron dataset to conduct a unique neuropsychological experiment that sheds light on interactions between motor processes and cognition. We hope our work can provide key insights into movement disorders like Parkinson's disease and ataxia.

I have been struggling to download and manipulate the Objectron dataset using your tutorials. I managed to use the gsutil method from the tutorial to download several plain .MOV files. However, what I really want to do is download each .MOV file with its 3D annotation/box overlaid on it. I then want to insert these "annotated" .MOV files into an Excel spreadsheet. Is this possible?

Our goal is to use these images in an experiment with human participants. The first step for us is just to figure out how to download your annotated data and display it. I would greatly appreciate any advice you can provide to help me achieve this! 
Please let me know if I can clarify anything further!

Thank you so much!

Rachel

Rachel Woody

Jan 3, 2021, 8:31:22 PM
to Objectron
To add to my earlier question, I'm also wondering if there is a way to access any information about each file (e.g., rotation rate, degree of rotation, or the size and shape of the object).

Adel Ahmadyan

Jan 4, 2021, 5:09:44 PM
to Rachel Woody, Objectron
1. You can use the draw_annotation_on_image function to draw the 3D bounding boxes on each image. You can use the ffmpeg utility to extract the individual frames from the .MOV file (ffmpeg -i video.MOV -vsync vfr images/%05d.png), then use the above function to draw the bounding boxes on each frame. There are a few examples of using that function in our tutorials; see also the sketch after item 2.

2. There are a few utilities out there that you can use to visualize protocol buffers. Alternatively, you can call .DebugString() on each protobuf to get a detailed, human-readable string version of it.
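
A minimal sketch of both steps (the ffmpeg command is the one above; draw_annotation_on_image comes from our tutorials and its exact signature may differ between checkouts, so that call is left as a comment):

import glob
import os
import subprocess

import cv2  # only used here to read/write the extracted frames

os.makedirs('images', exist_ok=True)

# Step 1: dump every frame of the video to images/00001.png, 00002.png, ...
subprocess.run(
    ['ffmpeg', '-i', 'video.MOV', '-vsync', 'vfr', 'images/%05d.png'],
    check=True)

# Step 2: overlay the 3D boxes on each frame with the tutorial helper.
# The call is commented out because its exact signature is defined in the
# tutorials; see the parse-annotation notebook for a working example.
for frame_path in sorted(glob.glob('images/*.png')):
    image = cv2.imread(frame_path)
    # image = draw_annotation_on_image(image, ...)
    cv2.imwrite(frame_path, image)

# For item 2: in Python, str(message) (or text_format.MessageToString from
# google.protobuf) gives the readable dump that DebugString() produces in C++.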

Best,
Adel


Rachel Woody

Jan 11, 2021, 5:43:40 PM
to Objectron
Dear Adel,

Thank you so much, this is extremely helpful and I have made so much progress! 

Actually, I have another question for you: is there any way to access data such as how much the camera rotates from start to end in each video?

Essentially, we want to implement an experiment to investigate the relationship between spatial reasoning and arithmetic abilities. Our experiment would present a series of your .MOV files to the participants. For each .MOV, we will ask them to estimate the degree of rotation shown (e.g., 15 degrees). We will then provide them feedback (too high, too low) and observe how they adapt their perception over time. We believe participants who complete such a task will actually perform better on subsequent arithmetic problems.

It seems to me that this rotation data must exist somewhere in order to formulate the annotations over the videos. I would greatly appreciate your advice on whether "degree of camera rotation" data exists for each video file and, if so, how I would go about accessing it.

Thank you again, your dataset has truly inspired us at the Cognition and Action Lab!

Rachel

Adel Ahmadyan

Jan 12, 2021, 2:00:32 PM
to Objectron
That is an interesting project.

Each frame in the dataset has a data.camera.transform, which is a 4x4 matrix; its top-left 3x3 block is the rotation matrix. (See this tutorial.)
So you can write

transform = np.array(data.camera.transform).reshape(4, 4)
rotation = transform[:3, :3]

To get the relative rotation matrix between two views, multiply: rotation_1 @ inv(rotation_2)
If you want angles, you can convert the rotation matrix to Euler angles.
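
For example, with SciPy (a minimal sketch; rotation_1 and rotation_2 stand for the 3x3 blocks from two different frames, and 'xyz' is just one possible axis order):

import numpy as np
from scipy.spatial.transform import Rotation as R

# Relative rotation between the two views, converted to angles in degrees.
relative_rotation = rotation_1 @ np.linalg.inv(rotation_2)
angles = R.from_matrix(relative_rotation).as_euler('xyz', degrees=True)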

Rachel Woody

Jan 26, 2021, 12:47:45 PM
to Objectron
Dear Adel,

Thank you so much for your help!

I worked on your suggestions and made some progress, but I'm still unsure how to get the relative rotation matrix using this line: "multiply: rotation_1 @ inv(rotation_2)". When I tried implementing it exactly as you suggested, it gave me an "invalid syntax" error. Can you offer any advice on the syntax and on incorporating this?

Also, is .as_euler() the correct command to convert the rotation matrix to Euler angles? I am referencing the following documentation for the SciPy Rotation class:
[attachment: Screen Shot 2021-01-26 at 9.43.01 AM.png, a screenshot of the SciPy Rotation class documentation]

Thanks again!

Best,

Rachel Woody

Adel Ahmadyan

Jan 26, 2021, 1:47:00 PM
to Rachel Woody, Objectron
Convert them to numpy arrays first: np.asarray(rotation)
You can use np.matmul(a, b) or a @ b for matrix multiplication (loosely, the matrix analogue of "adding" two numbers).
You can use np.linalg.inv(a) for inversion (loosely, the analogue of "converting" x to -x).

So when I want to compute the relative matrix (say x - y), I have to convert y to -y (i.e. np.linalg.inv(y)) and then "add" it using matmul:
a @ np.linalg.inv(b)
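
Putting it together (a sketch; rotation_first and rotation_last are placeholder names for the 3x3 rotation blocks taken from two different frames):

import numpy as np

a = np.asarray(rotation_first)   # 3x3 rotation from the first frame
b = np.asarray(rotation_last)    # 3x3 rotation from the last frame
relative = a @ np.linalg.inv(b)  # relative rotation between the two views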

Rachel Woody

Jan 27, 2021, 1:17:56 AM
to Objectron
Thank you!  Just to make sure I'm putting everything together correctly, would it look something like:

transform = np.array(data.camera.transform).reshape(4, 4)
rotation_1 = transform[:3, :3]
rotation_1 = np.asarray(rotation_1)
rotation_2 = np.linalg.inv(rotation_1)
relative_matrix = np.matmul(rotation_1, rotation_2)

To reiterate, I am trying to find the degree of rotation that occurs between the first frame of the video and the last, on one axis. I'm pretty sure I'm misunderstanding your instructions to some extent, partly because I am not familiar with linear algebra. Please let me know if you spot any mistakes!

Thank you so much,

Rachel

Adel Ahmadyan

Jan 27, 2021, 2:02:53 AM
to Rachel Woody, Objectron
What you wrote yields the identity matrix (essentially, you wrote x @ x^{-1} = identity).
Grab the first rotation matrix from the first frame and the second rotation matrix from the last frame.
Robotics, Vision and Control is a very good reference (with tons of examples) to get you started with coordinate systems.

Rachel Woody

Feb 2, 2021, 1:20:00 AM
to Objectron
Hi Adel,

Thank you for the recommendation! 

I'm afraid my mentor and I are still a bit lost on writing the code to find the degree of rotation between two specific frames. Mainly, I am not sure how to access the individual frames I want in the context of this tutorial. Is there a time you would be willing to meet with me briefly over a video call to see if we can get this up and running?

Either way, we are so grateful for all of your support and your time!

Best,

Rachel Woody

Adel Ahmadyan

Feb 3, 2021, 2:20:18 PM
to Objectron
So I created a minimal example based on the parse annotation tutorial. I'm using the sample chair-11/0.
Say I want to compute the relative rotation between frames 0 and 100:
[attachment: chair_rot.png, showing frames 0 and 100 of chair-11/0]

Eyeballing this, you can see the user rotated around the yaw axis by roughly 30°. In Objectron, the yaw axis is aligned with gravity; it is the Y vector in the rotation.


import numpy as np
from scipy.spatial.transform import Rotation as R

with open(annotation_file, 'rb') as pb:
    sequence = annotation_protocol.Sequence()
    sequence.ParseFromString(pb.read())
    # grab_frame is a helper from the tutorial; it is only needed here to
    # render the two frames shown in the image above.
    frames = grab_frame(video_filename, [0, 100])

    src_frame = sequence.frame_annotations[0]
    src_transformation = np.array(src_frame.camera.transform).reshape(4, 4)
    dst_frame = sequence.frame_annotations[100]
    dst_transformation = np.array(dst_frame.camera.transform).reshape(4, 4)

    relative_transformation = dst_transformation @ np.linalg.inv(src_transformation)
    relative_rotation = R.from_matrix(relative_transformation[:3, :3])
    # as_rotvec() returns the x, y, z components of the axis-angle vector in
    # radians; np.degrees converts them to degrees.
    print(np.degrees(relative_rotation.as_rotvec()))

This prints [ 4.68138936 -28.10604787 0.40092367]. So the Y value is about 28°, as expected.


Adel

Rachel Woody

Feb 28, 2021, 4:38:07 PM
to Objectron
Adel,

I hope you are doing well! This tutorial is incredible; I cannot thank you enough!

I have one clarification question. Is the printed array in z-y-x order, i.e. [roll (z), yaw (y), pitch (x)], or am I mistaken? Initially, I assumed the values were given in the usual x-y-z order: [pitch (x), yaw (y), roll (z)]. However, based on the angles I am getting for each video, it seems as though they might be ordered [z, y, x] instead.
For example, when I print the rotation vector between frames 0 and 245 (the last frame) of the batch1-0 cereal box video, I get [-4.82, 27.22, -22.99], which makes me think pitch is the third value in the array, based on this yaw diagram.

Thank you again for your continued and very generous support!

Best,

Rachel

Adel Ahmadyan

Mar 3, 2021, 3:02:35 PM
to Rachel Woody, Objectron
We use yaw-pitch-roll. We first rotate around yaw (Y in our internal system), then around pitch (Z in our system), and finally around roll (X in our system), i.e. YZX. If you want a different convention, you can convert the rotation matrix to whichever Euler angles you like.
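
For example, with SciPy (a sketch; relative_transformation is the 4x4 matrix from the earlier example, and whether you want intrinsic 'YZX' or extrinsic 'yzx' depends on the convention you are after):

from scipy.spatial.transform import Rotation as R

# Convert the 3x3 rotation block to yaw/pitch/roll angles in degrees.
rot = R.from_matrix(relative_transformation[:3, :3])
yaw, pitch, roll = rot.as_euler('YZX', degrees=True)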


