Hello, does anyone know how to use the newly released AVA ActiveSpeaker dataset?
I know what each item in the annotation file means, but I cannot figure out how to process the videos according to the annotations.
As an example from the annotation:
tghXjom3120,1621.23,0.405556,0.135417,0.648611,0.558333,NOT_SPEAKING,tghXjom3120_1620_1680:3
tghXjom3120,1621.28,0.405556,0.135417,0.65,0.5625,NOT_SPEAKING,tghXjom3120_1620_1680:3
tghXjom3120,1621.31,0.404167,0.135417,0.652778,0.56875,NOT_SPEAKING,tghXjom3120_1620_1680:3
tghXjom3120,1621.35,0.402778,0.135417,0.655556,0.577083,NOT_SPEAKING,tghXjom3120_1620_1680:3
where the second item (1621.23, ...) is the timestamp, but I don't know how to use this information when extracting frames.
If I extract clip frames at 20 fps, as the paper states, the interval should be 0.05 s; but in the annotation file the gap between consecutive timestamps can be 0.03 s, 0.04 s, or 0.05 s. So how should I preprocess the video according to the annotations? Can anyone give me some advice, or point me to dataset-parser code?
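To make the question concrete, here is the kind of mapping I have in mind (just a rough sketch of my own guess, not something from the paper: the 20 fps rate, the 1-based ffmpeg-style frame numbering, and the round-to-nearest-frame rule are all my assumptions):

```python
# Sketch: map one AVA ActiveSpeaker annotation row to an extracted-frame index
# and a pixel bounding box. Assumes frames were dumped with something like
#   ffmpeg -i video.mkv -vf fps=20 frames/%06d.jpg
# so frame 000001.jpg corresponds to t = 0 (my assumption, not from the paper).
from dataclasses import dataclass

FPS = 20  # assumed extraction rate; annotation timestamps are not on an exact grid

@dataclass
class Row:
    video_id: str
    timestamp: float
    x1: float; y1: float; x2: float; y2: float  # normalized [0, 1] box corners
    label: str
    track_id: str

def parse_row(line: str) -> Row:
    """Split one CSV annotation line into typed fields."""
    v, t, x1, y1, x2, y2, label, track = line.strip().split(",")
    return Row(v, float(t), float(x1), float(y1), float(x2), float(y2), label, track)

def frame_index(t: float, fps: int = FPS) -> int:
    """Nearest extracted frame for an annotation timestamp (1-based)."""
    return int(round(t * fps)) + 1

def to_pixels(r: Row, width: int, height: int):
    """Scale the normalized box to pixel coordinates for cropping the face."""
    return (int(r.x1 * width), int(r.y1 * height),
            int(r.x2 * width), int(r.y2 * height))

line = "tghXjom3120,1621.23,0.405556,0.135417,0.648611,0.558333,NOT_SPEAKING,tghXjom3120_1620_1680:3"
r = parse_row(line)
print(frame_index(r.timestamp))   # nearest 20 fps frame for t = 1621.23
print(to_pixels(r, 1280, 720))    # pixel box, assuming a 1280x720 video
```

Is snapping each annotation timestamp to the nearest extracted frame like this the intended preprocessing, or is there an official way to align the irregular timestamps with a fixed frame rate?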
Thanks a lot!