Hi Baptiste,
Glad that you find this dataset useful!
You're correct that the videos in the dataset are not all at the same FPS. The best way to handle this is still somewhat of an open question, but for all the experiments we ran when we released the dataset and our initial benchmark models (detailed in this paper), we resampled all the videos to 20 FPS. However, we did provide labels at each video's original frame rate, so experiments with, and findings on, different ways of normalizing frame rate are welcome.
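For illustration only, here is a minimal sketch of one way to bring per-frame labels from a video's original frame rate onto a 20 FPS timeline. It is not the pipeline used for the benchmark models. It assumes the labels are a plain Python list with one entry per original frame, and it uses sample-and-hold (each 20 FPS frame takes the label of the source frame being displayed at that moment). The function name `resample_frame_labels` is hypothetical.

```python
import math


def resample_frame_labels(labels, src_fps, dst_fps=20.0):
    """Map per-frame labels recorded at src_fps onto a dst_fps timeline.

    Hypothetical helper: for each target frame i (timestamp i / dst_fps),
    pick the source frame shown at that moment, i.e. floor(t * src_fps).
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Number of target frames whose timestamps fall inside the clip.
    n_out = math.floor(len(labels) * dst_fps / src_fps)
    out = []
    for i in range(n_out):
        # Source frame covering timestamp i / dst_fps (guarded at the end).
        j = min(math.floor(i * src_fps / dst_fps), len(labels) - 1)
        out.append(labels[j])
    return out
```

For example, a 1-second clip at 30 FPS (30 labels) yields 20 labels, and a 10 FPS clip has each label repeated twice. Nearest-neighbour or majority-vote pooling over each 50 ms window are equally reasonable choices, depending on how the labels are meant to be read.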
You mentioned ensuring "sync between audio and video". This should be less of an issue once you've decided how to handle the difference in video FPS: whatever rates you resample the video and audio to, the timestamps should allow accurate syncing between the two.
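To make the timestamp point concrete, here is a rough sketch of mapping a resampled video frame to the range of audio samples it covers. It assumes both streams share a common clock, with an optional audio start offset in seconds. The 16 kHz sample rate, the `audio_offset_s` parameter, and the function name `frame_to_audio_samples` are hypothetical placeholders, not properties of the dataset.

```python
def frame_to_audio_samples(frame_idx, fps=20, sample_rate=16000, audio_offset_s=0.0):
    """Return (start, end) audio sample indices, end exclusive, for one video frame.

    Hypothetical helper: frame i spans [i / fps, (i + 1) / fps) seconds on the
    shared clock; subtracting the audio start offset and multiplying by the
    sample rate gives sample positions. round() absorbs float noise.
    """
    t_start = frame_idx / fps - audio_offset_s
    t_end = (frame_idx + 1) / fps - audio_offset_s
    # Clamp at 0 so frames that precede the audio start map to an empty span.
    start = max(0, round(t_start * sample_rate))
    end = max(0, round(t_end * sample_rate))
    return start, end
```

At 20 FPS and 16 kHz, each frame covers 800 samples (frame 3 maps to samples 2400 to 3199). Because the mapping goes through seconds rather than frame counts, the same function works whatever rates you choose for either stream.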
We look forward to seeing more of your work on this! And of course, please reach out with any questions!
Thanks,
Sourish