I'm not sure if it is okay to post this here, but we got some exciting results from our model trained on the AVA dataset and wanted to share them with the community. Since the atomic actions are highly transferable across datasets and video settings, models trained on AVA can be much more broadly useful than models trained on previous large-scale action datasets. We implemented a pipeline combining object detectors, trackers, and our action detection model, as explained in
Actor Conditioned Attention Maps for Video Action Detection. After a few optimizations to speed it up, we were able to achieve near real-time performance of around 16 FPS on a single 1080Ti. A demo running on a webcam is available
here, and we have also made the demo code and model weights available to the community on
GitHub.
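For anyone curious how such a detect-track-act pipeline fits together, here is a minimal sketch of a webcam loop. It is not our released demo code: `detect_people`, `update_tracks`, and `classify_actions` are hypothetical stand-ins for the object detector, tracker, and ACAM action model, and the clip length is an assumed value.

```python
# Minimal sketch of a detect -> track -> act webcam loop (not the released demo code).
# detect_people, update_tracks, and classify_actions are hypothetical stand-ins for the
# object detector, tracker, and ACAM action model described above.
from collections import deque
import cv2

CLIP_LEN = 32  # number of recent frames fed to the action model (assumed value)

def detect_people(frame):
    """Stand-in person detector: returns a list of (x, y, w, h) boxes."""
    return []

def update_tracks(boxes):
    """Stand-in tracker: assigns a persistent ID to each detected box."""
    return [(i, box) for i, box in enumerate(boxes)]

def classify_actions(clip, tracks):
    """Stand-in action model: returns an action label per tracked actor."""
    return {track_id: "unknown" for track_id, _ in tracks}

def main():
    cap = cv2.VideoCapture(0)        # open the default webcam
    clip = deque(maxlen=CLIP_LEN)    # rolling buffer of recent frames
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        clip.append(frame)
        tracks = update_tracks(detect_people(frame))
        if len(clip) == CLIP_LEN:    # only run the action model on full clips
            actions = classify_actions(list(clip), tracks)
            for track_id, (x, y, w, h) in tracks:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, actions[track_id], (x, y - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("demo", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```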
Thanks for releasing this dataset to the public.
Oytun Ulutan.