MaLGa Seminar: Gül Varol "Beyond Action Recognition: Detailed Video Modeling", December 3rd 15:00 room 706/Youtube

Matteo Santacesaria

Nov 29, 2021, 11:44:30 AM

Apologies for multiple posting.

We are pleased to announce the next MaLGa Seminar Series - Machine Learning and Vision.
This event is part of the ELLIS Genoa activities.

Speaker: Gül Varol

Affiliation: École des Ponts ParisTech

Date: Friday, December 3rd, 2021
Time: 15:00
Location: room 706, via Dodecaneso 35, Genova, IT

Live streaming will be available at https://www.youtube.com/channel/UCU8upIJcI-BFUdFeYUwjJfg/featured

Title: Beyond Action Recognition: Detailed Video Modeling

Abstract: In this talk, I will present some of our recent work on a variety of tasks in computer vision, focusing in particular on detailed video modeling. Action recognition has been a standard problem in the research community working on videos. However, there is more to learn from videos than a closed set of pre-defined semantic action categories. This talk will cover three different directions towards a more detailed understanding of dynamic visual content. (i) First, we will look at our end-to-end text-to-video retrieval approach that learns to map videos and textual descriptions into a joint space, and see the advantages of joint image and video training using transformers. (ii) Then, we will explore the more fine-grained problem of localising text in sign language videos, using weakly-aligned subtitles in sign language interpretation data, again in conjunction with transformers. (iii) Finally, we will go beyond semantics and look at 3D reconstruction from video data for recovering detailed hand-object interactions. Here we will discuss the limitations of learning-based methods due to lack of data, and opt for an optimization-based approach.

Bain et al. “Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval”, ICCV 2021.

Varol et al. “Read and Attend: Temporal Localisation in Sign Language Videos”, CVPR 2021.

Bull et al. “Aligning Subtitles in Sign Language Videos”, ICCV 2021.

Hasson et al. “Towards Unconstrained Joint Hand-Object Reconstruction from RGB Videos”, 3DV 2021.

Bio: Gül Varol is a research faculty member in the IMAGINE team of École des Ponts ParisTech. Previously, she was a postdoctoral researcher at the University of Oxford (VGG). She obtained her PhD from the WILLOW team of Inria Paris and École Normale Supérieure (ENS). Her thesis received the ELLIS PhD Award. During her PhD, she spent time at MPI, Adobe, and Google. Her research is focused on human understanding in videos, specifically action recognition, body shape and motion analysis, and sign languages.

Matteo Santacesaria
Assistant Professor
MaLGa - Machine Learning Genoa Center
Department of Mathematics
University of Genoa