Dear all,
We will be holding the following three talks in hybrid format; everyone is encouraged to attend.
Sugimoto
------
Date & Time:
February 17 (Monday), 13:30-16:45
Venue:
National Institute of Informatics, 19F, Rooms 1902 & 1903, and
Zoom link:
https://us02web.zoom.us/j/89003851193?pwd=ZiSceBggwoA0OKtPNOqbqA8os79wNG.1
Meeting ID: 890 0385 1193
Passcode: 381228
-------------------- 1st talk -----------------------
Speaker: Zuzana Kukelova (Czech Technical University in Prague)
https://cmp.felk.cvut.cz/~kukelova/
Title: A Brief Introduction to Camera Geometry Estimation Solvers
Abstract:
We will briefly introduce the most common camera geometry estimation
problems, including relative and absolute pose problems for calibrated,
uncalibrated, and partially calibrated cameras. Starting with a short
historical overview, we will then discuss the current state-of-the-art for
these problems. This includes highlighting the challenges faced when
aiming for efficient and robust solutions for camera geometry estimation.
-------------------- 2nd talk -----------------------
Speaker: Torsten Sattler (Czech Technical University in Prague)
https://tsattler.github.io/
Title: 3D Reconstruction with Gaussian Splatting
Abstract:
Accurate 3D reconstruction is a core computer vision problem with
many applications, including autonomous robots such as self-driving cars,
cultural heritage documentation, and content creation for the
entertainment industry (movies, games, etc.). Traditionally, 3D
reconstructions have been based on 3D meshes and point clouds.
Recently, learning-based approaches, such as neural radiance fields
(NeRFs) and most recently 3D Gaussian Splatting (3DGS), have become
popular. These representations are learned from images with known
intrinsics and extrinsics and generate (close-to) photorealistic
representations of scenes and objects. Compared to NeRFs, which
can be slow to train and slow to render, 3DGS offers both faster
training and rendering times. This talk first briefly reviews the original
3DGS formulation before identifying shortcomings and explaining how
to resolve them. In particular, we will discuss (i) how to handle
artifacts in the reconstruction caused by a limited set of training
viewpoints, (ii) how to extend the original formulation to handle
images taken under different conditions (day, night, etc.), and (iii)
how to extract accurate 3D meshes from 3DGS representations by
defining a field on top of the 3D Gaussians used to represent the scene.
In addition, we will briefly mention ongoing efforts to ensure that
benchmark results are comparable and that comparisons are fair.
-------------------- 3rd talk -----------------------
Speaker: Ming-Hsuan Yang (University of California, Merced/Google
DeepMind)
https://faculty.ucmerced.edu/mhyang/
Title: Video Understanding and Generation with Multimodal Foundation Models
Abstract:
Recent advances in vision and language models have significantly improved
performance on visual understanding and generation tasks. In this talk, I will
present our latest research on designing effective tokenizers for
transformers and our efforts to adapt frozen large language models for
diverse vision tasks. These tasks include visual classification, video-text
retrieval, visual captioning, visual question answering, visual grounding,
video generation, stylization, outpainting, and video-to-audio conversion.
If time permits, I will also discuss our recent findings in dynamic 3D
vision.
----------------------------------------------------------------------------------