𝑫𝒂𝒕𝒂 𝑩𝒆𝒇𝒐𝒓𝒆 𝑨𝑰, 𝒃𝒖𝒕 𝑫𝒂𝒕𝒂 𝒏𝒆𝒆𝒅𝒔 𝑨𝑰. Access to large amounts of well-balanced and well-labeled visual data is the most important component of any learning-based computer vision system. At the same time, gathering and labeling privacy-preserving data is often the most time-consuming, expensive, and error-prone step in developing these systems. While data is arguably the most critical component, working on it is rarely seen as glamorous: most researchers and engineers do not find data work rewarding in industry, and even less so in academia. In other words, no one wants to do data work; everyone wants to work on models!
To address data-related issues, we are looking for two postdocs (as well as visiting researchers) to work on Data-Focused Computer Vision techniques and their applications to AR/VR in the Human Sensing Laboratory (
www.humansensing.cs.cmu.edu). These positions will explore recent trends in industry with a focus on data-centric approaches. The goal of this project is to work on problems such as developing new learning algorithms for generating better training datasets, making algorithms robust, reliable, and safe even when learning from limited or biased datasets, understanding where models fail, and investigating algorithms for online adaptation in a self-supervised manner. This includes (but is not limited to) work on:
◾ Use of generative models for data augmentation. This includes generation of synthetic data using both computer graphics and image synthesis with generative models (e.g., VAEs, GANs), as well as techniques for domain transfer.
◾ Adversarial learning for data augmentation, robust training, and learning from imbalanced data.
◾ Novelty and drift detection to identify when more data needs to be labeled.
◾ Data selection techniques (e.g., active learning) and core-set selection for identifying the most valuable examples to label.
◾ Visualization tools for high-dimensional data, exploratory data analysis, and network visualization techniques.
◾ Explainable and Fair AI.
◾ Tools that quantify and accelerate the time to source and prepare high-quality data.
◾ Measuring and improving robustness and accuracy on out-of-distribution data.
◾ Continual learning.
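To make the data-selection topic above concrete, here is a minimal sketch of greedy k-center core-set selection, one common strategy for picking the most informative examples to label. This is an illustrative toy, not the lab's actual code; the function name and the use of raw feature-space Euclidean distance are assumptions for the example.

```python
import numpy as np

def greedy_k_center(features: np.ndarray, budget: int, seed: int = 0) -> list:
    """Greedy k-center core-set selection: repeatedly pick the point
    farthest (in feature space) from everything selected so far."""
    rng = np.random.default_rng(seed)
    n = len(features)
    selected = [int(rng.integers(n))]  # seed the pool with a random point
    # distance from every point to its nearest selected point
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        next_idx = int(np.argmax(dists))  # farthest point = least covered
        selected.append(next_idx)
        new_d = np.linalg.norm(features - features[next_idx], axis=1)
        dists = np.minimum(dists, new_d)  # refresh nearest-center distances
    return selected

# toy usage: choose 5 representative embeddings out of 100
X = np.random.default_rng(1).normal(size=(100, 16))
picked = greedy_k_center(X, budget=5)
```

In an active-learning loop, `features` would typically be embeddings from the current model, and the selected indices would be sent for annotation before retraining.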
We’ve already democratized sharing code, and with the next generation of ML/CV algorithms that focus on data, we’ll soon democratize training production models and accelerate the adoption of computer vision algorithms for AR/VR into our daily lives.