Daily TMLR digest for Jun 04, 2024


TMLR

Jun 4, 2024, 12:00:08 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: Semi-Supervised Semantic Segmentation via Marginal Contextual Information

Authors: Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

Abstract: We present a novel confidence refinement scheme that enhances pseudo-labels in semi-supervised semantic segmentation. Unlike existing methods, which filter pixels with low-confidence predictions in isolation, our approach leverages the spatial correlation of labels in segmentation maps by grouping neighboring pixels and considering their pseudo-labels collectively. With this contextual information, our method, named S4MC, increases the amount of unlabeled data used during training while maintaining the quality of the pseudo-labels, all with negligible computational overhead. Through extensive experiments on standard benchmarks, we demonstrate that S4MC outperforms existing state-of-the-art semi-supervised learning approaches, offering a promising solution for reducing the cost of acquiring dense annotations. For example, S4MC achieves a 1.39 mIoU improvement over the prior art on PASCAL VOC 12 with 366 annotated images. The code to reproduce our experiments is available at https://s4mcontext.github.io/
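
As a rough illustration of the idea, the sketch below refines a pixel's confidence by letting its predicted class borrow probability mass from a 3x3 neighborhood before thresholding. This is a minimal sketch in PyTorch; the neighborhood size, aggregation rule, and threshold are assumptions, not the paper's exact refinement scheme.

    # Illustrative only: neighborhood-refined confidence for pseudo-label
    # filtering, in the spirit of S4MC.
    import torch
    import torch.nn.functional as F

    def refine_and_filter(logits, threshold=0.95):
        """logits: (B, C, H, W) model predictions on unlabeled images."""
        probs = logits.softmax(dim=1)
        pseudo_label = probs.argmax(dim=1)                     # (B, H, W)
        # Max-pool each class probability map so a pixel can inherit the
        # confidence of an agreeing neighbor (the 3x3 window includes itself).
        neighbor = F.max_pool2d(probs, kernel_size=3, stride=1, padding=1)
        refined_conf = neighbor.gather(1, pseudo_label.unsqueeze(1)).squeeze(1)
        mask = refined_conf >= threshold                       # pixels kept for training
        return pseudo_label, mask

Because the pooled confidence can only match or exceed the per-pixel value, more pixels clear the threshold, which is how more unlabeled data enters training without lowering it.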

URL: https://openreview.net/forum?id=i5yKW1pmjW

---

Title: Multimodal Chain-of-Thought Reasoning in Language Models

Authors: Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

Abstract: Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.
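
The two-stage decomposition is easy to state in code. The sketch below is a minimal pipeline under an assumed generate(text=..., vision=...) interface for the two fine-tuned models; the interface and prompt format are hypothetical, so see the linked repository for the actual implementation.

    # Hypothetical interfaces; illustrates the two-stage framework only.
    def multimodal_cot(question, image_feats, rationale_model, answer_model):
        # Stage 1: rationale generation conditioned on text and vision.
        rationale = rationale_model.generate(text=question, vision=image_feats)
        # Stage 2: answer inference from the rationale-augmented input.
        fused = question + " Rationale: " + rationale
        answer = answer_model.generate(text=fused, vision=image_feats)
        return rationale, answer

Separating the stages means the answer model always conditions on a complete rationale rather than generating both in one pass.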

URL: https://openreview.net/forum?id=y1pPWFVfvR

---

Title: Contrastive Graph Autoencoder for Shape-based Polygon Retrieval from Large Geometry Datasets

Authors: Zexian Huang, Kourosh Khoshelham, Martin Tomko

Abstract: Retrieval of polygon geometries with similar shapes from maps is a challenging geographic information task. Existing approaches cannot process polygon geometries with complex shapes and (multiple) holes, and are sensitive to geometric transformations (e.g., rotations). We propose the Contrastive Graph Autoencoder (CGAE), a robust and effective graph representation autoencoder for extracting polygon geometries of similar shapes from real-world building maps based on template queries. By leveraging graph message-passing layers, graph feature augmentation, and contrastive learning, CGAE learns highly discriminative latent embeddings by reconstructing graph features w.r.t. the graph representations of input polygons, outperforming existing graph-based autoencoders (GAEs) in geometry retrieval of similar polygons. Experimentally, we demonstrate this capability with template query shapes on real-world datasets and show high robustness to geometric transformations in contrast to existing GAEs, indicating the strong generalizability and versatility of CGAE, including on complex real-world building footprints.
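
For concreteness, a training objective in this spirit combines a reconstruction term with an InfoNCE contrastive term between each polygon graph and an augmented view of it. This is a sketch; the temperature, weighting, and choice of augmentation are assumptions, not the paper's exact losses.

    # Sketch of a reconstruction + contrastive objective in the spirit of CGAE.
    import torch
    import torch.nn.functional as F

    def cgae_loss(z, z_aug, x, x_recon, temperature=0.5, alpha=1.0):
        """z, z_aug: (N, d) embeddings of polygons and their augmented views;
        x, x_recon: (N, f) input graph features and their reconstruction."""
        recon = F.mse_loss(x_recon, x)                       # autoencoder term
        z1, z2 = F.normalize(z, dim=1), F.normalize(z_aug, dim=1)
        logits = z1 @ z2.t() / temperature                   # cosine similarities
        targets = torch.arange(z.size(0), device=z.device)   # matched pairs on diagonal
        contrast = F.cross_entropy(logits, targets)          # InfoNCE over the batch
        return recon + alpha * contrast

Using a rotated copy of the polygon as the augmented view is one way such a loss would encourage the rotation robustness the abstract reports.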

URL: https://openreview.net/forum?id=9fcZNAmnyh

---

Title: Prototypical Self-Explainable Models Without Re-training

Authors: Srishti Gautam, Ahcene Boubekki, Marina MC Höhne, Michael Kampffmeyer

Abstract: Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter is preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs, instead, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to enhance transparency in deep learning-based decision-making via class-prototype-based explanations that are diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs. The code is available at https://github.com/SrishtiGautam/KMEx.
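
A minimal reading of the recipe: freeze the base model, run k-means on each class's embeddings to obtain prototypes, then predict (and explain) by nearest prototype. The sketch below assumes scikit-learn and a choice of K per class; both are assumptions, not the paper's settings.

    # Sketch of converting a frozen encoder into a prototypical classifier
    # via per-class k-means, in the spirit of KMEx.
    import numpy as np
    from sklearn.cluster import KMeans

    def fit_prototypes(embeddings, labels, k_per_class=5):
        """embeddings: (N, d) features from the frozen base model."""
        protos, proto_labels = [], []
        for c in np.unique(labels):
            km = KMeans(n_clusters=k_per_class, n_init=10).fit(embeddings[labels == c])
            protos.append(km.cluster_centers_)
            proto_labels.extend([c] * k_per_class)
        return np.vstack(protos), np.array(proto_labels)

    def predict(embeddings, protos, proto_labels):
        # The nearest prototype gives both the prediction and the explanation.
        d = ((embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        return proto_labels[d.argmin(axis=1)]

No gradient step touches the base model, which is the sense in which the conversion requires no retraining.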

URL: https://openreview.net/forum?id=HU5DOUp6Sa

---

Title: Conservative Prediction via Data-Driven Confidence Minimization

Authors: Caroline Choi, Fahim Tajwar, Yoonho Lee, Huaxiu Yao, Ananya Kumar, Chelsea Finn

Abstract: In safety-critical applications of machine learning, it is often desirable for a model to be conservative, abstaining from making predictions on "unknown" inputs which are not well-represented in the training data. However, detecting unknown examples is challenging, as it is impossible to anticipate all potential inputs at test time. To address this, prior work minimizes model confidence on an auxiliary outlier dataset carefully curated to be disjoint from the training distribution. We theoretically analyze the choice of auxiliary dataset for confidence minimization, revealing two actionable insights: (1) if the auxiliary set contains unknown examples similar to those seen at test time, confidence minimization leads to provable detection of unknown test examples, and (2) if the first condition is satisfied, it is unnecessary to filter out known examples for out-of-distribution (OOD) detection. Motivated by these guidelines, we propose the Data-Driven Confidence Minimization (DCM) framework, which minimizes confidence on an uncertainty dataset. We apply DCM to two problem settings in which conservative prediction is paramount -- selective classification and OOD detection -- and provide a realistic way to gather uncertainty data for each setting. In our experiments, DCM consistently outperforms existing selective classification approaches on 4 datasets when tested on unseen distributions and outperforms state-of-the-art OOD detection methods on 12 ID-OOD dataset pairs, reducing FPR (at 95% TPR) by 6.3% and 58.1% on CIFAR-10 and CIFAR-100 compared to Outlier Exposure.
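
The core objective is compact: standard cross-entropy on labeled data plus a term that minimizes confidence on the uncertainty dataset. The sketch below uses cross-entropy to the uniform distribution as the confidence penalty; that particular form and the weight lam are assumptions, not necessarily the paper's choices.

    # Sketch of a DCM-style training objective.
    import torch.nn.functional as F

    def dcm_loss(logits_labeled, targets, logits_uncertain, lam=0.5):
        ce = F.cross_entropy(logits_labeled, targets)
        # Push predictions on uncertainty data toward the uniform
        # distribution, i.e., minimize confidence there.
        log_probs = F.log_softmax(logits_uncertain, dim=1)
        conf_term = -log_probs.mean()   # cross-entropy to uniform, up to a constant
        return ce + lam * conf_term

The same loss serves both settings; what changes is where logits_uncertain comes from, which is the "realistic way to gather uncertainty data" the abstract refers to.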

URL: https://openreview.net/forum?id=QPuxjsjKCP

---


New submissions
===============


Title: Variational Inference on the Final-Layer Output of Neural Networks

Abstract: Traditional neural networks are simple to train but they typically produce overconfident predictions. In contrast, Bayesian neural networks provide good uncertainty quantification but optimizing them is time-consuming due to the large parameter space. This paper proposes to combine the advantages of both approaches by performing Variational Inference in the Final layer Output space (VIFO), because the output space is much smaller than the parameter space. We use neural networks to learn the mean and the variance of the probabilistic output. Using the Bayesian formulation, we incorporate collapsed variational inference with VIFO, which significantly improves performance in practice. At the same time, like standard non-Bayesian models, VIFO enjoys simple training, and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that VIFO provides a good tradeoff in terms of run time and uncertainty quantification, especially for out-of-distribution data.
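
In outline, the network emits a mean and a (log-)variance for the final-layer output, and prediction averages the softmax over reparameterized samples. A minimal sketch follows; the sample count and the log-variance parameterization are assumptions, not details from the paper.

    # Sketch of VIFO-style prediction from variational final-layer outputs.
    import torch

    def vifo_predict(mean, log_var, n_samples=10):
        """mean, log_var: (B, C) variational parameters from the network head."""
        std = (0.5 * log_var).exp()
        probs = 0
        for _ in range(n_samples):
            z = mean + std * torch.randn_like(std)   # reparameterization trick
            probs = probs + z.softmax(dim=1)
        return probs / n_samples                      # averaged predictive distribution

Sampling a C-dimensional output vector instead of millions of weights is where the claimed run-time advantage over weight-space Bayesian networks comes from.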

URL: https://openreview.net/forum?id=mTOzXLmLKr

---

Title: Probabilistic Guarantees for Abductive Inference

Abstract: Abductive reasoning is ubiquitous in artificial intelligence and everyday thinking. However, formal theories that provide probabilistic guarantees for abductive inference are lacking. We present a quantitative formalization of abductive logic that combines Bayesian probability with the interpretation of abduction as a search process within the Algorithmic Search Framework (ASF). By incorporating uncertainty in background knowledge, we establish two novel sets of probabilistic bounds on the success of abduction when (1) selecting the single most likely cause while assuming noiseless observations, and (2) selecting any cause above some probability threshold while accounting for noisy observations. To our knowledge, no existing abductive or general inference bounds account for noisy observations. Furthermore, while most existing abductive frameworks assume exact underlying prior and likelihood distributions, we assume only percentile-based confidence intervals for such values. These milder assumptions result in greater flexibility and applicability of our framework. We also explore additional information-theoretic results from the ASF and provide mathematical justifications for everyday abductive intuitions.
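
The Bayesian core of case (1), selecting the single most likely cause under noiseless observations, reduces to scoring candidate causes by prior times likelihood and normalizing. A toy example with made-up numbers (not from the paper):

    # Worked toy example of Bayesian abduction: pick the most probable cause.
    priors = {"flu": 0.10, "cold": 0.30, "allergy": 0.60}
    likelihood = {"flu": 0.90, "cold": 0.60, "allergy": 0.20}  # P(obs | cause)

    unnormalized = {c: priors[c] * likelihood[c] for c in priors}
    evidence = sum(unnormalized.values())
    posterior = {c: p / evidence for c, p in unnormalized.items()}
    best = max(posterior, key=posterior.get)
    print(best, posterior[best])   # cold 0.462, since 0.18 beats 0.09 and 0.12

The paper's contribution sits on top of this picture: it replaces the exact priors and likelihoods above with percentile-based confidence intervals and bounds how often the selected cause is correct.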

URL: https://openreview.net/forum?id=DtJen9ML0g

---
