Daily TMLR digest for Jul 04, 2024


TMLR

Jul 4, 2024, 12:00:07 AM
to tmlr-anno...@googlegroups.com

Accepted papers
===============


Title: PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations

Authors: Ruosen Li, Teerth Patel, Xinya Du

Abstract: Nowadays, the quality of responses generated by different modern large language models (LLMs) is hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs for reference-free evaluation of open-ended question answering. More specifically, they use the recognized “strongest” LLM as the evaluator, which conducts pairwise comparisons of candidate models’ answers and provides a ranking score. However, this intuitive method has multiple problems, such as bringing in self-enhancement (favoring its own answers) and positional bias. We draw insights and lessons from the educational domain (Cho & MacArthur, 2011; Walsh, 2014) to improve LLM-based evaluations. Specifically, we propose (1) the peer rank (PR) algorithm that takes into account each peer LLM’s pairwise preferences of all answer pairs, and outputs a final ranking of models; and (2) peer discussion (PD), where we prompt two LLMs to discuss and try to reach a mutual agreement on the preferences of two answers. We conduct experiments on two benchmark datasets. We find that our approaches achieve higher accuracy and align better with human judgments. Interestingly, PR can induce a relatively accurate self-ranking of models under the anonymous setting, where each model’s name is unrevealed. Our work provides space to explore evaluating models that are hard to compare for humans.
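The core of the peer rank (PR) idea, as described above, is to aggregate every peer LLM's pairwise preferences into a final ranking while weighting each reviewer by its own standing. The sketch below is an illustrative reconstruction of that weighted-win-rate loop under simplifying assumptions (uniform initial weights, simple iterative reweighting); function and variable names are invented, not the authors' implementation.

```python
from collections import defaultdict

def peer_rank(judgments, rounds=5):
    """Aggregate pairwise preferences from peer reviewers into a ranking.

    judgments: list of (reviewer, winner, loser) tuples, where each
    reviewer model compared two contestant models' answers.
    Reviewer weights start uniform and are iteratively updated from each
    reviewer's own score, so stronger models get a larger say.
    """
    models = {m for _, w, l in judgments for m in (w, l)}
    weight = {m: 1.0 for m in models}
    score = {m: 0.0 for m in models}
    for _ in range(rounds):
        wins = defaultdict(float)
        totals = defaultdict(float)
        for reviewer, winner, loser in judgments:
            w = weight.get(reviewer, 1.0)  # weight vote by reviewer standing
            wins[winner] += w
            totals[winner] += w
            totals[loser] += w
        # Each model's score is its weighted win rate over all comparisons.
        score = {m: wins[m] / totals[m] if totals[m] else 0.0 for m in models}
        total = sum(score.values()) or 1.0
        weight = {m: score[m] / total for m in models}
    return sorted(models, key=lambda m: -score[m])
```

With judgments where model A beats everyone and B beats C, the ranking converges to A > B > C regardless of which reviewer cast each vote.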

URL: https://openreview.net/forum?id=YVD1QqWRaj

---


New submissions
===============


Title: Equivariant Symmetry Breaking Sets

Abstract: Equivariant neural networks (ENNs) have been shown to be extremely effective in applications involving underlying symmetries. By construction ENNs cannot produce lower symmetry outputs given a higher symmetry input. However, symmetry breaking occurs in many physical systems and we may obtain a less symmetric stable state from an initial highly symmetric one. Hence, it is imperative that we understand how to systematically break symmetry in ENNs. In this work, we propose a novel symmetry breaking framework that is fully equivariant and is the first which fully addresses spontaneous symmetry breaking. We emphasize that our approach is general and applicable to equivariance under any group. To achieve this, we introduce the idea of symmetry breaking sets (SBS). Rather than redesign existing networks, we design sets of symmetry breaking objects which we feed into our network based on the symmetry of our inputs and outputs. We show there is a natural way to define equivariance on these sets, which gives an additional constraint. Minimizing the size of these sets equates to data efficiency. We prove that minimizing these sets translates to a well studied group theory problem, and tabulate solutions to this problem for the point groups. Finally, we provide some examples of symmetry breaking to demonstrate how our approach works in practice.
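As a toy illustration of the symmetry-breaking-set idea (not the paper's construction), consider the group Z2 acting on scalars by sign flip: a single equivariant map cannot send the symmetric input x = 0 to an asymmetric output, but feeding every element of a symmetry-breaking set yields an output *set* that is equivariant.

```python
def net(x, b):
    # A toy "network" on scalars. It is equivariant under the sign-flip
    # group Z2 (g.x = -x) provided the symmetry-breaking input b is
    # transformed together with x.
    return x + b

def orbit_outputs(x, sbs=(1.0, -1.0)):
    # Feed every element of the symmetry-breaking set {+1, -1}; the
    # set of outputs, rather than any single output, is equivariant.
    return {net(x, b) for b in sbs}
```

For the symmetric input x = 0 each individual output in {+1, -1} breaks the symmetry, while the output set as a whole still transforms equivariantly: orbit_outputs(-x) equals the negation of orbit_outputs(x).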

URL: https://openreview.net/forum?id=tHKH4DNSR5

---

Title: Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning

Abstract: Deep reinforcement learning has demonstrated remarkable achievements across diverse domains such as video games, robotic control, autonomous driving, and drug discovery. Common methodologies in partially observable domains largely lean on end-to-end learning from high-dimensional observations, such as images, without explicitly reasoning about true state. We suggest an alternative direction, introducing the Partially Supervised Reinforcement Learning (PSRL) framework. At the heart of PSRL is the fusion of both supervised and unsupervised learning. The approach leverages a state estimator to distill supervised semantic state information from high-dimensional observations which are often fully observable at training time. This yields more interpretable policies that compose state predictions with control. In parallel, it captures an unsupervised latent representation. These two—the semantic state and the latent state—are then fused and utilized as inputs to a policy network. This juxtaposition offers practitioners a flexible and dynamic spectrum: from emphasizing supervised state information to integrating richer, latent insights. Extensive experimental results indicate that by merging these dual representations, PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods in terms of reward and convergence speed.
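The fusion described above, a supervised state estimator alongside an unsupervised latent encoder whose outputs are concatenated into the policy input, can be sketched as follows. All functions here are simplified stand-ins (hand-coded features and a fixed random projection), not the paper's learned architecture.

```python
import random

def state_estimator(obs):
    # Supervised head: predicts a low-dimensional semantic state from a
    # high-dimensional observation. Here: mean and max as stand-in features.
    return [sum(obs) / len(obs), max(obs)]

def latent_encoder(obs, dim=4):
    # Unsupervised head: a fixed random projection standing in for a
    # learned latent representation.
    rng = random.Random(0)
    proj = [[rng.uniform(-1, 1) for _ in obs] for _ in range(dim)]
    return [sum(p * o for p, o in zip(row, obs)) for row in proj]

def policy_input(obs):
    # PSRL-style fusion: concatenate the semantic and latent states
    # before feeding the policy network.
    return state_estimator(obs) + latent_encoder(obs)
```

The relative sizes of the two parts set the spectrum the abstract mentions: a larger semantic head emphasizes interpretable supervised state, a larger latent head emphasizes richer unsupervised structure.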

URL: https://openreview.net/forum?id=70s7gEIPmO

---

Title: Flatness-guided hyper-parameter optimization

Abstract: We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the Hessian norm and established sharpness metrics. Based on this, we seek to find hyper-parameter configurations that improve flatness by minimizing the upper bound on the sharpness of the loss. By using the structure of the underlying neural network, we derive semi-empirical estimates for the sharpness of the loss, and attempt to find hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
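A minimal sketch of the randomized search loop, with a toy quadratic surrogate standing in for the paper's semi-empirical Hessian-norm bound; the surrogate, search ranges, and names are all assumptions for illustration.

```python
import random

def sharpness_estimate(config):
    # Stand-in for a cheap upper bound on the sharpness of the loss as a
    # function of the hyper-parameters. Here a toy quadratic surrogate
    # minimized at lr = 0.01, weight_decay = 1e-4.
    lr, wd = config["lr"], config["weight_decay"]
    return (lr - 0.01) ** 2 + (wd - 1e-4) ** 2

def random_search(n_trials=200, seed=0):
    # Sample configurations at random and keep the one whose estimated
    # sharpness (i.e. the flatness bound) is smallest.
    rng = random.Random(seed)
    best, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = {"lr": rng.uniform(1e-4, 0.1),
               "weight_decay": rng.uniform(0.0, 1e-2)}
        val = sharpness_estimate(cfg)
        if val < best_val:
            best, best_val = cfg, val
    return best, best_val
```

Because only the surrogate is evaluated per trial, each configuration is scored without training the network, which is where the claimed runtime savings come from.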

URL: https://openreview.net/forum?id=IhGliADVth

---

Title: Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation

Abstract: Latent diffusion models excel at producing high-quality images from text. Yet, concerns have been raised about a lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, spanning into richer realms.

Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We generate multiple vectors in the latent space until we find a set of vectors that meets the desired distance requirements and the required batch size.

To evaluate the effectiveness of our diversity methods, we conduct experiments examining various characteristics, including color diversity, LPIPS metric, and ethnicity/gender representation in images featuring humans. We also provide image quality assessment by human raters.

The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. Through the enhancement of image diversity without decrease in quality, our approach contributes to the creation of more inclusive and representative AI-generated art.
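The latent-space search described above, drawing latent vectors until the batch satisfies a pairwise-distance requirement, can be sketched as simple rejection sampling. Names, dimensions, and thresholds below are illustrative assumptions, not values from the paper.

```python
import math
import random

def sample_diverse_latents(batch_size, dim, min_dist, seed=0, max_tries=10000):
    """Rejection-sample latent vectors whose pairwise Euclidean distances
    all exceed min_dist, mimicking the search for mutually distant
    latents before decoding them with the text-to-image model."""
    rng = random.Random(seed)
    latents = []
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Keep the candidate only if it is far from every accepted latent.
        if all(math.dist(z, other) >= min_dist for other in latents):
            latents.append(z)
        if len(latents) == batch_size:
            return latents
    raise RuntimeError("could not satisfy the distance requirement")
```

In high-dimensional Gaussian latent spaces random pairs are already far apart, so the rejection rate stays low until min_dist approaches the typical inter-sample distance.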

URL: https://openreview.net/forum?id=UHaSaor8VU

---

Title: Probabilistic Matching of Real and Generated Data Statistics in Generative Adversarial Networks

Abstract: Generative adversarial networks constitute a powerful approach to generative modeling. While generated samples often are indistinguishable from real data, there is no guarantee that they will follow the true data distribution. For scientific applications in particular, it is essential that the true distribution is well captured by the generated distribution. In this work, we propose a method to ensure that the distributions of certain generated data statistics coincide with the respective distributions of the real data. In order to achieve this, we add a new loss term to the generator loss function, which quantifies the difference between these distributions via suitable $f$-divergences. Kernel density estimation is employed to obtain representations of the true distributions, and to estimate the corresponding generated distributions from minibatch values at each iteration. When compared to other methods, our approach has the advantage that the complete shapes of the distributions are taken into account. We evaluate the method on a synthetic dataset and a real-world dataset and demonstrate improved performance of our approach.
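One way to picture the added loss term: estimate the density of a chosen data statistic on real and generated samples with a kernel density estimate, then penalize the divergence between the two densities on a grid. The sketch below uses KL as the f-divergence and 1-D statistics; it is a simplified illustration, not the paper's loss.

```python
import math

def gaussian_kde(samples, bandwidth=0.3):
    # Kernel density estimate: a mixture of Gaussians centered at the samples.
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                   for s in samples) / (len(samples) * bandwidth
                                        * math.sqrt(2 * math.pi))
    return density

def kl_penalty(real_samples, fake_samples, grid):
    # Approximate KL(p_real || p_fake) by numerical integration on a grid.
    # In the paper's setting a term like this is added to the generator
    # loss; KL stands in here for a generic f-divergence.
    p = gaussian_kde(real_samples)
    q = gaussian_kde(fake_samples)
    eps = 1e-12
    dx = grid[1] - grid[0]
    return sum(p(x) * math.log((p(x) + eps) / (q(x) + eps)) for x in grid) * dx
```

Because the full estimated densities enter the penalty, mismatches anywhere in the distribution's shape are penalized, not just differences in low-order moments.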

URL: https://openreview.net/forum?id=o1oetBJuSv

---

Title: LLM-Select: Feature Selection with Large Language Models

Abstract: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM to output a numerical importance score for a feature (e.g., ``blood pressure'') in predicting an outcome of interest (e.g., ``heart failure''), with no additional context. In particular, we find that the latest models, such as GPT-4, can consistently identify the most predictive features regardless of the query mechanism and across various prompting strategies. We illustrate these findings through extensive experiments on real-world data, where we show that LLM-based feature selection consistently achieves strong performance competitive with data-driven methods such as the LASSO, despite never having looked at the downstream training data. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect \textit{in the first place}. This could potentially benefit practitioners in domains like healthcare, where collecting high-quality data comes at a high cost.
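The zero-shot query mechanism can be sketched as a prompt template plus a score parser; the wording below is illustrative rather than the paper's template, and the actual LLM call is omitted.

```python
import re

def importance_prompt(feature, outcome):
    # Zero-shot prompt in the spirit of LLM-Select: ask for a single
    # numeric importance score with no additional context.
    return (f'On a scale from 0 to 1, how important is the feature '
            f'"{feature}" for predicting "{outcome}"? '
            f'Answer with a single number only.')

def parse_score(reply):
    # Extract the first number in the model's reply, clipped to [0, 1].
    m = re.search(r"\d*\.?\d+", reply)
    if m is None:
        raise ValueError("no numeric score found")
    return min(max(float(m.group()), 0.0), 1.0)
```

Features would then be ranked by their parsed scores and the top-k retained, in place of a data-driven criterion such as LASSO coefficients.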

URL: https://openreview.net/forum?id=16f7ea1N3p

---

Title: Bandits with Mean Bounds

Abstract: We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.
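A simplified picture of how mean bounds can curb exploration: clip each arm's optimistic index into its known interval, so arms with tight bounds receive less exploration. This clipping rule is a toy stand-in, not the paper's R-OFUL or GLUE algorithm.

```python
import math

def ucb_with_bounds(means_est, counts, t, lower, upper):
    """UCB indices that exploit side-information bounds on each arm's mean.

    means_est: empirical mean rewards; counts: pull counts; t: round.
    lower/upper: the provided bounds on each arm's true mean. The
    optimistic index mu + bonus is clipped into [lower, upper].
    """
    indices = []
    for mu, n, lo, hi in zip(means_est, counts, lower, upper):
        bonus = math.sqrt(2 * math.log(max(t, 2)) / n) if n > 0 else float("inf")
        indices.append(min(max(mu + bonus, lo), hi))
    return indices
```

An arm whose upper bound sits below another arm's index can then be skipped entirely, which is the intuition behind reduced exploration of suboptimal arms.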

URL: https://openreview.net/forum?id=4TZ4DE24fX

---

Title: λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Abstract: Despite the recent advances in personalized text-to-image (P-T2I) generative models, it remains challenging to perform finetuning-free multi-subject-driven T2I in a resource-efficient manner. Predominantly, contemporary approaches, involving the training of hypernetworks and Multimodal Large Language Models (MLLMs), require heavy computing resources that range from 600 to 12300 GPU hours of training. These subject-driven T2I methods hinge on Latent Diffusion Models (LDMs), which facilitate T2I mapping through cross-attention layers. While LDMs offer distinct advantages, P-T2I methods' reliance on the latent space of these diffusion models significantly escalates resource demands, leading to inconsistent results and necessitating numerous iterations for a single desired image.

Through empirical evidence, we find that the CLIP (vision) latent space is already expressive enough to preserve fine-grained details. Building upon this insight, in this paper we present λ-ECLIPSE, an alternative prior-training strategy that works in the latent space of a pre-trained CLIP model without relying on the diffusion UNet models. λ-ECLIPSE leverages image-text interleaved pre-training for fast and effective multi-subject-driven P-T2I. Through extensive experiments, we establish that λ-ECLIPSE surpasses existing baselines in composition alignment while preserving concept alignment performance, even with significantly lower resource utilization. λ-ECLIPSE performs multi-subject-driven P-T2I with just 34M parameters and is trained in a mere 74 GPU hours. Additionally, λ-ECLIPSE demonstrates the unique ability to perform multi-concept interpolations.

URL: https://openreview.net/forum?id=7q5UewlAdM

---

Title: Out-of-Distribution Detection with Domain-Invariant Representations in Multi-domain Latent Space

Abstract: Domain generalization focuses on leveraging the knowledge from the training data of multiple related domains to enhance inference on unseen in-distribution (IN) and out-of-distribution (OOD) domains. In our study, we introduce a multi-task representation learning technique that leverages the information of multiple related domains to improve the detection of classes from unseen domains. Our method aims to cultivate a latent space from data spanning multiple domains, encompassing both source and cross-domains, to amplify generalization to OOD domains. Additionally, we attempt to disentangle the latent space by minimizing the mutual information between the input and the latent space, effectively de-correlating spurious correlations among the samples of a specific domain. Collectively, the joint optimization will facilitate domain-invariant feature learning. We assess the model’s efficacy across multiple cybersecurity datasets, using standard classification metrics on both unseen IN and OOD sets, and validate the results with contemporary domain generalization methods.

URL: https://openreview.net/forum?id=Mb9heaooFI

---

Title: AGG: Amortized Generative 3D Gaussians for Single Image to 3D

Abstract: Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to their superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image-to-3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing sampling-based 3D Gaussian frameworks and inference-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster.

URL: https://openreview.net/forum?id=BOq3n5ewSP

---
