Daily TMLR digest for Jun 23, 2023

0 views

Skip to first unread message

TMLR

unread,

Jun 22, 2023, 8:00:09 PM6/22/23

to tmlr-anno...@googlegroups.com

Accepted papers
===============

Title: Training with Mixed-Precision Floating-Point Assignments

Authors: Wonyeol Lee, Rahul Sharma, Alex Aiken

Abstract: When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment—a mapping from all tensors (arising in training) to precision levels (high or low)—that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy tradeoff by generating precision assignments for convolutional neural networks that (i) use less memory and (ii) lead to more accurate convolutional networks at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. We evaluate our technique on image classiﬁcation tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet. Our method typically provides > 2× memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.

URL: https://openreview.net/forum?id=ZoXi7n54OB

---

New submissions
===============

Title: Multiscale Causal Structure Learning

Abstract: Causal structure learning methods are vital for unveiling causal relationships embedded into observed data. However, the state of the art suffers a major limitation: it assumes that causal interactions occur only at the frequency at which data is observed. To address this limitation, this paper proposes a method that allows structural learning of linear causal relationships occurring at different time scales. Specifically, we explicitly take into account instantaneous and lagged inter-relations between multiple time series, represented at different scales, hinging on wavelet transform. We cast the problem as the learning of a multiscale causal graph having sparse structure and dagness constraints, enforcing causality through directed and acyclic topology. To solve the resulting (non-convex) formulation, we propose an algorithm termed MS-CASTLE, which exhibits consistent performance across different noise distributions and wavelet choices. We also propose a single-scale version of our algorithm, SS-CASTLE, which outperforms existing methods in computational efficiency, performance, and robustness on synthetic data. Finally, we apply the proposed approach to learn the multiscale causal structure of the risk of 15 global equity markets, during covid-19 pandemic, illustrating the importance of multiscale analysis to reveal useful interactions at different time resolutions. Financial investors can leverage our approach to manage risk within equity portfolios from a causal perspective, tailored to their investment horizon.

URL: https://openreview.net/forum?id=Ub6XILEF9x

---

Title: Training Vision-Language Transformers from Captions

Abstract: Vision-Language Transformers can be learned without low-level human labels (e.g. class labels, bounding boxes, etc). Existing work, whether explicitly utilizing bounding boxes (Chen et al., 2020b; Tan & Bansal, 2019; Lu et al., 2019) or patches (Kim et al., 2021), assumes that the visual backbone must first be trained on ImageNet (Russakovsky et al., 2015) class prediction before being integrated into a multimodal linguistic pipeline. We show that this is not necessary and introduce a new model Vision-Language from Captions (VLC) built on top of Masked Auto-Encoders (He et al., 2022) that does not require this supervision. In fact, in a head-to-head comparison between ViLT, a strong patch-based vision-language transformer which is pretrained with supervised object classification, and our model, VLC, we find that our approach 1. outperforms ViLT on standard benchmarks, 2. provides more interpretable and intuitive patch visualizations, and 3. is competitive with many larger models that utilize ROIs trained on annotated bounding-boxes.

URL: https://openreview.net/forum?id=xLnbSpozWS

---

Reply all

Reply to author

Forward

0 new messages