Tables are a promising modality for representation learning, with too much application potential to ignore. Yet tables have long been overlooked despite their dominant presence in the data landscape, e.g., in data management and analysis pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSVs. Similarly, the top-3 most-used database management systems are all relational (RDBMS). Representation learning over tables, possibly combined with other modalities such as text or SQL, has shown impressive performance for tasks like semantic parsing, question answering, table understanding, and data preparation. More recently, the pre-training paradigm has been shown to be effective for tabular ML as well, and researchers have started exploring the impressive capabilities of LLMs for table encoding and data manipulation.
The Table Representation Learning workshop is the first in this emerging research area and concentrates on three main goals:
(1) Motivate tables as a primary modality for representation and generative learning and advance the area further.
(2) Showcase impactful applications of pretrained table models and discuss future opportunities.
(3) Foster discussion and collaboration across the ML, NLP, and DB communities.
Scope
We invite submissions on any of, or related to, the following topics on machine learning for tabular data:
- Representation Learning over tables, which may be structured or semi-structured and may extend to full databases. Example contributions include new model architectures, data encoding techniques, pre-training, fine-tuning, and prompting strategies, multi-task learning, etc.
- Generative Learning and LLMs for structured data and interfaces to structured data (e.g. queries, analysis).
- Multimodal learning where tables are jointly embedded with, for example, natural language, code (e.g. SQL), knowledge bases, visualizations/images.
- Downstream Applications of table representations for tasks like data preparation (e.g. data cleaning, validation, integration, cataloging, feature engineering), retrieval (e.g. search, fact-checking/QA, KG construction), analysis (e.g. summarization, visualization, and query recommendation), and (end-to-end) machine learning.
- Upstream Applications of table representation models for optimizing table parsers/extraction (from documents, spreadsheets, presentations), storage (e.g. compression, indexing), and query processing (e.g. query optimization).
- Production challenges of table representation models. Work addressing the challenges of maintaining and managing TRL models in fast-evolving contexts, e.g. data updating, error correction, monitoring.
- Domain-specific challenges for learned table models, which often arise in domains such as enterprise, finance, medicine, and law. These challenges pertain to table content, table structure, privacy, security limitations, and other factors that necessitate tailored solutions.
- Benchmarks and analyses of table representation models, including the utility of language models as base models versus alternatives and robustness regarding large, messy, heterogeneous, or complex tables.
- Others: Formalization, surveys, datasets, visions, and reflections to structure and guide future research.
Important dates
- Submission Deadline: October 2, 2023 (15:00 GMT)
- Notifications: October 27, 2023
- Workshop Date: December 15, 2023
Submission guidelines
The workshop will accept regular research papers and industrial papers. Submissions must be anonymized, follow the NeurIPS proceedings format, and fall into one of the following categories:
- Short paper: 4 pages + references.
- Regular paper: 8 pages + references.
Organizers
Madelon Hulsebos, University of Amsterdam
Haoyu Dong, Microsoft Research
Bojan Karlaš, Harvard
Laurel Orr, Numbers Station AI
Pengcheng Yin, Google DeepMind
Gael Varoquaux, Inria Saclay
Qian Liu, Sea AI Lab
---
We look forward to receiving your submissions and seeing you in New Orleans!
https://table-representation-learning.github.io/