Workshop website: https://hilworkshops.github.io/hil-dc2022/
Although data quality is a long-standing and enduring problem, it has recently received renewed attention due to the rapid proliferation of data analytics, machine learning, and decision-support applications built on the wide-scale availability and accessibility of (big) data. The success of such applications relies heavily not only on the quantity but also on the quality of data. Data curation, which may include ingestion, annotation, cleaning, and integration, is a critical step in providing adequate assurances on the quality of analytics and machine learning results. Such data preparation activities are recognised as time- and resource-intensive for data scientists, because data often comes with a number of challenges that must be tackled before it can be used in practice. Data re-purposing, and the resulting gap between the intentions behind the design of the data and its eventual use, is a fundamental issue behind many of these challenges. These challenges include a variety of data issues such as noise and outliers, incompleteness, lack of representativeness or biases, and heterogeneity of format or semantics. Mishandling them can lead to negative and sometimes damaging effects, especially in critical domains such as healthcare, transport, and finance. A distinctive feature of data quality in these contexts is the increasingly important role played by humans, who are often both the source of the data and active players in its curation. This workshop will provide an opportunity to explore the interdisciplinary overlap between manual, automated, and hybrid human-machine methods of data curation.
This full-day workshop on 21 Oct 2022 will consist of the following three parts:
Part 1 features plenary sessions, including keynotes, invited talks, and a panel.
Part 2 features presentations of peer-reviewed papers by selected speakers attending in person.
Part 3 features lightning talks (research, tool, or demo presentations) by online presenters, based on extended abstracts that are not formally peer-reviewed.
In this call, we invite submissions of extended abstracts for Part 3 (lightning talks) on the following topics:
Quality control for crowdsourced data curation
Data worker incentivization and engagement, including techniques from citizen science and collective intelligence
Expertise finding and engagement for data curation
Supporting crowd workers and experts in data task completion
Supporting data curation task design for data requesters
Collaborative data work among humans and between humans and AI
Human studies into the transparency, reliability, and biases in manual and hybrid data curation
Interaction techniques for manual, collaborative, and hybrid human-machine data curation, e.g., conversational interfaces
Database and machine learning techniques for supporting large-scale and hybrid data curation
Human intervention in data cascades and machine learning lifecycle management
Benchmarks in machine learning, AI, and related areas
Privacy and security issues of data quality, e.g., data poisoning attacks
Extended abstracts can describe work-in-progress research, previously published work, tools and demos, or a vision.
Extended abstracts must be in English, in PDF format, and at most two pages long in the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).
Submissions do not need to be anonymous and should be sent by e-mail to dema...@acm.org by 7 Oct 2022 (23:59 AoE). Authors will be notified by 14 Oct 2022 whether their submission has been invited for a lightning talk at the workshop. The workshop will take place on 21 Oct 2022. Accepted submissions will not be included in the workshop proceedings.
At least one author of each accepted Part 3 abstract must register for the workshop and present the work, either in person or online.