SIGSEC talk: 30 Nov - Evolving Perceptions and Mitigation of Toxicity in Language Models

ACL SIGSEC

Nov 3, 2023, 10:03:59 PM
to ACL SIGSEC
Evolving Perceptions and Mitigation of Toxicity in Language Models
November 30, 2023, 10:00 ET / 16:00 CET (30 mins + questions)

Beyza Ermis

This two-part talk delves into the dynamic nature of toxicity perception and mitigation in automated systems. The first segment examines how evolving standards of what constitutes 'toxic' content, shaped by cultural and geographic diversity, affect the reproducibility of research findings on toxicity detection models. By re-evaluating widely recognized benchmark models from the HELM project with the latest version of a commercial toxicity detection API, we uncover shifts in model rankings that challenge prior comparative studies. These findings underscore the need for caution in direct comparisons and argue for a structured, time-aware framework for assessing toxicity detection models.
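
To make the ranking-shift idea concrete, here is a minimal sketch of that kind of re-scoring experiment. This is not the speaker's actual code: score_toxicity is a hypothetical stand-in for a commercial toxicity-scoring API (replaced by a stable toy score so the sketch runs end to end), and the model names and API version strings are illustrative.

import hashlib
from scipy.stats import kendalltau

def score_toxicity(text: str, api_version: str) -> float:
    # Hypothetical stand-in for a commercial toxicity-scoring API call;
    # returns a stable toy score in [0, 1) so the sketch is runnable.
    digest = hashlib.md5(f"{api_version}:{text}".encode()).digest()
    return digest[0] / 256.0

def rank_models(outputs_by_model: dict, api_version: str) -> list:
    # Rank models from least to most toxic by mean score over their outputs.
    mean_score = {
        model: sum(score_toxicity(t, api_version) for t in texts) / len(texts)
        for model, texts in outputs_by_model.items()
    }
    return sorted(mean_score, key=mean_score.get)

outputs = {
    "model_a": ["completion 1", "completion 2"],
    "model_b": ["completion 3", "completion 4"],
    "model_c": ["completion 5", "completion 6"],
}
old_rank = rank_models(outputs, api_version="2021-01")
new_rank = rank_models(outputs, api_version="2023-11")

# Kendall's tau of 1.0 means both API versions agree on the ordering;
# anything lower is exactly the kind of ranking shift at issue here.
tau, _ = kendalltau([old_rank.index(m) for m in outputs],
                    [new_rank.index(m) for m in outputs])
print(old_rank, new_rank, tau)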

The second segment introduces a novel, retrieval-based method for toxicity mitigation in text generation models. The method matches the mitigation effectiveness of state-of-the-art approaches while being markedly more efficient, and it is designed to adapt to the fluid nature of language, offering a more sustainable solution that accommodates the continuous evolution of language use in real-world settings.
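
As a rough illustration of what "retrieval-based" can mean in this setting (a sketch under assumptions, not the speaker's method): candidate continuations are penalized when they embed close to known-toxic examples in a datastore, and the datastore can be extended as norms shift, without retraining the generator. The toy character-level embedding, the ToxicityDatastore class, and the penalty weight alpha are all illustrative.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy bag-of-characters embedding; a real system would use a
    # trained sentence encoder.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class ToxicityDatastore:
    # Nearest-neighbour store of labelled examples. New examples can be
    # appended as language norms evolve, with no retraining of the generator.
    def __init__(self):
        self.vectors, self.toxic = [], []

    def add(self, text: str, is_toxic: bool):
        self.vectors.append(embed(text))
        self.toxic.append(is_toxic)

    def toxic_similarity(self, text: str, k: int = 5) -> float:
        # Mean cosine similarity to the k nearest known-toxic neighbours.
        q = embed(text)
        sims = sorted(
            (float(q @ v) for v, t in zip(self.vectors, self.toxic) if t),
            reverse=True,
        )[:k]
        return sum(sims) / len(sims) if sims else 0.0

def rescore(candidates: dict, store: ToxicityDatastore, alpha: float = 2.0) -> dict:
    # Down-weight candidate continuations in proportion to how close they
    # sit to the toxic side of the datastore; alpha trades fluency for safety.
    return {text: lp - alpha * store.toxic_similarity(text)
            for text, lp in candidates.items()}

store = ToxicityDatastore()
store.add("an offensive insult", is_toxic=True)
store.add("a neutral statement", is_toxic=False)
print(rescore({"polite reply": -1.2, "offensive insults": -1.0}, store))

Because mitigation lives in the datastore rather than in the model weights, updating coverage for newly flagged language is an append operation, which is one plausible reading of the efficiency and adaptability claims above.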



See you there!