Hello TrojAI Community,
I came across some new papers on Trojan attacks and defenses this week that I thought I'd share. Take a look below!
One Sentence Summary: Presents a data-limited TrojanNet detector that uses the internal responses of hidden neurons to identify Trojan model behavior.
One Sentence Summary: Introduces a new form of Trojan attack, TROJAN-LM, which works on multiple state-of-the-art language models across NLP tasks such as toxic comment classification and text completion.