Speaker: Muchao Ye

Date: Feb 19, 11:45am–12:45pm

Abstract: Language models built on deep neural networks have achieved great success across many areas of artificial intelligence and play an increasingly vital role in consequential applications, including chatbots and smart healthcare. However, because deep neural networks are vulnerable to adversarial examples, concerns remain about deploying them in safety-critical tasks. In this talk, I will present a series of methods for evaluating and certifying the adversarial robustness of language models, a key step toward building robust language models and resolving this conundrum.

First, I will introduce a new idea for conducting text adversarial attacks to evaluate the adversarial robustness of language models in the most realistic hard-label setting: incorporating a pre-trained word embedding space as an optimization intermediate. Gradient-based optimization methods built on this idea successfully tackle the inefficiency of existing approaches, and a deeper dive into this viewpoint further shows that utilizing an estimated decision boundary improves the quality of the crafted adversarial examples. Second, I will discuss a unified certified robust training framework for enhancing the certified robustness of language models; it provides a stronger robustness guarantee by removing unnecessary modules and harnessing a novel decoupled regularization loss. Finally, I will conclude the talk with an outlook on improving the adversarial robustness of multi-modal foundation models, applying them to healthcare for communication disorders, and building a secure learning paradigm for AI agents.
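To make the embedding-space idea concrete, below is a minimal, hypothetical Python sketch of a hard-label attack that optimizes in a continuous word embedding space and projects back to discrete words. It is not the speaker's method: every name (embed, vocab_vecs, vocab_words, query_label) is an assumed placeholder, and since only hard labels are available, the descent direction is approximated by a crude random search rather than a true gradient.

    # Illustrative sketch only (not the speaker's algorithm). Assumed placeholders:
    #   embed: dict mapping a word to its pre-trained embedding vector
    #   vocab_vecs / vocab_words: embedding matrix and the corresponding word list
    #   query_label: black-box model returning only a predicted label (hard-label setting)
    import numpy as np

    def nearest_word(vec, vocab_vecs, vocab_words):
        # Project a continuous vector back to the closest word in the embedding space.
        return vocab_words[int(np.argmin(np.linalg.norm(vocab_vecs - vec, axis=1)))]

    def hard_label_attack(words, embed, vocab_vecs, vocab_words, query_label,
                          true_label, steps=200, sigma=0.1, lr=0.5):
        # Continuous intermediate: one embedding vector per word position.
        vecs = np.stack([embed[w] for w in words])
        for _ in range(steps):
            # Hard labels give no gradients, so probe a random direction and
            # keep it if the projected sentence crosses the decision boundary.
            noise = np.random.randn(*vecs.shape)
            probe = [nearest_word(v, vocab_vecs, vocab_words)
                     for v in vecs + sigma * noise]
            if query_label(probe) != true_label:
                vecs = vecs + lr * noise
            sentence = [nearest_word(v, vocab_vecs, vocab_words) for v in vecs]
            if query_label(sentence) != true_label:
                return sentence  # adversarial example found
        return None  # attack failed within the query budget

The point of the intermediate is that perturbations are searched in a smooth vector space rather than over discrete word substitutions, which is what makes gradient-style optimization applicable to text at all.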

Biographical Sketch: Muchao Ye is a Ph.D. candidate in the College of Information Sciences and Technology at the Pennsylvania State University, advised by Dr. Fenglong Ma. Before that, he obtained his Bachelor of Engineering degree in Information Engineering at South China University of Technology. His research interests lie at the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspective of adversarial robustness. His research has been published in top venues including NeurIPS, KDD, AAAI, ACL, and The Web Conference.

Location and Zoom link: 307 Love, or https://fsu.zoom.us/j/3195217545