Tutorial: NAACL-24: Combating Security and Privacy Issues in the Era of Large Language Models

Instructors

Muhao Chen, Chaowei Xiao, Huan Sun, Lei Li, Leon Derczynski, Anima Anandkumar and Fei Wang.

Date and Time

June 16, 2024.

Goal of Tutorial:

This tutorial seeks to provide a systematic summary of risks and vulnerabilities in security, privacy and copyright aspects of large language models (LLMs), and most recent solutions to address those issues. We will discuss a broad thread of studies that try to answer the following questions: (i) How do we unravel the adversarial threats that attackers may leverage in the training time of LLMs, especially those that may exist in recent paradigms of instruction tuning and RLHF processes? (ii) How do we guard the LLMs against malicious attacks in inference time, such as attacks based on backdoors and jailbreaking? (iii) How do we ensure privacy protection of user information and LLM decisions for Language Model as-a-Service (LMaaS)? (iv) How do we protect the copyright of an LLM? (v) How do we detect and prevent cases where personal or confidential information is leaked during LLM training? (vi) How should we make policies to control against improper usage of LLM-generated content? In addition, will conclude the discussions by outlining emergent challenges in security, privacy and reliability of LLMs that deserve timely investigation by the community.

Introduction

Large Language Models have received wide attention from the society. These models have not only shown promising results across NLP tasks, but also emerged to be the backbone of many intelligent systems for web search, education, healthcare, e-commerce and software development. From the societal impact perspective, LLMs like GPT-4 and Chat-GPT have shown significant potential in supporting decision making in many daily-life tasks.

Despite the success, the increasingly scaled sizes of LLMs, as well as their growing deployments in systems, services and scientific studies, are bringing along more and more emergent issues in security and privacy. On the one hand, since LLMs are more potent of memorizing vast amount of information, they can definitely memorize well any kind of training data that may lead to adverse behaviors, leading to backdoors that may be leveraged by adversaries to control or hack any high-stake systems that are built on top of the LLMs. In this context, LLMs may also memorize personal and confidential information that exist in corpora and the RLHF process, therefore being prone to various privacy risks including membership inference, training data extraction, and jailbreaking attacks. On the other hand, the wide usage and adaption of LLMs also challenge the copyright protection of models and their outputs. For example, while some models restrict commercial uses or restrict derivatives of license, it is hard to ensure that downstream developers finetuning these models will comply with the licenses. It is also hard to identify improper usage of LLM generated outputs especially in scenarios like peer review and lawsuits where model generated content should be strictly controlled. Moreover, while a number of LLMs are deployed as services, privacy protection of information in both user inputs and model decisions represents another challenge, particularly for healthcare and fintech services.

This tutorial presents a comprehensive introduction of frontier research on emergent security and privacy issues in the era of LLMs. In particular, we try to answer the following questions: (i) How do we unravel the adversarial threats in the training time of LLMs, especially those that may exist in recent paradigms of instruction tuning and RLHF processes? (ii) How do we guard the LLMs against malicious attacks in inference time, such as attacks based on backdoors and jailbreaking? (iii) How do we addressing the privacy risks of LLMs, such as ensuring privacy protection of user information and LLM decisions? (iv) How do we protect the copyright of an LLM? (v) How do we detect and prevent cases where personal or confidential information is memorized during LLM training and leaked during inference? (vi) How should we control against improper usage of LLM-generated content?

By addressing these critical questions, we believe it is necessary to present a timely tutorial to comprehensively summarize the new frontiers in security and privacy research in NLP, and point out the emerging challenges that deserve further attention of our community. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources to obtain ready-to-use technologies, and how related technologies will realize more responsible usage of LLMs in end-user systems.

Tutorial Outline

Introduction [20 min]
handout

We will begin motivating this topic with a selection of real-world LLM applications that are prone to various kinds of security, privacy and vulnerability issues, and outline the emergent technical challenges we seek to discuss in this tutorial.

Addressing Training-time Threats to LLMs [35 min]
handout

One significant area of security concern for LLMs is their susceptibility during the training phase. Adversaries can exploit this vulnerability by strategically contaminating a small fraction of the training data and lead to the introduction of backdoors or a significant degradation in model performance. We will begin discussing the training-time threats by delving into various attack types including sample-agnostic attacks like word or sensitive-level trigger attacks, sample-dependent attacks such as syntactic, paraphrasing and back translation attacks. Subsequently, encompassing emergent LLM development processes of instruction tuning and RLHF, we will discuss how attackers may capitalize on these processes, injecting tailored instruction-following examples or manipulating ranking scores to purposefully alter the model’s behavior. We will also shed light on the far-reaching consequences of training-time attacks across diverse LLM applications. Moving forward, we will introduce threat mitigation strategies in three pivotal stages: (i) Data Preparation Stage where defenders are equipped with means to sanitize training data, eliminating potential sources of poisoning; (ii) Model Training Stage where defenders can measure and counteract the influence of poisoned data within the training process; (iii) Inference Stage where defenders can detect and eliminate poisoned data given the compromised model.

Mitigating Test-time Threats to LLMs [35 min]
handout

Malicious data existing in the training corpora, task instructions and human feedbacks are likely to cause threats to LLMs before they are deployed as Web services. Due to the limited accessibility of model components in these services, mitigation of such threats are realistically be address through test-time defense or detection. In the meantime, new types of vulnerabilities can also be introduced during test-time through adversarial prompts, instructions and few-shot demonstrations. In this part of tutorial, we will first introduce test-time threats to LLMs through prompt injection, malicious task instructions, jailbreaking attacks, adversarial demonstrations, and training-free backdoor attacks. We will then provide insights on mitigating some of those test-time threats based on techniques including prompt robustness estimation, demonstration-based defense, role-playing prompts and ensemble debiasing. While many issues with the test-time threats still remain unaddressed, we will also provide a discussion about how the community should develop to combat those issues.

Handling Privacy Risks of LLMs [35 min]
handout

Along with LLMs’ impressive performance, there have been increasing concerns about their privacy risks. In this part of the tutorial, we will first discuss several privacy risks related to membership inference attack and training data extraction. Next we will discuss privacy-preserving methods in two categories: (i) data sanitization including techniques to detect and remove personal identifier information, or replace sensitive tokens based on differential privacy (DP); (ii) Privacypreserved training, with a focus on methods using DP for training. At last, we discuss existing methods on balancing between privacy and utility, and reflections on what it means for LLMs to preserve privacy, especially on understanding appropriate contexts for sharing information.

Safeguarding LLM Copyright [35 min]
handout

Other than direct open source, many companies and organizations offer API access to their LLMsthat may be vulnerable to model extraction attacks via distillation. In this context, we will first describe potential model extraction attacks. We will then present watermark techniques to identify distilled LLMs, including those for MLMs and generative LMs. DRW adds a watermark in the form of a cosine signal that is difficult to eliminate into the output of the protected model. He et al. (2022) propose a lexical watermarking method to identify IP infringement caused by extraction attacks, and CATER proposes conditional watermarking by replacing synonyms of some words based on linguistic features. However, both methods are surface-level watermarks which the adversary can easily bypass by randomly replacing synonyms in the output, making it difficult to verify by probing the suspect models. GINSEW randomly groups vocabulary into two and adds a watermark based on a sinusoidal signal. This signal will be carried over to the distilled model and can be easily detected using Fourier transform.

Future Research Directions [30 min]
handout

Enumerating and addressing LLM security and privacy issues is essential to ensure reliable and responsible usage of LLMs in services and downstream systems. However, the community moves at a rapid pace and matching developments in LLM security with formal research and application needs is not trivial. At the end of this tutorial, we outline emergent challenges in this area that deserve timely investigation by the community, including (i) how to protect confidential training data during server-side LLM adaptation, (ii) how to realize selfexplainable defense processes of LLMs, (iii) how to handle private information that has already been captured by LLMs, and (iv) how to document security, privacy, copyright and vulnerability risks to enable more responsible development and deployment of LLMs.

Resources:

  • Tutorial syllabus  ACL Anthology
  • Bibtex:

    @inproceedings{chen-etal-2024-combating,
       title = "Combating Security and Privacy Issues in the Era of Large Language Models",
       author = "Chen, Muhao and Xiao, Chaowei and Sun, Huan and Li, Lei and Derczynski, Leon and Anandkumar, Anima and Wang, Fei",
       booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)",
       year = "2024",
       publisher = "Association for Computational Linguistics",
       url = "https://aclanthology.org/2024.naacl-tutorials.2",
       pages = "8--18"
    }