When Generative AI Gets Hacked: A Comprehensive Classification of Cyberattacks on Large Language Models (LLMs) and Their Mitigation Techniques

Dishita Naik, Ishita Naik, Nitin Naik

Research output: Preprint or Working paper

Abstract

Large Language Models (LLMs) have swiftly become prevalent in nearly every aspect of human life due to a combination of technological breakthroughs, practical usability, and rapid integration into everyday tools and workflows. Despite their remarkable capabilities, LLMs pose real challenges to secure and safe development and deployment, and are vulnerable to various cyberattacks that can compromise their behaviour, outputs, security and performance. Understanding these vulnerabilities and potential cyberattacks is essential for ensuring the secure and safe development and deployment of LLMs. Numerous types of cyberattacks can be launched against LLMs, and there is currently no universally accepted classification system for them, as this remains an evolving area of research. This paper provides a systematic and broad classification of LLM attacks into four major categories based on four inherent and important components of an LLM: the input prompt, the training data, the underlying AI model and the output. The four categories are: Input (Prompt) Related Cyberattacks, Data (Training) Related Cyberattacks, AI Model (Inference) Related Cyberattacks, and Output (Response) Related Cyberattacks. The paper discusses all four categories in detail, including the various types of cyberattacks within each category. It then discusses several risks associated with cyberattacks on LLMs, followed by several mitigation techniques against them. A rigorous examination and taxonomy of diverse cyberattacks targeting LLMs, alongside an analysis of associated risks and mitigation strategies, is poised to yield nuanced and actionable understanding of the security and safety landscape of LLMs.
Through systematic classification and evaluation, such research will advance the field by illuminating various cyberattacks, vulnerabilities, risks, and defensive measures pertinent to LLM-based systems, thereby supporting more robust deployment and governance of these technologies in sensitive environments.
Original language: English
Number of pages: 30
DOIs
Publication status: Published - 10 Dec 2025

Keywords

  • Generative AI
  • Large language models
  • LLMs
  • Cyberattacks on LLM
  • Attacks on LLMs
