TY - UNPB
T1 - When Generative AI Gets Hacked: A Comprehensive Classification of Cyberattacks on Large Language Models (LLMs) and Their Mitigation Techniques
AU - Naik, Dishita
AU - Naik, Ishita
AU - Naik, Nitin
PY - 2025/12/10
Y1 - 2025/12/10
N2 - Large Language Models (LLMs) have swiftly become prevalent in nearly every aspect of human life due to a combination of technological breakthroughs, practical usability, and rapid integration into everyday tools and workflows. Despite their remarkable capabilities, LLMs pose real challenges in their secure and safe development and deployment, and are vulnerable to various cyberattacks that can compromise their behaviour, outputs, security and performance. Understanding these vulnerabilities and potential cyberattacks on LLMs is essential for ensuring their secure and safe development and deployment. Numerous types of cyberattacks can be launched against LLMs, and there is currently no universally accepted classification system for these cyberattacks, as this remains an evolving area of research. This paper provides a systematic and broad classification of LLM attacks into four major categories based on an LLM's four inherent and essential components: input prompt, training data, underlying AI model and output. These four categories of LLM attacks are: Input (Prompt) Related Cyberattacks, Data (Training) Related Cyberattacks, AI Model (Inference) Related Cyberattacks, and Output (Response) Related Cyberattacks. This paper discusses all four aforementioned categories of cyberattacks on LLMs in detail, including the various types of cyberattacks within each category. Subsequently, it discusses several risks associated with cyberattacks on LLMs. Finally, it presents several mitigation techniques for cyberattacks on LLMs. A rigorous examination and taxonomy of diverse cyberattacks targeting LLMs, alongside an analysis of associated risks and mitigation strategies, is poised to yield nuanced and actionable understanding of the security and safety landscape of LLMs.
Through systematic classification and evaluation, such research will advance the field by illuminating the various cyberattacks, vulnerabilities, risks, and defensive measures pertinent to LLM-based systems, thereby supporting more robust deployment and governance of these technologies in sensitive environments.
KW - Generative AI
KW - Large language models
KW - LLMs
KW - Cyberattacks on LLM
KW - Attacks on LLMs
UR - https://www.techrxiv.org/users/845749/articles/1367299-when-generative-ai-gets-hacked-a-comprehensive-classification-of-cyberattacks-on-large-language-models-llms-and-their-mitigation-techniques?commit=900dffa9238928a16379bc165f611daba0c6b24b
U2 - 10.36227/techrxiv.176540281.10631689/v1
DO - 10.36227/techrxiv.176540281.10631689/v1
M3 - Preprint
BT - When Generative AI Gets Hacked: A Comprehensive Classification of Cyberattacks on Large Language Models (LLMs) and Their Mitigation Techniques
ER -