From exploitation to protection: a deep dive into adversarial attacks on LLMs

Machine learning and knowledge control systems
Authors:
Abstract:

Modern large language models possess impressive capabilities but remain vulnerable to a range of attacks that can manipulate their responses, cause leaks of confidential data, or bypass their restrictions. This paper focuses on the analysis of prompt injection attacks, which can be used to circumvent model constraints, extract hidden data, or force the model to follow malicious instructions.