9004

Problems of information security. Computer systems

Проблемы информационной безопасности. Компьютерные системы

2071-8217

10.48612/jisp/mbvv-n1u7-z7be

From exploitation to protection: a deep dive into adversarial attacks on LLMs

От эксплуатации к защите: анализ атак на большие языковые модели

0009-0005-6662-5606

Velichko

Ivan

wwr0ngn4m3@gmail.com

0000-0002-0924-6221

Bezzateev

Sergey

sergey.bezzateev@gmail.com

Saint Petersburg State University of Aerospace Instrumentation

25 03 2025

1 43 58

Modern large language models possess impressive capabilities but remain vulnerable to attacks that can manipulate their responses, leak confidential data, or bypass restrictions. This paper analyzes prompt injection attacks, which allow an attacker to bypass model constraints, extract hidden data, or force the model to follow malicious instructions.

Large language models, artificial intelligence, adversarial attacks, defense methods, model output manipulation