9004

Problems of information security. Computer systems

Проблемы информационной безопасности. Компьютерные системы

2071-8217

10.48612/jisp/2nkk-unee-uuzv

From exploitation to protection: analysis of methods for defending against attacks on LLMs

От эксплуатации к защите: анализ методов защиты от атак на языковые модели

0009-0005-6662-5606

Velichko

Ivan

wwr0ngn4m3@gmail.com

0000-0002-0924-6221

Bezzateev

Sergey

sergey.bezzateev@gmail.com

Saint Petersburg State University of Aerospace Instrumentation

30 09 2025

3 110 120

Modern large language models demonstrate impressive capabilities but remain vulnerable to attacks that can manipulate their behavior, extract confidential data, or bypass built-in restrictions. This paper focuses on methods for protecting language models from prompt injection attacks, which allow adversaries to exploit the system for malicious purposes. It examines several defense strategies, including query filtering, context isolation, and training on perturbed data. A comparative analysis of these defense mechanisms assesses their effectiveness, highlights their limitations, and identifies future directions for improving the security of language models.

Large language models; artificial intelligence; adversarial attacks; defense methods; model output manipulation