9004

Problems of information security. Computer systems

Проблемы информационной безопасности. Компьютерные системы

2071-8217

10.48612/jisp/rk44-9aab-nxha

Building a semantic space of intentions using generative pre-trained models for solving the spam filtering task

Построение семантического пространства интенциональностей с использованием генеративных предобученных моделей для решения задачи фильтрации спама

0000-0002-4429-8799

55229487100

Zhukov

Igor

i.zhukov@inbox.ru

0009-0004-0370-1379

Balashova

Ekaterina

ekatherina04@gmail.com

0009-0005-6983-8017

Mandrov

Aleksandr

sashamandrovp@gmail.com

0009-0007-8532-3640

Kravchenko

Nikita

nik.kravchenko.2004@bk.ru

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)

2025-09-30

No. 3, pp. 55-68

One of the key elements of spam filtering is the text vectorization method. The article proposes a vectorization approach that maps a text onto pairs of intentions. A list of intention pairs was compiled, and a synthetic dataset of textual utterances was generated. A neural network was designed and trained to estimate the degree to which each intention is expressed in the input text. The method was evaluated on the spam filtering task using logistic regression on the Enron and SMS datasets.

Transformers, DistilBERT, PCA, neural networks, synthetic data
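
The pipeline outlined in the abstract (a text is mapped to one score per intention pair, and the resulting vector is classified with logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two intention pairs, their keyword cues, and the four training messages are hypothetical, and a simple keyword counter stands in for the trained neural scorer, whose architecture and intention list the abstract does not specify. The toy messages are not drawn from the Enron or SMS datasets.

```python
import math

# Hypothetical intention pairs with toy keyword cues. The article's actual
# pairs and its trained neural scorer are not given in the abstract.
INTENTION_PAIRS = [
    ("persuade", {"buy", "free", "win", "offer"},
     "inform", {"meeting", "report", "schedule", "attached"}),
    ("urge", {"now", "urgent", "today", "claim"},
     "discuss", {"please", "thanks", "review", "tomorrow"}),
]


def vectorize(text):
    """Map a text to one score in [-1, 1] per intention pair.

    +1 means only cues of the first intention were found, -1 only cues of
    the second, 0.0 means no cue of either intention was found.
    """
    tokens = [t.strip(".,!?:;").lower() for t in text.split()]
    vector = []
    for _, cues_a, _, cues_b in INTENTION_PAIRS:
        a = sum(t in cues_a for t in tokens)
        b = sum(t in cues_b for t in tokens)
        vector.append((a - b) / (a + b) if a + b else 0.0)
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(w * x for w, x in zip(weights, xi)) + bias
            err = sigmoid(z) - yi  # gradient of the log loss w.r.t. z
            weights = [w - lr * err * x for w, x in zip(weights, xi)]
            bias -= lr * err
    return weights, bias


def spam_probability(text, weights, bias):
    x = vectorize(text)
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


# Toy labelled messages (1 = spam, 0 = not spam), invented for this sketch.
TRAIN = [
    ("Win a free prize, claim now!", 1),
    ("Urgent offer: buy today", 1),
    ("Please review the attached report", 0),
    ("Thanks, meeting schedule for tomorrow", 0),
]

X = [vectorize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train_logreg(X, y)

print(spam_probability("Claim your free offer now", weights, bias))
print(spam_probability("Please review the report tomorrow", weights, bias))
```

The point of the sketch is the shape of the feature space: each coordinate corresponds to an interpretable intention pair, so the logistic regression weights can be read directly as the contribution of each pair to the spam decision.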