<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="ru">
  <front xmlns:xlink="http://www.w3.org/1999/xlink">
    <journal-meta>
      <journal-id journal-id-type="elibrary">9004</journal-id>
      <journal-title-group>
        <journal-title>Problems of information security. Computer systems</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Проблемы информационной безопасности. Компьютерные системы</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2071-8217</issn>
    </journal-meta>
    <article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
      <article-id pub-id-type="publisher-id">5</article-id>
      <article-id pub-id-type="doi">10.48612/jisp/rk44-9aab-nxha</article-id>
      <title-group>
        <article-title>Building a semantic space of intentions using generative pre-trained models for solving the spam filtering task</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Построение семантического пространства интенциональностей с использованием генеративных предобученных моделей для решения задачи фильтрации спама</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-4429-8799</contrib-id>
          <contrib-id contrib-id-type="scopus">55229487100</contrib-id>
          <name>
            <surname>Zhukov</surname>
            <given-names>Igor</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>i.zhukov@inbox.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0009-0004-0370-1379</contrib-id>
          <name>
            <surname>Balashova</surname>
            <given-names>Ekaterina</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>ekatherina04@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0009-0005-6983-8017</contrib-id>
          <name>
            <surname>Mandrov</surname>
            <given-names>Aleksandr</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>sashamandrovp@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0009-0007-8532-3640</contrib-id>
          <name>
            <surname>Kravchenko</surname>
            <given-names>Nikita</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>nik.kravchenko.2004@bk.ru</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)</aff>
      <aff id="aff2">National Research Nuclear University MEPhI</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-09-30">
        <day>30</day>
        <month>09</month>
        <year>2025</year>
      </pub-date>
      <issue>3</issue>
      <fpage>55</fpage>
      <lpage>68</lpage>
      <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="pdf" xlink:href="https://jisp.spbstu.ru/userfiles/files/soderzhaniya/pib_3_5-6.pdf"/>
      <abstract xml:lang="en">
        <p>Text vectorization is a key element of spam message filtering. This article proposes a vectorization approach that maps a text onto pairs of intentions. A list of intention pairs was identified, and a synthetic dataset of textual utterances was generated. A neural network was designed and trained to estimate the degree to which each intention is present in the text supplied at the model input. The method was tested on the spam filtering task with logistic regression, using the Enron dataset and an SMS dataset.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>Transformers</kwd>
        <kwd>DistilBERT</kwd>
        <kwd>PCA</kwd>
        <kwd>neural networks</kwd>
        <kwd>synthetic data</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
