Building a semantic space of intentions using generative pre-trained models for solving the spam filtering task
Authors:
Abstract:
Text vectorization is a key component of spam message filtering. This article proposes a vectorization approach based on matching a text to pairs of intentions. A list of intention pairs was compiled, and a synthetic dataset of textual utterances was generated. A neural network was designed and trained to estimate how strongly each intention is expressed in the text supplied at the model input. The method was evaluated on the spam message filtering task using logistic regression on the Enron dataset and an SMS dataset.
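
The pipeline summarized above (text, then a vector of intention-pair scores, then logistic regression) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the two intention pairs, the cue-word scorer that stands in for the trained neural network, the function names, and the four toy messages, which are not drawn from the Enron or SMS data.

```python
import math

# Hypothetical intention pairs with toy cue words. The real list of pairs and
# the trained neural network are not reproduced here; the cue-word counts
# below are only a stand-in for the network's intention scores.
INTENTION_PAIRS = [
    ("sell", "inform", {"buy", "offer", "free", "win"}, {"meeting", "report", "schedule"}),
    ("urge", "reassure", {"now", "urgent", "hurry"}, {"later", "fine", "thanks"}),
]


def vectorize(text):
    """Map a text to one score per intention pair, each in [-1, 1]."""
    tokens = text.lower().split()
    vec = []
    for _name_a, _name_b, cues_a, cues_b in INTENTION_PAIRS:
        a = sum(t in cues_a for t in tokens)
        b = sum(t in cues_b for t in tokens)
        # +1 means only the first intention was detected, -1 only the second,
        # 0.0 means neither intention was detected.
        vec.append((a - b) / (a + b) if a + b else 0.0)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_spam_prob(text, w, b):
    x = vectorize(text)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy messages invented for this sketch (1 = spam, 0 = not spam).
TRAIN = [
    ("win a free offer now", 1),
    ("hurry buy now", 1),
    ("meeting schedule attached thanks", 0),
    ("report is fine see you later", 0),
]
X = [vectorize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
W, B = train_logreg(X, y)
```

Calling `predict_spam_prob("free offer buy now", W, B)` returns a probability above 0.5 on this toy data, while a message with no cue words maps to the zero vector. The point of the sketch is only that the feature vector has one interpretable coordinate per intention pair, so the downstream classifier can stay as simple as logistic regression.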