The structural model of Portable Executable files containing malicious code
This article presents a study aimed at developing a model of Portable Executable (PE) files containing malicious code. The model is built using static analysis methods and includes 333 classification features derived from a training dataset of 34,026 PE files: 17,992 malicious and 16,034 legitimate. The proposed model describes features through a differentiated assessment of their importance. Experimental comparison with binary feature description methods confirmed that incorporating feature importance levels improves classification accuracy. It is also shown that optimizing the feature space with principal component analysis (PCA) and the isolation forest method reduces the feature set to the 40 most informative features without significant loss of accuracy. The resulting model achieves high classification accuracy at lower computational cost. The scientific significance of the work lies in extending the methodological capabilities of static analysis, which supports a deeper understanding of threats and more reliable mechanisms for countering malicious software.
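To illustrate the distinction between a binary feature description and one that carries importance levels, the following is a minimal sketch and not the method used in the study. The feature names, the importance values, and the helper functions are all hypothetical. The reduction step shown here is a plain top-k ranking by importance, used only as a stand-in; the study itself reduces the feature space with PCA and the isolation forest method, which are not reproduced here.

```python
# Hypothetical static features and importance levels (illustrative values only,
# not taken from the study).
IMPORTANCE = {
    "high_entropy_section": 3,
    "imports_VirtualAlloc": 2,
    "no_digital_signature": 2,
    "has_tls_callbacks": 1,
    "has_resources": 1,
}
FEATURES = sorted(IMPORTANCE)  # fixed feature order shared by all vectors


def binary_vector(present):
    """Binary description: 1 if the feature was observed in the file, else 0."""
    return [1 if name in present else 0 for name in FEATURES]


def weighted_vector(present):
    """Weighted description: an observed feature contributes its importance level."""
    return [IMPORTANCE[name] if name in present else 0 for name in FEATURES]


def top_k_features(k):
    """Keep the k features with the highest importance (ties broken by name).

    Assumption: a simple ranking, standing in for the PCA and isolation
    forest reduction described in the abstract.
    """
    ranked = sorted(FEATURES, key=lambda name: (-IMPORTANCE[name], name))
    return ranked[:k]


def project(vector, kept):
    """Restrict a full-length vector to the kept features, in the order of `kept`."""
    index = {name: i for i, name in enumerate(FEATURES)}
    return [vector[index[name]] for name in kept]


# Hypothetical file in which two of the five features were observed.
sample = {"high_entropy_section", "has_resources"}
kept = top_k_features(3)

print(binary_vector(sample))
print(weighted_vector(sample))
print(project(weighted_vector(sample), kept))
```

In the binary description every observed feature counts equally, whereas the weighted description lets a classifier see that some observations matter more than others. The same idea scales to the 333 features and the reduced set of 40 mentioned above, although the actual weights and selection procedure would come from the study's own analysis.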