Recognizing function prologues in binary files with recurrent neural networks

Authors:

Abstract:

This article addresses the problem of recognizing function prologues in binary files, one of the key subtasks of software reverse engineering. The proposed approach uses a recurrent neural network (RNN) that processes the byte sequences of a binary file. A comparative analysis of existing neural network models for function recognition identified their advantages and limitations, which justified the choice of a simple and reproducible RNN architecture. The results make it possible to draw conclusions about how model hyperparameters, which correspond to features of the machine architecture and the binary file format, influence recognition quality. The experiments were performed on binary files for the ESP32 microcontroller (Xtensa little-endian architecture) and the STM32WBA6 microcontroller (Cortex-M33 core, ARMv8-M architecture), using both standard and random alignment, which made it possible to evaluate the model’s robustness to changes in the structure of binary data. Based on the developed model, an extension for the IDA Pro disassembler was implemented, demonstrating the practical applicability of the proposed approach to real reverse engineering tasks.