Protection against adversarial attacks based on a dynamically reconfigurable ensemble of machine learning models

Machine learning and knowledge control systems
Authors:
Abstract:

The paper addresses the problem of protecting machine learning models from adversarial attacks. It presents a protection method based on a dynamically reconfigurable ensemble of classifiers with a refusal mechanism. The method combines randomly combined heterogeneous sub-models, online analysis of prediction variance, simulation of a plausible response to an attack, and a decoy model. By analyzing the consistency of the ensemble's outputs and refusing to return the most probable output, the method makes the feedback an attacker receives from the target model less useful for generating adversarial samples. An experimental evaluation on the UNSW-NB15 dataset showed that under adversarial attacks the method preserves the protected model's high initial accuracy (85–95%), with a decrease of only 1–3 percentage points. The method can eliminate up to 98% of attacks, significantly outperforming comparable widely used methods.
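
The interplay between random sub-model selection, variance analysis, and refusal can be illustrated with a short Python sketch. This is not the authors' implementation: the names `guarded_predict`, `Decision`, `subset_size`, and `var_threshold` are hypothetical, and the rule used here (withhold the top class and return the runner-up whenever the largest per-class variance exceeds a fixed threshold) is an assumption made only to show the general idea. The decoy model is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np

# A sub-model maps a feature vector to a vector of class probabilities.
Model = Callable[[np.ndarray], np.ndarray]


@dataclass
class Decision:
    label: int           # class returned to the caller
    refused: bool        # True when the most probable class was withheld
    disagreement: float  # largest per-class variance across sampled sub-models


def guarded_predict(models: Sequence[Model], x: np.ndarray,
                    rng: np.random.Generator,
                    subset_size: int = 3,
                    var_threshold: float = 0.05) -> Decision:
    # Reconfigure per query: sample sub-models without replacement.
    chosen = rng.choice(len(models), size=subset_size, replace=False)
    probs = np.stack([models[i](x) for i in chosen])  # (subset_size, n_classes)

    mean = probs.mean(axis=0)
    # Assumed consistency measure: largest variance of any class probability.
    disagreement = float(probs.var(axis=0).max())

    ranked = np.argsort(mean)[::-1]  # classes from most to least probable
    if disagreement > var_threshold:
        # Assumed refusal rule: answer with the runner-up class instead.
        return Decision(int(ranked[1]), True, disagreement)
    return Decision(int(ranked[0]), False, disagreement)
```

In this sketch, inputs on which the sampled sub-models agree receive the ordinary ensemble answer, while inputs that split the ensemble receive a response that does not expose the most probable class. Both the threshold and the subset size would need tuning against the data and the threat model.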