Ensuring the stability of online-learning artificial intelligence systems based on model similarity assessment
The paper studies the problem of protecting online-learning artificial intelligence systems against poisoning attacks. To improve stability, an approach is proposed based on assessing the similarity of operation of two computational models: a reference (initial) model and an operational (test) model. The following indicators of stability violation were identified: a decrease in total accuracy (TA), a decrease in total prediction value (TPV), and a decrease in the cosine similarity of model weights (cos_similarity). The experimental study showed that the proposed solution enables timely detection of poisoned data and maintains high classification accuracy under targeted attacks on a computational model that continues training on test data.
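One of the indicators above, the cosine similarity of model weights, can be illustrated with a minimal sketch. The function names, the toy weight arrays, and the 0.9 threshold below are illustrative assumptions, not the paper's actual implementation or parameters:

```python
import numpy as np

def cosine_similarity(ref_weights, op_weights):
    """Cosine similarity between two models' weights, flattened into vectors."""
    a = np.concatenate([w.ravel() for w in ref_weights])
    b = np.concatenate([w.ravel() for w in op_weights])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_stable(ref_weights, op_weights, threshold=0.9):
    """Hypothetical check: flag a stability violation when the operational
    model's weights drift too far from the reference model's weights."""
    return cosine_similarity(ref_weights, op_weights) >= threshold

# Toy weights standing in for a reference and an operational model.
ref = [np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.5, -0.5])]
op_same = [w.copy() for w in ref]
op_drifted = [w + 5.0 for w in ref]  # simulated poisoning-induced drift

print(cosine_similarity(ref, op_same))   # identical weights -> 1.0
print(is_stable(ref, op_drifted))        # drifted weights fail the check
```

In a monitoring loop, this comparison would run after each online-learning update, alongside the TA and TPV checks, so that a sudden drop in any indicator triggers rollback to the reference model.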