Optimization of data obfuscation in big data processing and storage systems
This paper addresses the problem of reducing the attack surface exposed to an insider attacker in heterogeneous big data processing and storage systems by selecting the optimal data obfuscation method based on anonymization (depersonalization) technologies. It analyzes the relevant terminology and systematizes the data hiding methods used to reduce the attack surface in such systems. A formal statement of the problem of finding the optimal data obfuscation method is proposed, together with an algorithm for solving it over various types of datasets that takes into account evaluation criteria specific to each class of methods. The paper also describes the implementation of a software prototype that supports decision-making and the selection of the optimal method for practical problems, and reports an experimental evaluation of the prototype with an analysis of the results.