Combining selective teacher intervention in the student's learning process with low-rank adaptation in a knowledge distillation model

Machine learning and knowledge control systems
Authors:
Abstract:

The paper discusses the problem of optimizing neural networks underlying large language models such as ChatGPT. One actively developing direction of large language model optimization is knowledge distillation, that is, the transfer of knowledge from a large teacher model to a smaller student model without a significant loss of accuracy. Existing knowledge distillation methods have notable drawbacks: inaccurate knowledge transfer, a lengthy training process, and error accumulation over long sequences. A combination of two methods that improve the quality of knowledge distillation is considered: selective teacher intervention in the student's learning process and low-rank adaptation. The proposed combination can find application in problems with limited computing resources.
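To make the combination named in the abstract concrete, below is a minimal, hedged sketch of how a distillation step could combine a LoRA-style low-rank adapter with a selective-intervention rule. All names and parameters (LoRALinear, distillation_step, intervention_threshold, the rank and temperature values) are illustrative assumptions for this example, not the authors' implementation.

```python
# Sketch only: generic knowledge distillation with a LoRA adapter and
# selective teacher intervention. Names and thresholds are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update W + B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pretrained weights frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + F.linear(x, self.lora_b @ self.lora_a) * self.scaling


def distillation_step(student, teacher, x, y,
                      temperature: float = 2.0,
                      intervention_threshold: float = 1.0):
    """One training step: the teacher's soft targets are used only on samples
    where the student's own cross-entropy exceeds a threshold (a simple form
    of selective intervention); elsewhere the student trains on hard labels."""
    with torch.no_grad():
        teacher_logits = teacher(x)

    student_logits = student(x)
    ce_per_sample = F.cross_entropy(student_logits, y, reduction="none")

    # Selective intervention: distill only the samples the student finds hard.
    needs_help = ce_per_sample > intervention_threshold
    kd_loss = torch.tensor(0.0)
    if needs_help.any():
        kd_loss = F.kl_div(
            F.log_softmax(student_logits[needs_help] / temperature, dim=-1),
            F.softmax(teacher_logits[needs_help] / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2

    return ce_per_sample.mean() + kd_loss


if __name__ == "__main__":
    teacher = nn.Linear(32, 10)                      # stand-in "large" model
    student = LoRALinear(nn.Linear(32, 10), rank=4)  # small model with LoRA adapter
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    loss = distillation_step(student, teacher, x, y)
    loss.backward()                                  # gradients reach only the LoRA parameters
    print(f"loss = {loss.item():.4f}")
```

In this sketch only the low-rank matrices are trainable, which is what keeps the approach attractive under limited computing resources, while the intervention threshold controls how often the teacher's soft targets enter the loss.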