Combining selective teacher intervention in the student's learning process with low-rank adaptation in a knowledge distillation model
The paper addresses the problem of optimizing neural networks for large language models such as ChatGPT. One actively developing direction of large language model optimization is knowledge distillation, that is, the transfer of knowledge from a large teacher model to a smaller student model without a significant loss of accuracy. Currently known knowledge distillation methods have certain drawbacks: inaccurate knowledge transfer, a long training process, and error accumulation over long sequences. A combination of methods that improves the quality of knowledge distillation is considered: selective teacher intervention in the student's learning process and low-rank adaptation. The proposed combination of knowledge distillation methods can be applied in tasks with limited computing resources.
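To make the combination concrete, below is a minimal sketch (in PyTorch, not taken from the paper) of how a distillation loss with selective teacher intervention might be paired with a LoRA-adapted student layer. The entropy-based intervention criterion, the LoRA hyperparameters `r` and `alpha`, and all class and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: knowledge distillation in which the teacher intervenes
# only at positions where the student is uncertain, while the student is
# fine-tuned through low-rank adapters (LoRA). Thresholds, dimensions, and
# names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original student weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank path: x -> A -> B, scaled and added to the frozen output.
        return self.base(x) + self.scale * F.linear(F.linear(x, self.lora_a), self.lora_b)


def selective_distillation_loss(
    student_logits: torch.Tensor,    # (batch, seq, vocab)
    teacher_logits: torch.Tensor,    # (batch, seq, vocab)
    targets: torch.Tensor,           # (batch, seq) gold token ids
    temperature: float = 2.0,
    entropy_threshold: float = 1.0,  # assumed criterion for when the teacher steps in
) -> torch.Tensor:
    """Cross-entropy everywhere; KL to the teacher only where the student is uncertain."""
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        targets.reshape(-1),
        reduction="mean",
    )

    # Selective intervention: teacher soft targets are applied only at positions
    # where the student's predictive entropy exceeds a threshold.
    with torch.no_grad():
        probs = F.softmax(student_logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)  # (batch, seq)
        mask = (entropy > entropy_threshold).float()

    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="none",
    ).sum(-1)  # per-token KL divergence, shape (batch, seq)
    kd = (kl * mask).sum() / mask.sum().clamp_min(1.0) * temperature**2

    return ce + kd
```

In this reading, low-rank adaptation keeps the number of trainable student parameters small (only the A and B matrices are updated), while the selectivity mask limits the teacher's influence to the positions where the student actually needs correction, which is one plausible way to reduce error accumulation in long sequences.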