Error of approximation of chi-square distribution by normal distribution as a function of sample size

Applied cryptography
Authors:
Abstract:

The article examines the error incurred when the chi-square distribution is approximated by a normal distribution, a question that arises when statistical tests are used to evaluate the quality of random number generators. The aim of the study is to determine the conditions under which the chi-square distribution of a test statistic may be replaced by its normal approximation, so that p-values can be computed more simply with the complementary error function. To this end, the study uses methods of mathematical analysis: it analyzes the gamma distribution, of which the chi-square distribution is a special case, and applies the Berry-Esseen inequality to bound the accuracy of the approximation. An analytical expression for the third absolute central moment of the distribution is obtained, which makes it possible to estimate analytically the minimum length of a bit sequence needed to reach a specified approximation accuracy. The results show that the high accuracy required in cryptographic applications runs into significant practical limitations, because the required sample size drives up computational complexity, memory requirements, and running time. The article also considers how to choose the number of intervals in a test based on the chi-square statistic so as to balance the desired sensitivity against resistance to random fluctuations.
The scientific novelty of the work lies in formalizing the conditions under which the normal approximation may be used to calculate p-values, and in developing recommendations for selecting the number of intervals and estimating the minimum sample size. The results contribute to the statistical rigor and validity of tests used to verify pseudo-random number generators. They also reduce the role of heuristic considerations in choosing the sample size, which makes the evaluation of random number generator characteristics more reliable. In practical terms, the work aims to unify statistical analysis procedures in cryptography by formalizing the conditions under which it is correct to replace the chi-square distribution with the normal distribution.
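
The approximation discussed in the abstract can be illustrated with a minimal sketch. It is not the article's method or its error bound; it only compares an exact chi-square p-value with the normal approximation computed through the complementary error function. The function names are hypothetical, and the exact tail is restricted to an even number of degrees of freedom k, where it has the closed form exp(-x/2) * sum_{j < k/2} (x/2)^j / j!. The normal approximation assumes the usual standardization with mean k and variance 2k.

```python
import math


def chi2_sf_even(x: float, k: int) -> float:
    """Exact P(X > x) for a chi-square variable with even k degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!.
    Assumption of this sketch: k is even and x/2 is small enough
    (below roughly 700) that exp(-x/2) does not underflow.
    """
    if k <= 0 or k % 2 != 0:
        raise ValueError("this sketch only handles even, positive k")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)  # j = 0 term of the sum
    total = term
    for j in range(1, k // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return total


def chi2_sf_normal(x: float, k: int) -> float:
    """Normal approximation of P(X > x): standardize with mean k, variance 2k."""
    z = (x - k) / math.sqrt(2.0 * k)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def abs_error_at_two_sigma(k: int) -> float:
    """Absolute p-value error at the point two standard deviations above the mean."""
    x = k + 2.0 * math.sqrt(2.0 * k)
    return abs(chi2_sf_even(x, k) - chi2_sf_normal(x, k))


if __name__ == "__main__":
    for k in (2, 10, 50, 200, 1000):
        x = k + 2.0 * math.sqrt(2.0 * k)
        exact = chi2_sf_even(x, k)
        approx = chi2_sf_normal(x, k)
        print(f"k={k:5d}  exact={exact:.6f}  normal={approx:.6f}  "
              f"abs_err={abs(exact - approx):.6f}")
```

Running the script prints the exact and approximate p-values side by side for several values of k. The slow shrinkage of the error as k grows is the kind of behavior that motivates bounding the sample size analytically rather than by trial.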