Bates Distribution by Angel Narciso, Head of Data & Analytics


Have you ever heard of the #Bates distribution?

Sometimes statisticians, analysts and scientists need synthetic data for their projects. It can be generated quickly and inexpensively, in large quantities and with very specific properties, and it is free of the collection biases that real-world data often carries. That makes it ideal, for instance, for testing new software or validating machine learning models and other algorithms.

But what happens if we want to generate data following a #Normal distribution for our synthetic dataset? Depending on the algorithm we use, we may end up getting a Bates distribution instead.
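The post does not name a specific algorithm, so the following is only an illustrative sketch of one classic shortcut of this kind: sum 12 Uniform(0, 1) draws and subtract 6. The result has mean 0 and variance 1, but it is a shifted, rescaled average of uniforms rather than a true Normal draw. The function name `approx_normal_sum12` is hypothetical and chosen for this example.

```python
import numpy as np

def approx_normal_sum12(size, rng):
    """Approximate standard Normal draws as (sum of 12 uniforms) - 6.

    Hypothetical helper for illustration: each value equals
    12 * (mean of 12 uniforms - 0.5), i.e. a rescaled Bates variable.
    """
    u = rng.random((size, 12))      # size x 12 draws from Uniform(0, 1)
    return u.sum(axis=1) - 6.0      # mean 12 * 0.5 - 6 = 0, variance 12 * (1/12) = 1

rng = np.random.default_rng(42)
z = approx_normal_sum12(100_000, rng)

print("sample mean:", z.mean())
print("sample std: ", z.std())
# A true Normal is unbounded; this approximation can never leave [-6, 6].
print("largest |z|:", np.abs(z).max())
```

The sample mean and standard deviation come out close to 0 and 1, which is why the shortcut looks convincing, yet no value can ever fall outside the interval [-6, 6].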

The Bates distribution is the probability distribution of the mean of n independent #Uniform random numbers on the interval [0, 1]. It is named after the American mathematician Grace Bates, who described it in the mid-20th century. For moderate n the distribution has a bell-shaped curve, like the Normal distribution, with mean 1/2 and variance 1/(12n), but its shape is slightly different. The most notable difference is in the tails: because an average of values between 0 and 1 can never leave that interval, the Bates distribution has bounded support, so its tails are lighter than those of the Normal distribution and vanish entirely outside [0, 1]. This means that the probability of observing extreme values (values far from the mean) is lower for the Bates distribution than for the Normal distribution. That is harmless in many simulations, but it makes the Bates distribution a poor stand-in for the Normal when modeling the spread of #stock #prices or other quantities that can exhibit large fluctuations, as it leads to under-prediction of extreme values.
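As a rough check, here is a minimal sketch that assumes the standard definition of the Bates distribution as the mean of n independent Uniform(0, 1) draws. It compares a simulated sample with the theoretical mean 1/2 and standard deviation sqrt(1/(12n)), and counts how often the sample lands more than three standard deviations from the mean, next to a Normal sample with the same mean and spread. The helper name `sample_bates` and the choice n = 4 are assumptions made for this example.

```python
import numpy as np

def sample_bates(n, size, rng):
    """Draw `size` values, each the mean of `n` independent Uniform(0, 1) draws."""
    return rng.random((size, n)).mean(axis=1)

rng = np.random.default_rng(0)
n = 4                                   # illustrative choice
x = sample_bates(n, 200_000, rng)

theo_mean = 0.5
theo_std = np.sqrt(1.0 / (12 * n))      # variance of the Bates distribution is 1 / (12 n)

print("mean: sample", x.mean(), "theory", theo_mean)
print("std:  sample", x.std(), "theory", theo_std)
print("range:", x.min(), x.max())       # always inside [0, 1]

# Normal sample with the same mean and standard deviation, for comparison.
y = rng.normal(theo_mean, theo_std, size=x.size)

# Fraction of each sample lying more than 3 standard deviations from the mean.
bates_tail = np.mean(np.abs(x - theo_mean) > 3 * theo_std)
normal_tail = np.mean(np.abs(y - theo_mean) > 3 * theo_std)
print("beyond 3 sd: Bates", bates_tail, "Normal", normal_tail)
```

Increasing n brings the two samples closer together in the centre, but the simulated values still never leave [0, 1].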

Does this mean that every time we want to generate a Normal distribution we will get a Bates one? Absolutely not! Fortunately, there are better and more reliable methods than averaging uniform random numbers, such as the Box-Muller transform, which we will cover in another post.

To finish, if you want to play a little bit with the Bates distribution, you do not need much machinery: averaging a few uniform draws is enough, using ‘runif’ in #R or NumPy's random generator in #Python. Recent versions of SciPy also include ‘scipy.stats.irwinhall’, the distribution of the sum of n uniform random numbers; dividing such a sum by n gives the Bates distribution.