I recently had to compute the Bayes error for a Gaussian model.
Let xi be a predictor variable that follows a Gaussian distribution with variance 1 and mean Ti in one class and -Ti in the other class.
The Bayes error of the model is e = 1 - Φ(sqrt(sum(Ti^2))), assuming the variables are independent and the two classes are equally likely.
In R, the standard normal cumulative distribution function is computed using pnorm().
So, here’s the Bayes error if, for instance, we have 10 variables with mean 0.5 in one class and -0.5 in the other class:
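A minimal sketch of that computation, assuming independent variables and equally likely classes (the names n_vars, class_mean and bayes_error are only illustrative):

```r
# Ten variables, each with mean 0.5 in one class and -0.5 in the other
n_vars <- 10
class_mean <- 0.5

# sum(Ti^2) reduces to n_vars * class_mean^2 when all means are equal
bayes_error <- 1 - pnorm(sqrt(n_vars * class_mean^2))
print(bayes_error)
```

This should give an error of roughly 5.7%. Writing pnorm(x, lower.tail = FALSE) instead of 1 - pnorm(x) is equivalent and tends to be more accurate when the error is very small.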
If we had more complicated things, like 5 variables with mean 0.5,0.6,0.7,0.8,0.9 in a class and -0.5,-0.6,-0.7,-0.8,-0.9 in the other, we could do something like:
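A possible vectorized version, under the same assumptions of independent variables and equally likely classes (means is just an illustrative name):

```r
# Means of the five variables in the first class
# (the second class has the opposite signs)
means <- c(0.5, 0.6, 0.7, 0.8, 0.9)

# means^2 squares each element, sum() adds them up
bayes_error_vec <- 1 - pnorm(sqrt(sum(means^2)))
print(bayes_error_vec)
```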
or not vectorized:
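A loop-based sketch of the same computation for the five means 0.5 to 0.9 (the names means and total are illustrative, not from the original snippet):

```r
means <- c(0.5, 0.6, 0.7, 0.8, 0.9)

# Accumulate the sum of squared means one element at a time
total <- 0
for (m in means) {
  total <- total + m^2
}

bayes_error_loop <- 1 - pnorm(sqrt(total))
print(bayes_error_loop)
```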
Edit: I just came across this post again through my statistics panel, and I really have no idea why I titled it “computing the std normal cumulative distrib”. It should rather be “computing the Bayes error”… I am leaving it this way so as not to mess with the established pretty URL, though.