Naive Bayes Classification for Numeric Data Using C#

I’ve been thinking about naive Bayes classification recently, in part because it’s going to be one of the topics I explain in a hands-on workshop at the upcoming Azure + AI Conference (see https://azureaiconf.com).

Naive Bayes classification can be used for numeric data, such as predicting the sex of a person who has height = 6.00 feet, weight = 185 lbs, foot = 9 inches. Naive Bayes can also be used for categorical data, such as predicting the sex of a person who has height = tall, weight = medium, foot = normal. The underlying theory is the same for the numeric data and categorical data scenarios, but the details are quite a bit different.

My demo program uses the data from the Wikipedia page on naive Bayes. There are 8 items. Each item is the height, weight, and foot size of a male or female. The goal is to predict the sex of a person who is 6.00 feet tall, weighs 130 lbs and has foot size 8 inches. The result is P(female) = 0.9999884.

A few days ago I reviewed an example with numeric data on the Wikipedia page on naive Bayes. I verified the Wikipedia calculations by performing the calculations myself, using Excel.

Just for fun, I decided to perform the calculations using a C# program. It was an interesting exercise. I didn’t have any major problems because I’m quite familiar with naive Bayes for numeric data. The technique assumes that, within each class, each predictor variable is Gaussian distributed, and it uses the Gaussian probability density function, which I’m also very familiar with.
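Here is a minimal sketch of how the calculation might look in C#. It is not the demo program itself, just an illustration of the idea. The class and method names (NaiveBayesDemo, GaussianPdf, Evidence, ProbFemale) are made up for this sketch, the eight data items are the Wikipedia values as best I recall them (check them against the page), and I'm assuming sample variance (dividing by n - 1) and equal priors of 0.5 for each class.

```csharp
using System;
using System.Linq;

public static class NaiveBayesDemo
{
    // Training data, assumed to match the Wikipedia naive Bayes example:
    // { height (feet), weight (lbs), foot size (inches) }.
    static readonly double[][] Males = {
        new[] { 6.00, 180.0, 12.0 },
        new[] { 5.92, 190.0, 11.0 },
        new[] { 5.58, 170.0, 12.0 },
        new[] { 5.92, 165.0, 10.0 }
    };

    static readonly double[][] Females = {
        new[] { 5.00, 100.0, 6.0 },
        new[] { 5.50, 150.0, 8.0 },
        new[] { 5.42, 130.0, 7.0 },
        new[] { 5.75, 150.0, 9.0 }
    };

    // Gaussian probability density function evaluated at x.
    public static double GaussianPdf(double x, double mean, double variance)
    {
        double diff = x - mean;
        return Math.Exp(-(diff * diff) / (2.0 * variance)) /
               Math.Sqrt(2.0 * Math.PI * variance);
    }

    // Unnormalized evidence for one class: the prior times the product of
    // the per-feature Gaussian densities (the "naive" independence assumption).
    public static double Evidence(double[][] rows, double[] x, double prior)
    {
        double result = prior;
        for (int j = 0; j < x.Length; ++j)
        {
            int col = j;  // local copy for the lambdas below
            double mean = rows.Average(r => r[col]);
            // Sample variance (divide by n - 1); an assumption of this sketch.
            double variance =
                rows.Sum(r => (r[col] - mean) * (r[col] - mean)) / (rows.Length - 1);
            result *= GaussianPdf(x[col], mean, variance);
        }
        return result;
    }

    // Normalize the two evidence terms so they sum to 1.
    public static double ProbFemale(double[] x)
    {
        double evMale = Evidence(Males, x, 0.5);      // equal priors assumed
        double evFemale = Evidence(Females, x, 0.5);
        return evFemale / (evMale + evFemale);
    }

    public static void Main()
    {
        double[] unknown = { 6.00, 130.0, 8.0 };  // height, weight, foot size
        Console.WriteLine("P(female) = " + ProbFemale(unknown).ToString("F7"));
    }
}
```

If the data and assumptions above match the Wikipedia example, the printed probability should agree closely with the P(female) value reported earlier. Multiplying raw densities is fine for three features, but with many features it would be safer to sum logarithms to avoid arithmetic underflow.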

Here is the same problem, solved using Excel.

While I was reviewing the details of how naive Bayes classification works, I came across a technique called Bayes point machine classification. I spent a couple of hours trying to make sense of the little information I found on the Internet, including the source research paper. As far as I can tell, the Bayes point machine is yet another example of prolific research efforts that are overly complex solutions in search of a problem.

The fact that almost nobody uses the Bayes point machine classification technique suggests that it has no advantages over much simpler techniques, such as a shallow neural network. I could be wrong, however. The source research paper is very poorly written, in the sense that the paper was not written so that someone could actually implement the technique. So I’ll need to probe a bit deeper before I’m satisfied that Bayes point machine classification is in fact a dead end.

Robert K. Abbett (1926-2015) was a prolific artist who did the covers of many paperback novels in the 1960s. I like his style of art a lot. I’ve read “Thuvia, Maid of Mars”, by Edgar Rice Burroughs — an excellent novel. I haven’t read the other two books, but I suspect the cover art for them is better than the content. “When she crashed into his house, about all she wore was a guilty look.” Brilliant — modern day Shakespeare.