Naive Bayes Classification Example Using Raw Python 3.7

I’m preparing the content for an all-day hands-on workshop. My main topics are all about neural networks, but I have a few classical techniques too, including naive Bayes classification. Here’s an example that I’ll use in the workshop.

There are 40 data items that look like:

actuary  green  korea  1
barista  green  italy  0
dentist  hazel  japan  0
chemist  hazel  japan  2
. . . 

Each line of data is a person. The columns are job-type, eye-color, country, and personality extraversion (0, 1, 2). Suppose you want to predict the personality extraversion score of a person who is (barista, hazel, italy).

The first step is to compute the joint count of each of the item's predictor values with each class (0, 1, 2), looking at each predictor variable separately (“naive”).

barista and class 0 = 4 + 1 = 5
barista and class 1 = 0 + 1 = 1
barista and class 2 = 1 + 1 = 2

hazel and class 0 = 5 + 1 = 6
hazel and class 1 = 2 + 1 = 3
hazel and class 2 = 2 + 1 = 3

italy and class 0 = 1 + 1 = 2
italy and class 1 = 5 + 1 = 6
italy and class 2 = 1 + 1 = 2

You add 1 to each raw count so that no count is 0. A single 0 would make the whole product for that class in step 3 equal to 0. This is called Laplacian smoothing.
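Step 1 can be sketched in plain Python. This is a minimal sketch, separate from the demo program: it embeds the 40 items from people_data.txt as a string, and the helper name smoothed_joint_counts is made up for illustration. It counts, one predictor at a time, how many items share each of the item's values (barista, hazel, italy) and have each class, then adds 1 to every count.

```python
# The 40 demo items from people_data.txt:
# job-type, eye-color, country, extraversion class.
DATA = """
actuary green korea 1
barista green italy 0
dentist hazel japan 0
dentist green japan 1
chemist hazel japan 2
actuary green japan 1
actuary green japan 0
chemist green italy 1
chemist green italy 2
dentist green japan 1
dentist green japan 0
dentist green japan 1
dentist green japan 2
chemist green italy 1
dentist green japan 1
dentist hazel japan 0
chemist green korea 1
barista green japan 0
actuary green italy 1
actuary green italy 1
dentist green korea 0
barista green japan 2
dentist green japan 0
barista green korea 0
dentist green japan 0
actuary hazel italy 1
dentist hazel japan 0
dentist green japan 2
dentist green japan 0
chemist hazel japan 2
dentist green korea 0
dentist hazel korea 0
dentist green japan 0
dentist green japan 2
dentist hazel japan 0
actuary hazel japan 1
actuary green japan 0
actuary green japan 1
dentist green japan 0
barista green japan 0
"""

def smoothed_joint_counts(rows, x, n_classes):
    """For each predictor value in x, count the items that have that value
    AND each class (one predictor at a time), then add 1 to every count."""
    counts = [[0] * n_classes for _ in x]
    for *predictors, label in rows:
        y = int(label)
        for j, value in enumerate(x):
            if predictors[j] == value:
                counts[j][y] += 1
    return [[c + 1 for c in row] for row in counts]  # Laplacian smoothing

rows = [line.split() for line in DATA.splitlines() if line.strip()]
x = ["barista", "hazel", "italy"]
joint = smoothed_joint_counts(rows, x, n_classes=3)
for value, cts in zip(x, joint):
    print(value, cts)
```

For barista the counts come out as 5, 1, 2 (raw counts 4, 0, 1, plus 1 each).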

The second step is to compute the raw counts, without smoothing, of each class:

class 0 = 19
class 1 = 14
class 2 =  7
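Step 2 only needs the last column of the data. A short sketch, assuming the extraversion labels are copied by hand from people_data.txt in file order, uses collections.Counter. It also divides each class count by N = 40, which gives the last factor in each step 3 product.

```python
from collections import Counter

# The extraversion column (last field) of the 40 demo items, in file order.
labels = [int(t) for t in (
    "1 0 0 1 2 1 0 1 2 1 0 1 2 1 1 0 1 0 1 1 "
    "0 2 0 0 0 1 0 2 0 2 0 0 0 2 0 1 0 1 0 0").split()]

N = len(labels)                        # number of data items
counts = Counter(labels)               # raw, unsmoothed counts
class_cts = [counts[k] for k in range(3)]
priors = [c / N for c in class_cts]    # the count/N factor used in step 3

print("Class counts:", class_cts)
print("Class fractions:", priors)
```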

The third step is to combine the results from steps 1 and 2 using some fancy probability (“Bayes”), to get what are called evidence values (Z) for each class:

Z(0) = (5 / (19+3)) * (6 / (19+3)) * (2 / (19+3)) * (19 / 40)
     = 5/22 * 6/22 * 2/22 * 19/40
     = 0.2273 * 0.2727 * 0.0909 * 0.4750
     = 0.0027

Z(1) = (1 / (14+3)) * (3 / (14+3)) * (6 / (14+3)) * (14 / 40)
     = 1/17 * 3/17 * 6/17 * 14/40
     = 0.0588 * 0.1765 * 0.3529 * 0.3500
     = 0.0013

Z(2) = (2 / (7+3)) * (3 / (7+3)) * (2 / (7+3)) * (7 / 40)
     = 2/10 * 3/10 * 2/10 * 7/40
     = 0.2000 * 0.3000 * 0.2000 * 0.1750
     = 0.0021
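Written generally, with N = 40 data items and n_x = 3 predictor variables, each evidence value in step 3 has the form below, where count(x_j, k) is the raw joint count from step 1 and count(k) is the raw class count from step 2. The n_x in the denominator follows this demo's convention.

```latex
Z(k) = \frac{\mathrm{count}(k)}{N} \prod_{j=1}^{n_x} \frac{\mathrm{count}(x_j, k) + 1}{\mathrm{count}(k) + n_x}
```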

Note: The “+3” in each denominator is there because there are 3 predictor variables. At this point, the predicted class is the one with the largest evidence value, which is class 0.
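The step 3 arithmetic can be checked with a small sketch that takes the smoothed joint counts and raw class counts as given and uses Python's fractions.Fraction, so nothing is rounded until printing. The function name evidence is hypothetical, and the (class count + nx) denominator mirrors this demo's convention.

```python
from fractions import Fraction

# Smoothed joint counts from step 1. Rows are the item's predictor values
# (barista, hazel, italy); columns are classes 0, 1, 2.
joint_cts = [[5, 1, 2],
             [6, 3, 3],
             [2, 6, 2]]
class_cts = [19, 14, 7]  # raw class counts from step 2
nx = 3                   # number of predictor variables
N = 40                   # number of data items

def evidence(joint_cts, class_cts, nx, N):
    """Evidence value Z(k) for each class k, kept as an exact fraction."""
    z = []
    for k, ck in enumerate(class_cts):
        v = Fraction(ck, N)  # class factor, e.g. 19/40
        for j in range(nx):
            # smoothed joint count / (class count + nx)
            v *= Fraction(joint_cts[j][k], ck + nx)
        z.append(v)
    return z

Z = evidence(joint_cts, class_cts, nx, N)
for k, zk in enumerate(Z):
    print(f"Z({k}) = {zk} = {float(zk):.4f}")

predicted = max(range(len(Z)), key=lambda k: Z[k])  # largest evidence wins
print("Predicted class:", predicted)
```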

An optional final step is to normalize the evidence values so that they sum to 1.0 and can be loosely interpreted as pseudo-probabilities. The easiest way to do this is to divide each evidence value by the sum. The evidence values are carried to six decimals here, because dividing the values already rounded to four decimals would shift the results slightly:

sum = 0.002677 + 0.001282 + 0.002100 = 0.006059

P(class 0) = 0.002677 / 0.006059 = 0.4418
P(class 1) = 0.001282 / 0.006059 = 0.2116
P(class 2) = 0.002100 / 0.006059 = 0.3466
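A sketch of the normalization, assuming the evidence values are rebuilt as exact fractions from the step 3 products, so the result does not depend on how the evidence values were rounded. The helper name normalize is made up for illustration.

```python
from fractions import Fraction

# Evidence values from step 3, rebuilt as exact fractions.
evidence = [Fraction(5 * 6 * 2, 22 ** 3) * Fraction(19, 40),   # Z(0)
            Fraction(1 * 3 * 6, 17 ** 3) * Fraction(14, 40),   # Z(1)
            Fraction(2 * 3 * 2, 10 ** 3) * Fraction(7, 40)]    # Z(2)

def normalize(values):
    """Divide each value by the sum so the results add up to 1."""
    total = sum(values)
    return [v / total for v in values]

probs = normalize(evidence)
for k, p in enumerate(probs):
    print(f"P(class {k}) = {float(p):.4f}")
print("Sum:", sum(probs))
```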

As before, class 0 has the largest pseudo-probability, so that’s the predicted class for a (barista, hazel, italy) person.

There are many variations of naive Bayes classification. This example is just one version, for problems where the predictor values are categorical (non-numeric).



The term “naive” means simple and unsophisticated. The term applies well to my two dogs, Kevin and Riley. Left: Kevin, when he had just joined my family, which already included Riley. Center: I woke up from a nap one afternoon to find that Riley had proudly brought me my “Chess Life” magazine and some socks. She is waiting for praise. Right: Kevin went through a phase where he was obsessed with socks.


Demo code:

# naive_bayes.py
# Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np

# -----------------------------------------------------------

def main():
  print("\nBegin naive Bayes classification ")
  data = np.loadtxt(".\\people_data.txt", dtype=str,
    delimiter=" ", comments="#")
  print("\nData looks like: ")
  for i in range(5):
    print(data[i])
  print(". . . \n")

  nx = 3  # number predictor variables
  nc = 3  # number classes
  N = 40  # data items
  joint_cts = np.zeros((nx,nc), dtype=np.int64) 
  y_cts = np.zeros(nc, dtype=np.int64)

# -----------------------------------------------------------

  # X = ['dentist', 'hazel', 'italy']
  X = ['barista', 'hazel', 'italy']
  print("Item to predict/classify: ")
  print(X)

  for i in range(N):
    y = int(data[i,nx])  # class is in last column
    y_cts[y] += 1
    for j in range(nx):
      if data[i][j] == X[j]:
        joint_cts[j][y] += 1

  joint_cts += 1  # Laplacian smoothing

  print("\nJoint counts (smoothed): ")
  print(joint_cts)
  print("\nClass counts (raw): ")
  print(y_cts)

# -----------------------------------------------------------

  # compute evidence terms directly
  # e_terms = np.zeros(nc, dtype=np.float32) 
  # for k in range(nc):
  #   v = 1.0
  #   for j in range(nx):
  #     v *= joint_cts[j,k] / (y_cts[k] + nx)
  #   v *= y_cts[k] / N
  #   e_terms[k] = v

# -----------------------------------------------------------

  # compute evidence terms using log trick to avoid underflow
  e_terms = np.zeros(nc, dtype=np.float32) 
  for k in range(nc):
    v = 0.0
    for j in range(nx):
      v += np.log(joint_cts[j,k]) - np.log(y_cts[k] + nx)
    v += np.log(y_cts[k]) - np.log(N)
    e_terms[k] = np.exp(v)

# -----------------------------------------------------------

  np.set_printoptions(precision=4, suppress=True)
  print("\nEvidence terms: ")
  print(e_terms)

  sum_evidence = np.sum(e_terms)
  probs = np.zeros(nc, dtype=np.float32)
  for k in range(nc):
    probs[k] = e_terms[k] / sum_evidence

  print("\nPseudo-probabilities: ")
  print(probs)

  pc = np.argmax(probs)
  print("\nPredicted class: ")
  print(pc)

  print("\nEnd naive Bayes demo ")

if __name__ == "__main__":
  main()
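The demo sums logarithms but then calls np.exp on each class's total before normalizing, so with many more predictors the individual exp values could still underflow to 0. A common alternative, not what the demo does, is to normalize directly in log space by subtracting the largest log value before exponentiating (the log-sum-exp idea). Below is a minimal NumPy sketch that reuses the smoothed joint counts and class counts from the worked example and the demo's +nx denominator; the function names log_evidence and normalize_log are made up for illustration.

```python
import numpy as np

def log_evidence(joint_cts, class_cts, nx, N):
    """Log of each class's evidence value, with the demo's (count + nx) denominator."""
    joint = np.asarray(joint_cts, dtype=np.float64)  # shape (nx, n_classes)
    cls = np.asarray(class_cts, dtype=np.float64)    # shape (n_classes,)
    # sum over predictors j of log(joint[j, k]) - log(class_cts[k] + nx)
    log_z = np.sum(np.log(joint), axis=0) - nx * np.log(cls + nx)
    log_z += np.log(cls) - np.log(N)                 # class factor count/N
    return log_z

def normalize_log(log_z):
    """Pseudo-probabilities from log evidence values: shift by the max so the
    largest term becomes exp(0) = 1, then exponentiate and divide by the sum."""
    shifted = np.exp(log_z - np.max(log_z))
    return shifted / np.sum(shifted)

# Smoothed joint counts (rows: barista, hazel, italy) and raw class counts
# from the worked example.
joint_cts = [[5, 1, 2], [6, 3, 3], [2, 6, 2]]
class_cts = [19, 14, 7]

probs = normalize_log(log_evidence(joint_cts, class_cts, nx=3, N=40))
print(np.round(probs, 4))
print("Predicted class:", int(np.argmax(probs)))
```

Because only differences of log values are exponentiated, the largest class always contributes exactly 1 to the sum, so the division never hits 0 / 0.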

Demo data:

# people_data.txt
# job-type  eye-color country  extraversion
#
actuary green korea 1
barista green italy 0
dentist hazel japan 0
dentist green japan 1
chemist hazel japan 2
actuary green japan 1
actuary green japan 0
chemist green italy 1
chemist green italy 2
dentist green japan 1
dentist green japan 0
dentist green japan 1
dentist green japan 2
chemist green italy 1
dentist green japan 1
dentist hazel japan 0
chemist green korea 1
barista green japan 0
actuary green italy 1
actuary green italy 1
dentist green korea 0
barista green japan 2
dentist green japan 0
barista green korea 0
dentist green japan 0
actuary hazel italy 1
dentist hazel japan 0
dentist green japan 2
dentist green japan 0
chemist hazel japan 2
dentist green korea 0
dentist hazel korea 0
dentist green japan 0
dentist green japan 2
dentist hazel japan 0
actuary hazel japan 1
actuary green japan 0
actuary green japan 1
dentist green japan 0
barista green japan 0

This entry was posted in Machine Learning, Miscellaneous.