Neural Network Regression From Scratch Using Python

Every few months, I revisit one of my many neural network implementations. Because neural networks are so complicated, there are dozens of ideas to explore. I always find something new and interesting.

I used one of my standard synthetic datasets. The goal is to predict the income of a person based on sex (male, female), age, state of residence, and political leaning. The raw data looks like:

F   24   michigan   29500.00   liberal
M   39   oklahoma   51200.00   moderate
F   63   nebraska   75800.00   conservative
M   36   michigan   44500.00   moderate
F   27   nebraska   28600.00   liberal
. . .

The normalized and encoded data is:

 1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
-1, 0.39, 0, 0, 1, 0.5120, 0, 1, 0
 1, 0.63, 0, 1, 0, 0.7580, 1, 0, 0
-1, 0.36, 1, 0, 0, 0.4450, 0, 1, 0
 1, 0.27, 0, 1, 0, 0.2860, 0, 0, 1
. . .

Sex is encoded as M = -1, F = 1. Age is normalized by dividing by 100. State is one-hot encoded as Michigan = 100, Nebraska = 010, Oklahoma = 001. Income (the dependent variable) is normalized by dividing by 100,000. Political leaning is one-hot encoded as conservative = 100, moderate = 010, liberal = 001. There are 200 training items and 40 test items.
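
For this demo the encoding was done as a preprocessing step. Here's a minimal sketch of the idea, where encode_person() is a hypothetical helper, not part of the demo program:

states = {"michigan": [1,0,0], "nebraska": [0,1,0],
  "oklahoma": [0,0,1]}
politics = {"conservative": [1,0,0], "moderate": [0,1,0],
  "liberal": [0,0,1]}

def encode_person(sex, age, state, income, leaning):
  sx = -1 if sex == "M" else 1
  return [sx, age / 100.0] + states[state] + \
    [income / 100_000.0] + politics[leaning]

# encode_person("F", 24, "michigan", 29500.00, "liberal")
# returns [1, 0.24, 1, 0, 0, 0.295, 0, 0, 1]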

Implementing a neural network from scratch is very complicated, but I've been studying NNs for years and I've implemented them many times. Even so, my latest implementation took several hours of rather intense work. The complexity of NNs means that there are dozens and dozens of design alternatives.

The training and test data is loaded into memory like so:

import numpy as np

def main():
  print("Begin NN using raw Python demo ")
  # 1. load data
  #  1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
  # -1, 0.39, 0, 0, 1, 0.5120, 0, 1, 0

  print("Loading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  train_x = np.loadtxt(train_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter=",", comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=5,
    delimiter=",", comments="#", dtype=np.float32)

  test_x = np.loadtxt(test_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter=",", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=5,
    delimiter=",", comments="#", dtype=np.float32)
. . .

There are no library dependencies except for NumPy. The neural network regression model is created using these statements:

  # 2. create network
  print("Creating 8-25-1 tanh, log-sigmoid MSE NN ")
  nn = NeuralNetwork(8, 25, 1, seed=0)
. . .

The number of input nodes (8) and the number of output nodes (1) are determined by the data. The number of hidden nodes (25) must be determined by trial and error. The seed value initializes a random number generator that's used for weight and bias initialization, and to scramble the order in which data items are processed during training. An 8-25-1 network has (8 * 25) + 25 + (25 * 1) + 1 = 251 weights and biases. The neural network uses tanh() activation on the hidden nodes and logistic sigmoid activation on the output node. The training algorithm uses mean squared error (as opposed to cross entropy error).

In many regression problem scenarios, I use Identity activation — a.k.a. Linear activation a.k.a. no activation. But because the target income values are all between 0.0 and 1.0, logistic sigmoid activation is feasible. And for this data, logistic sigmoid activation worked a bit better than Identity/Linear/No activation.
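
Switching between the two output activations is a one-line change in the compute_output() method shown in the full listing below:

    # inside compute_output() -- to use identity/linear
    # activation, replace the log-sigmoid call:
    for k in range(self.no):
      # self.o_nodes[k] = self.log_sigmoid(o_sums[k])
      self.o_nodes[k] = o_sums[k]  # no activation

If you make that change, the output node signal computation in the accum_grads() method must change too: the derivative term becomes 1.0 instead of o * (1 - o).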

The neural network is trained like so:

  # 3. train network
  lrn_rate = 0.01
  max_epochs = 1000
  print("Setting learn rate = 0.01 ")
  print("Setting batch size = 10 ")
  print("Setting max epochs = 1000 ")
  print("Starting training ")
  nn.train(train_x, train_y, lrn_rate, 10, max_epochs)
  print("Training complete ")
. . .

The learning rate, batch size, and number of training epochs were determined by trial and error (a simple search sketch is shown after the training output below). The progress messages look like:

Starting training
epoch:     0   MSE =   0.0211   acc =   0.1600
epoch:   100   MSE =   0.0207   acc =   0.1600
epoch:   200   MSE =   0.0200   acc =   0.1600
epoch:   300   MSE =   0.0167   acc =   0.1550
epoch:   400   MSE =   0.0086   acc =   0.2500
epoch:   500   MSE =   0.0020   acc =   0.5550
epoch:   600   MSE =   0.0008   acc =   0.7650
epoch:   700   MSE =   0.0007   acc =   0.8050
epoch:   800   MSE =   0.0007   acc =   0.8150
epoch:   900   MSE =   0.0007   acc =   0.8150
Training complete
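
The trial and error can be semi-automated with a simple grid search. Here's a minimal sketch (not part of the demo program; a more rigorous approach would evaluate on a held-out validation set):

# hypothetical hyperparameter sweep -- not in the demo
for lr in [0.005, 0.01, 0.02]:
  for bs in [5, 10, 20]:
    nn = NeuralNetwork(8, 25, 1, seed=0)
    nn.train(train_x, train_y, lr, bs, 1000)
    acc = nn.accuracy(train_x, train_y, 0.07)
    print("lr = %0.3f  bat_size = %2d  acc = %0.4f" \
      % (lr, bs, acc))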

The demo program evaluates the trained model using these statements:

  # 4. evaluate model
  train_acc = nn.accuracy(train_x, train_y, 0.07)
  test_acc = nn.accuracy(test_x, test_y, 0.07)
  print("Accuracy on training data = %0.4f" % train_acc)
  print("Accuracy on test data = %0.4f" % test_acc)

  income_pts = [0.0, 0.25, 0.50, 0.75, 1.0]
  print("Accuracy matrix for test data: ")
  am = nn.accuracy_matrix(test_x, test_y, 0.07, income_pts)
  nn.show_acc_matrix(am, income_pts)
. . .

The output is:

Accuracy on training data = 0.8100
Accuracy on test data = 0.8250

Accuracy matrix for test data:
    from    to     correct  wrong   count    accuracy
    0.00    0.25       0       1       1      0.0000
    0.25    0.50      12       4      16      0.7500
    0.50    0.75      20       2      22      0.9091
    0.75    1.00       1       0       1      1.0000

A prediction is counted as correct if it's within 7% (pct_close = 0.07) of the true normalized income. The overall accuracy on the 40 test items is 33/40 = 0.8250.
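
The closeness test is the same one used inside the accuracy() method. For example, with a target normalized income of 0.5000 and pct_close = 0.07, any prediction within 0.0350 of the target counts as correct:

import numpy as np
y = 0.5000; oupt = 0.4700; pct_close = 0.07
if np.abs(y - oupt) < np.abs(y * pct_close):  # 0.0300 < 0.0350
  print("correct")  # this branch is taken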

The demo program saves the trained model weights and biases to file using these statements:

  # 5. save trained model
  print("Saving trained weights to file ")
  nn.save_weights(".\\Models\\income_weights.txt")

  # nn = NeuralNetwork(8, 25, 1, seed=0)
  # nn.load_weights(".\\Models\\income_weights.txt")
. . .
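
One way to verify the saved weights is to load them into a second network and re-check test accuracy. A quick sketch (not in the demo; because weights are saved with only four decimals, the recomputed accuracy could differ very slightly):

  # round-trip sanity check -- not part of the demo
  nn2 = NeuralNetwork(8, 25, 1, seed=0)
  nn2.load_weights(".\\Models\\income_weights.txt")
  acc = nn2.accuracy(test_x, test_y, 0.07)  # expect ~0.8250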

The demo concludes by using the trained model to make a prediction:

. . .
  # 6. use trained model
  print("Predict for M 46 Oklahoma moderate")
  x = np.array([-1, 0.46, 0, 0, 1, 0, 1, 0],
    dtype=np.float32)
  pred_inc = nn.compute_output(x)
  print("\nPredicted income: %0.5f " % pred_inc)

  print("End demo ")

The output is:

Predicted income: 0.56684

Because income values were normalized by dividing by 100,000, this corresponds to a raw income of $56,684.00.
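
In code, the un-normalization is a single multiplication:

  raw_income = pred_inc * 100_000  # 0.56684 -> $56,684.00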

Another fun and interesting exploration.



Some people augment their income through dishonest means. Some time ago I did a Google search for “school embezzlement arrest”. The results were sad but not entirely unexpected. At first I was mildly surprised at how many women were represented, but when I learned that approximately 85% of school administrators are women, the results made sense.

From left to right, top to bottom: Ingrid Grant embezzled $400,000 from the Arlington, VA school district. Dana Walker embezzled from H.W. Byers High School in Holly Springs, MS. Melissa Nance was a principal at Nichols Elementary School in Biloxi, MS. Tamika Martinez-Abihai embezzled from several schools in Jacksonville, FL. Tramaine Jones embezzled from Garden City Elementary School in Jacksonville, FL. Shakicha Natasha Murphy embezzled from Reid Ross Classical School in Fayetteville, NC. Mandy Bellamy embezzled from Riverside Elementary School in Myrtle Beach, SC. Karen Burnette Cheek embezzled from Northwood High School in Pittsboro, NC. I noticed that all of these criminals have something in common . . . they’re all from the South.


Complete demo code:

# people_income.py
# neural network, scratch Python

import numpy as np

class NeuralNetwork:

  def __init__(self, num_in, num_hid, num_out, seed):
    self.ni = num_in
    self.nh = num_hid
    self.no = num_out
	
    self.i_nodes = np.zeros(shape=self.ni, dtype=np.float32)
    self.h_nodes = np.zeros(shape=self.nh, dtype=np.float32)
    self.o_nodes = np.zeros(shape=self.no, dtype=np.float32)
	
    self.ih_weights = np.zeros(shape=(self.ni,self.nh),
      dtype=np.float32)
    self.ho_weights = np.zeros(shape=(self.nh,self.no),
      dtype=np.float32)
	
    self.h_biases = np.zeros(shape=self.nh, dtype=np.float32)
    self.o_biases = np.zeros(shape=self.no, dtype=np.float32)

    self.ih_grads = np.zeros((self.ni, self.nh),
      dtype=np.float32)
    self.hb_grads = np.zeros(self.nh, dtype=np.float32)
    self.ho_grads = np.zeros((self.nh, self.no),
      dtype=np.float32)
    self.ob_grads = np.zeros(self.no, dtype=np.float32)
	
    self.rnd = np.random.RandomState(seed)
    self.init_weights()

# -----------------------------------------------------------

  def init_weights(self):
    num_wts = (self.ni * self.nh) + self.nh + \
      (self.nh * self.no) + self.no
    wts = np.zeros(shape=num_wts, dtype=np.float32)
    lo = -0.01; hi = 0.01
    for i in range(len(wts)):
      wts[i] = (hi - lo) * self.rnd.random() + lo
    self.set_weights(wts)

# -----------------------------------------------------------

  def set_weights(self, weights):
    idx = 0
    for i in range(self.ni):
      for j in range(self.nh):
        self.ih_weights[i][j] = weights[idx]
        idx += 1
    for j in range(self.nh):
      self.h_biases[j] = weights[idx]
      idx += 1
    for j in range(self.nh):
      for k in range(self.no):
        self.ho_weights[j][k] = weights[idx]
        idx += 1
    for k in range(self.no):
      self.o_biases[k] = weights[idx]
      idx += 1

# -----------------------------------------------------------

  def get_weights(self):
    # order: ih_wts, h_biases, ho_wts, o_biases
    num_wts = (self.ni * self.nh) + self.nh + \
      (self.nh * self.no) + self.no
    result = np.zeros(num_wts, dtype=np.float32)
    p = 0
    for i in range(self.ni):
      for j in range(self.nh):
        result[p] = self.ih_weights[i][j]
        p += 1
    for j in range(self.nh):
      result[p] = self.h_biases[j]
      p += 1
    for j in range(self.nh):
      for k in range(self.no):
        result[p] = self.ho_weights[j][k]
        p += 1
    for k in range(self.no):
      result[p] = self.o_biases[k]
      p += 1
    return result

# -----------------------------------------------------------

  def compute_output(self, x):
    h_sums = np.zeros(self.nh, dtype=np.float32)
    o_sums = np.zeros(self.no, dtype=np.float32)  # size [1]
    # copy x into i_nodes to avoid by-ref errors
    for i in range(len(x)):
      self.i_nodes[i] = x[i]

    for j in range(self.nh):
      for i in range(self.ni):
        h_sums[j] += self.i_nodes[i] * self.ih_weights[i][j]
      h_sums[j] += self.h_biases[j]
      self.h_nodes[j] = np.tanh(h_sums[j])

    for k in range(self.no):
      for j in range(self.nh):
        o_sums[k] += self.h_nodes[j] * self.ho_weights[j][k]
      o_sums[k] += self.o_biases[k]

    # apply logistic sigmoid output activation
    for k in range(self.no):  # a single node
      self.o_nodes[k] = self.log_sigmoid(o_sums[k])
      # self.o_nodes[k] = o_sums[k]  # no activation
	  
    return self.o_nodes[0]  # single scalar in [0.0, 1.0]

# -----------------------------------------------------------

  @staticmethod
  def log_sigmoid(x):
    if x "lt" -10.0: return 0.0
    elif x "gt" 10.0: return 1.0
    else: return 1.0 / (1.0 + np.exp(-x))

# -----------------------------------------------------------

  def zero_out_grads(self):
    for i in range(self.ni):
      for j in range(self.nh):
        self.ih_grads[i][j] = 0.0
    for j in range(self.nh):  
      self.hb_grads[j] = 0.0
    for j in range(self.nh):
      for k in range(self.no):
        self.ho_grads[j][k] = 0.0
    for k in range(self.no):
      self.ob_grads[k] = 0.0

# -----------------------------------------------------------

  def accum_grads(self, y):
    # y is target scalar
    o_signals = np.zeros(self.no, dtype=np.float32)
    h_signals = np.zeros(self.nh, dtype=np.float32)

    # 1. compute output node scratch signals 
    for k in range(self.no):
      # derivative = 1.0  # if identity() activation
      # log-sigmoid derivative is o * (1 - o)
      derivative = self.o_nodes[k] * (1 - self.o_nodes[k])
      o_signals[k] = derivative * (self.o_nodes[k] - y)

    # 2. accum hidden-to-output gradients 
    for j in range(self.nh):
      for k in range(self.no):
        self.ho_grads[j][k] += o_signals[k] * \
          self.h_nodes[j]

    # 3. accum output node bias gradients
    for k in range(self.no):
      self.ob_grads[k] += o_signals[k] * 1.0 

    # 4. compute hidden node signals
    for j in range(self.nh):
      sum = 0.0
      for k in range(self.no):
        sum += o_signals[k] * self.ho_weights[j][k]

      derivative = \
        (1 - self.h_nodes[j]) * \
        (1 + self.h_nodes[j])  # assumes tanh
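      # (1 - h)(1 + h) = 1 - h^2 = tanh'(z), where h = tanh(z)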
      h_signals[j] = derivative * sum

    # 5. accum input-to-hidden gradients
    for i in range(self.ni):
      for j in range(self.nh):
        self.ih_grads[i][j] += \
          h_signals[j] * self.i_nodes[i]

    # 6. accum hidden node bias gradients
    for j in range(self.nh):
      self.hb_grads[j] += h_signals[j] * 1.0

# -----------------------------------------------------------

  def update_weights(self, lrn_rate):
    # assumes all gradients computed
    # 1. update input-to-hidden weights
    for i in range(self.ni):
      for j in range(self.nh):
        delta = -1.0 * lrn_rate * self.ih_grads[i][j]
        self.ih_weights[i][j] += delta

    # 2. update hidden node biases
    for j in range(self.nh):
      delta = -1.0 * lrn_rate * self.hb_grads[j]
      self.h_biases[j] += delta

    # 3. update hidden-to-output weights
    for j in range(self.nh):
      for k in range(self.no):
        delta = -1.0 * lrn_rate * self.ho_grads[j][k]
        self.ho_weights[j][k] += delta

    # 4. update output node biases
    for k in range(self.no):
      delta = -1.0 * lrn_rate * self.ob_grads[k]
      self.o_biases[k] += delta

    # 5. clip weights
    # self.clip_weights(-1000000.0, 1000000.0)

# -----------------------------------------------------------

  def train(self, train_x, train_y, lrn_rate, bat_size,
    max_epochs):
    n = len(train_x)                  # like 200
    batches_per_epoch = n // bat_size # like 20
    freq = max_epochs // 10           # progress interval
    indices = np.arange(n)

    for epoch in range(max_epochs): 
      self.rnd.shuffle(indices)
      ptr = 0   # points into indices
      for bat_idx in range(batches_per_epoch): # 0, 1, .. 19
        for i in range(bat_size):  # 0 . . 9
          ii = indices[ptr]; ptr += 1
          x = train_x[ii]
          y = train_y[ii]
          self.compute_output(x)  # into self.o_nodes
          self.accum_grads(y)

        self.update_weights(lrn_rate)
        self.zero_out_grads()  # prep for next batch
 
      if epoch % freq == 0:
        mse = self.mean_sq_err(train_x, train_y)
        acc = self.accuracy(train_x, train_y, 0.07)
        s1 = "epoch: %5d" % epoch
        s2 = "   MSE = %8.4f" % mse
        s3 = "   acc = %8.4f" % acc
        print(s1 + s2 + s3)

# -----------------------------------------------------------

  def mean_BCE(self, data_x, data_y):
    # not used this version
    err = 0.0  # sum binary cross entropy errors
    for i in range(len(data_x)):
      x = data_x[i]
      actual_y = data_y[i]  # target 0 or 1
      pred_y = self.compute_output(x)  # like 0.6789
     
      if actual_y == 1:
        err += -np.log(pred_y)
      else:
        err += -np.log(1.0 - pred_y)

    return err / len(data_x)

# -----------------------------------------------------------

  def mean_sq_err(self, data_x, data_y):
    sum_se = 0.0
    for i in range(len(data_x)):
      x = data_x[i]
      y = data_y[i]   # target normalized income
      oupt = self.compute_output(x)  # 0.1234
      sum_se += (y - oupt) * (y - oupt)

    return sum_se / len(data_x)   # consider Root MSE

# -----------------------------------------------------------

  def accuracy(self, data_x, data_y, pct_close):
    nc = 0; nw = 0
    for i in range(len(data_x)):
      x = data_x[i]
      y = data_y[i]  # target normalized income
      oupt = self.compute_output(x)
      if np.abs(y - oupt) < np.abs(y * pct_close):
        nc += 1
      else:
        nw += 1

    return nc / (nc + nw)

# -----------------------------------------------------------

  def accuracy_matrix(self, data_x, data_y,
    pct_close, points):

    n_intervals = len(points) - 1
    # n_correct at col [0]
    result = np.zeros((n_intervals,2), dtype=np.int64)
    for i in range(len(data_x)):
      x = data_x[i]
      y = data_y[i]  # target normalized income
      oupt = self.compute_output(x)  # like 0.3456

      interval = 0
      for j in range(n_intervals):  # j: outer loop already uses i
        if y >= points[j] and y < points[j+1]:
          interval = j
          break

      if np.abs(y - oupt) < np.abs(y * pct_close):
        result[interval][0] += 1
      else:
        result[interval][1] += 1

    return result

# -----------------------------------------------------------

  def show_acc_matrix(self, am, points):
    h = "from    to     correct  wrong   count    accuracy"
    print("    " + h)
    for i in range(len(am)):
      print("%8.2f" % points[i], end="")
      print("%8.2f" % points[i+1], end="")
      print("%8d" % am[i][0], end ="")
      print("%8d" % am[i][1], end ="")
      count = am[i][0] + am[i][1]
      print("%8d" % count, end="")
      if count == 0:
        acc = 0.0
      else:
        acc = am[i][0] / count
      print("%12.4f" % acc)

# -----------------------------------------------------------

  def save_weights(self, fn):
    # write weights as single comma-delimited line
    wts = self.get_weights()
    n = len(wts)
    ofs = open(fn, "w")
    for i in range(n):
      w = wts[i]
      ofs.write("%0.4f" % w)
      if i != n-1:
        ofs.write(",")
    ofs.write("\n")
    ofs.close()

# -----------------------------------------------------------

  def load_weights(self, fn):
    ifs = open(fn, "r")
    s = ifs.readline()
    tokens = s.split(",")
    wts = np.zeros(len(tokens), dtype=np.float32)
    for i in range(len(wts)):
      wts[i] = float(tokens[i])
    ifs.close()
    self.set_weights(wts)

# -----------------------------------------------------------
# -----------------------------------------------------------

def main():
  print("\nBegin NN using raw Python demo ")
  # 1. load data
  #  1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
  # -1, 0.39, 0, 0, 1, 0.5120, 0, 1, 0

  print("\nLoading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  train_x = np.loadtxt(train_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter=",", comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=5,
    delimiter=",", comments="#", dtype=np.float32)

  test_x = np.loadtxt(test_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter=",", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=5,
    delimiter=",", comments="#", dtype=np.float32)

  # 2. create network
  print("\nCreating 8-25-1 tanh, log-sigmoid MSE NN ")
  nn = NeuralNetwork(8, 25, 1, seed=0)

  # 3. train network
  lrn_rate = 0.01
  max_epochs = 1000
  print("\nSetting learn rate = 0.01 ")
  print("Setting batch size = 10 ")
  print("Setting max epochs = 1000 ")
  print("\nStarting training ")
  nn.train(train_x, train_y, lrn_rate, 10, max_epochs)
  print("Training complete ")

  # wts = nn.get_weights()
  # print(wts)

  # 4. evaluate model
  train_acc = nn.accuracy(train_x, train_y, 0.07)
  test_acc = nn.accuracy(test_x, test_y, 0.07)
  print("\nAccuracy on training data = %0.4f" % train_acc)
  print("Accuracy on test data = %0.4f" % test_acc)

  income_pts = [0.0, 0.25, 0.50, 0.75, 1.0]
  print("\nAccuracy matrix for test data: ")
  am = nn.accuracy_matrix(test_x, test_y, 0.07, income_pts)
  nn.show_acc_matrix(am, income_pts)

  # 5. save trained model
  print("\nSaving trained weights to file ")
  nn.save_weights(".\\Models\\income_weights.txt")

  # nn = NeuralNetwork(8, 25, 1, seed=0)
  # nn.load_weights(".\\Models\\income_weights.txt")

  # 6. use trained model
  print("\nPredict for M 46 Oklahoma moderate")
  x = np.array([-1, 0.46, 0, 0, 1, 0, 1, 0],
    dtype=np.float32)
  pred_inc = nn.compute_output(x)
  print("\nPredicted income: %0.5f " % pred_inc)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Training data:

# people_train.txt
#
# sex (-1 = male, 1 = female), age / 100,
# state (michigan = 100, nebraska = 010,
# oklahoma = 001),
# income / 100_000,
# politics (conservative = 100, moderate = 010,
# liberal = 001)
#
 1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
-1, 0.39, 0, 0, 1, 0.5120, 0, 1, 0
 1, 0.63, 0, 1, 0, 0.7580, 1, 0, 0
-1, 0.36, 1, 0, 0, 0.4450, 0, 1, 0
 1, 0.27, 0, 1, 0, 0.2860, 0, 0, 1
 1, 0.50, 0, 1, 0, 0.5650, 0, 1, 0
 1, 0.50, 0, 0, 1, 0.5500, 0, 1, 0
-1, 0.19, 0, 0, 1, 0.3270, 1, 0, 0
 1, 0.22, 0, 1, 0, 0.2770, 0, 1, 0
-1, 0.39, 0, 0, 1, 0.4710, 0, 0, 1
 1, 0.34, 1, 0, 0, 0.3940, 0, 1, 0
-1, 0.22, 1, 0, 0, 0.3350, 1, 0, 0
 1, 0.35, 0, 0, 1, 0.3520, 0, 0, 1
-1, 0.33, 0, 1, 0, 0.4640, 0, 1, 0
 1, 0.45, 0, 1, 0, 0.5410, 0, 1, 0
 1, 0.42, 0, 1, 0, 0.5070, 0, 1, 0
-1, 0.33, 0, 1, 0, 0.4680, 0, 1, 0
 1, 0.25, 0, 0, 1, 0.3000, 0, 1, 0
-1, 0.31, 0, 1, 0, 0.4640, 1, 0, 0
 1, 0.27, 1, 0, 0, 0.3250, 0, 0, 1
 1, 0.48, 1, 0, 0, 0.5400, 0, 1, 0
-1, 0.64, 0, 1, 0, 0.7130, 0, 0, 1
 1, 0.61, 0, 1, 0, 0.7240, 1, 0, 0
 1, 0.54, 0, 0, 1, 0.6100, 1, 0, 0
 1, 0.29, 1, 0, 0, 0.3630, 1, 0, 0
 1, 0.50, 0, 0, 1, 0.5500, 0, 1, 0
 1, 0.55, 0, 0, 1, 0.6250, 1, 0, 0
 1, 0.40, 1, 0, 0, 0.5240, 1, 0, 0
 1, 0.22, 1, 0, 0, 0.2360, 0, 0, 1
 1, 0.68, 0, 1, 0, 0.7840, 1, 0, 0
-1, 0.60, 1, 0, 0, 0.7170, 0, 0, 1
-1, 0.34, 0, 0, 1, 0.4650, 0, 1, 0
-1, 0.25, 0, 0, 1, 0.3710, 1, 0, 0
-1, 0.31, 0, 1, 0, 0.4890, 0, 1, 0
 1, 0.43, 0, 0, 1, 0.4800, 0, 1, 0
 1, 0.58, 0, 1, 0, 0.6540, 0, 0, 1
-1, 0.55, 0, 1, 0, 0.6070, 0, 0, 1
-1, 0.43, 0, 1, 0, 0.5110, 0, 1, 0
-1, 0.43, 0, 0, 1, 0.5320, 0, 1, 0
-1, 0.21, 1, 0, 0, 0.3720, 1, 0, 0
 1, 0.55, 0, 0, 1, 0.6460, 1, 0, 0
 1, 0.64, 0, 1, 0, 0.7480, 1, 0, 0
-1, 0.41, 1, 0, 0, 0.5880, 0, 1, 0
 1, 0.64, 0, 0, 1, 0.7270, 1, 0, 0
-1, 0.56, 0, 0, 1, 0.6660, 0, 0, 1
 1, 0.31, 0, 0, 1, 0.3600, 0, 1, 0
-1, 0.65, 0, 0, 1, 0.7010, 0, 0, 1
 1, 0.55, 0, 0, 1, 0.6430, 1, 0, 0
-1, 0.25, 1, 0, 0, 0.4030, 1, 0, 0
 1, 0.46, 0, 0, 1, 0.5100, 0, 1, 0
-1, 0.36, 1, 0, 0, 0.5350, 1, 0, 0
 1, 0.52, 0, 1, 0, 0.5810, 0, 1, 0
 1, 0.61, 0, 0, 1, 0.6790, 1, 0, 0
 1, 0.57, 0, 0, 1, 0.6570, 1, 0, 0
-1, 0.46, 0, 1, 0, 0.5260, 0, 1, 0
-1, 0.62, 1, 0, 0, 0.6680, 0, 0, 1
 1, 0.55, 0, 0, 1, 0.6270, 1, 0, 0
-1, 0.22, 0, 0, 1, 0.2770, 0, 1, 0
-1, 0.50, 1, 0, 0, 0.6290, 1, 0, 0
-1, 0.32, 0, 1, 0, 0.4180, 0, 1, 0
-1, 0.21, 0, 0, 1, 0.3560, 1, 0, 0
 1, 0.44, 0, 1, 0, 0.5200, 0, 1, 0
 1, 0.46, 0, 1, 0, 0.5170, 0, 1, 0
 1, 0.62, 0, 1, 0, 0.6970, 1, 0, 0
 1, 0.57, 0, 1, 0, 0.6640, 1, 0, 0
-1, 0.67, 0, 0, 1, 0.7580, 0, 0, 1
 1, 0.29, 1, 0, 0, 0.3430, 0, 0, 1
 1, 0.53, 1, 0, 0, 0.6010, 1, 0, 0
-1, 0.44, 1, 0, 0, 0.5480, 0, 1, 0
 1, 0.46, 0, 1, 0, 0.5230, 0, 1, 0
-1, 0.20, 0, 1, 0, 0.3010, 0, 1, 0
-1, 0.38, 1, 0, 0, 0.5350, 0, 1, 0
 1, 0.50, 0, 1, 0, 0.5860, 0, 1, 0
 1, 0.33, 0, 1, 0, 0.4250, 0, 1, 0
-1, 0.33, 0, 1, 0, 0.3930, 0, 1, 0
 1, 0.26, 0, 1, 0, 0.4040, 1, 0, 0
 1, 0.58, 1, 0, 0, 0.7070, 1, 0, 0
 1, 0.43, 0, 0, 1, 0.4800, 0, 1, 0
-1, 0.46, 1, 0, 0, 0.6440, 1, 0, 0
 1, 0.60, 1, 0, 0, 0.7170, 1, 0, 0
-1, 0.42, 1, 0, 0, 0.4890, 0, 1, 0
-1, 0.56, 0, 0, 1, 0.5640, 0, 0, 1
-1, 0.62, 0, 1, 0, 0.6630, 0, 0, 1
-1, 0.50, 1, 0, 0, 0.6480, 0, 1, 0
 1, 0.47, 0, 0, 1, 0.5200, 0, 1, 0
-1, 0.67, 0, 1, 0, 0.8040, 0, 0, 1
-1, 0.40, 0, 0, 1, 0.5040, 0, 1, 0
 1, 0.42, 0, 1, 0, 0.4840, 0, 1, 0
 1, 0.64, 1, 0, 0, 0.7200, 1, 0, 0
-1, 0.47, 1, 0, 0, 0.5870, 0, 0, 1
 1, 0.45, 0, 1, 0, 0.5280, 0, 1, 0
-1, 0.25, 0, 0, 1, 0.4090, 1, 0, 0
 1, 0.38, 1, 0, 0, 0.4840, 1, 0, 0
 1, 0.55, 0, 0, 1, 0.6000, 0, 1, 0
-1, 0.44, 1, 0, 0, 0.6060, 0, 1, 0
 1, 0.33, 1, 0, 0, 0.4100, 0, 1, 0
 1, 0.34, 0, 0, 1, 0.3900, 0, 1, 0
 1, 0.27, 0, 1, 0, 0.3370, 0, 0, 1
 1, 0.32, 0, 1, 0, 0.4070, 0, 1, 0
 1, 0.42, 0, 0, 1, 0.4700, 0, 1, 0
-1, 0.24, 0, 0, 1, 0.4030, 1, 0, 0
 1, 0.42, 0, 1, 0, 0.5030, 0, 1, 0
 1, 0.25, 0, 0, 1, 0.2800, 0, 0, 1
 1, 0.51, 0, 1, 0, 0.5800, 0, 1, 0
-1, 0.55, 0, 1, 0, 0.6350, 0, 0, 1
 1, 0.44, 1, 0, 0, 0.4780, 0, 0, 1
-1, 0.18, 1, 0, 0, 0.3980, 1, 0, 0
-1, 0.67, 0, 1, 0, 0.7160, 0, 0, 1
 1, 0.45, 0, 0, 1, 0.5000, 0, 1, 0
 1, 0.48, 1, 0, 0, 0.5580, 0, 1, 0
-1, 0.25, 0, 1, 0, 0.3900, 0, 1, 0
-1, 0.67, 1, 0, 0, 0.7830, 0, 1, 0
 1, 0.37, 0, 0, 1, 0.4200, 0, 1, 0
-1, 0.32, 1, 0, 0, 0.4270, 0, 1, 0
 1, 0.48, 1, 0, 0, 0.5700, 0, 1, 0
-1, 0.66, 0, 0, 1, 0.7500, 0, 0, 1
 1, 0.61, 1, 0, 0, 0.7000, 1, 0, 0
-1, 0.58, 0, 0, 1, 0.6890, 0, 1, 0
 1, 0.19, 1, 0, 0, 0.2400, 0, 0, 1
 1, 0.38, 0, 0, 1, 0.4300, 0, 1, 0
-1, 0.27, 1, 0, 0, 0.3640, 0, 1, 0
 1, 0.42, 1, 0, 0, 0.4800, 0, 1, 0
 1, 0.60, 1, 0, 0, 0.7130, 1, 0, 0
-1, 0.27, 0, 0, 1, 0.3480, 1, 0, 0
 1, 0.29, 0, 1, 0, 0.3710, 1, 0, 0
-1, 0.43, 1, 0, 0, 0.5670, 0, 1, 0
 1, 0.48, 1, 0, 0, 0.5670, 0, 1, 0
 1, 0.27, 0, 0, 1, 0.2940, 0, 0, 1
-1, 0.44, 1, 0, 0, 0.5520, 1, 0, 0
 1, 0.23, 0, 1, 0, 0.2630, 0, 0, 1
-1, 0.36, 0, 1, 0, 0.5300, 0, 0, 1
 1, 0.64, 0, 0, 1, 0.7250, 1, 0, 0
 1, 0.29, 0, 0, 1, 0.3000, 0, 0, 1
-1, 0.33, 1, 0, 0, 0.4930, 0, 1, 0
-1, 0.66, 0, 1, 0, 0.7500, 0, 0, 1
-1, 0.21, 0, 0, 1, 0.3430, 1, 0, 0
 1, 0.27, 1, 0, 0, 0.3270, 0, 0, 1
 1, 0.29, 1, 0, 0, 0.3180, 0, 0, 1
-1, 0.31, 1, 0, 0, 0.4860, 0, 1, 0
 1, 0.36, 0, 0, 1, 0.4100, 0, 1, 0
 1, 0.49, 0, 1, 0, 0.5570, 0, 1, 0
-1, 0.28, 1, 0, 0, 0.3840, 1, 0, 0
-1, 0.43, 0, 0, 1, 0.5660, 0, 1, 0
-1, 0.46, 0, 1, 0, 0.5880, 0, 1, 0
 1, 0.57, 1, 0, 0, 0.6980, 1, 0, 0
-1, 0.52, 0, 0, 1, 0.5940, 0, 1, 0
-1, 0.31, 0, 0, 1, 0.4350, 0, 1, 0
-1, 0.55, 1, 0, 0, 0.6200, 0, 0, 1
 1, 0.50, 1, 0, 0, 0.5640, 0, 1, 0
 1, 0.48, 0, 1, 0, 0.5590, 0, 1, 0
-1, 0.22, 0, 0, 1, 0.3450, 1, 0, 0
 1, 0.59, 0, 0, 1, 0.6670, 1, 0, 0
 1, 0.34, 1, 0, 0, 0.4280, 0, 0, 1
-1, 0.64, 1, 0, 0, 0.7720, 0, 0, 1
 1, 0.29, 0, 0, 1, 0.3350, 0, 0, 1
-1, 0.34, 0, 1, 0, 0.4320, 0, 1, 0
-1, 0.61, 1, 0, 0, 0.7500, 0, 0, 1
 1, 0.64, 0, 0, 1, 0.7110, 1, 0, 0
-1, 0.29, 1, 0, 0, 0.4130, 1, 0, 0
 1, 0.63, 0, 1, 0, 0.7060, 1, 0, 0
-1, 0.29, 0, 1, 0, 0.4000, 1, 0, 0
-1, 0.51, 1, 0, 0, 0.6270, 0, 1, 0
-1, 0.24, 0, 0, 1, 0.3770, 1, 0, 0
 1, 0.48, 0, 1, 0, 0.5750, 0, 1, 0
 1, 0.18, 1, 0, 0, 0.2740, 1, 0, 0
 1, 0.18, 1, 0, 0, 0.2030, 0, 0, 1
 1, 0.33, 0, 1, 0, 0.3820, 0, 0, 1
-1, 0.20, 0, 0, 1, 0.3480, 1, 0, 0
 1, 0.29, 0, 0, 1, 0.3300, 0, 0, 1
-1, 0.44, 0, 0, 1, 0.6300, 1, 0, 0
-1, 0.65, 0, 0, 1, 0.8180, 1, 0, 0
-1, 0.56, 1, 0, 0, 0.6370, 0, 0, 1
-1, 0.52, 0, 0, 1, 0.5840, 0, 1, 0
-1, 0.29, 0, 1, 0, 0.4860, 1, 0, 0
-1, 0.47, 0, 1, 0, 0.5890, 0, 1, 0
 1, 0.68, 1, 0, 0, 0.7260, 0, 0, 1
 1, 0.31, 0, 0, 1, 0.3600, 0, 1, 0
 1, 0.61, 0, 1, 0, 0.6250, 0, 0, 1
 1, 0.19, 0, 1, 0, 0.2150, 0, 0, 1
 1, 0.38, 0, 0, 1, 0.4300, 0, 1, 0
-1, 0.26, 1, 0, 0, 0.4230, 1, 0, 0
 1, 0.61, 0, 1, 0, 0.6740, 1, 0, 0
 1, 0.40, 1, 0, 0, 0.4650, 0, 1, 0
-1, 0.49, 1, 0, 0, 0.6520, 0, 1, 0
 1, 0.56, 1, 0, 0, 0.6750, 1, 0, 0
-1, 0.48, 0, 1, 0, 0.6600, 0, 1, 0
 1, 0.52, 1, 0, 0, 0.5630, 0, 0, 1
-1, 0.18, 1, 0, 0, 0.2980, 1, 0, 0
-1, 0.56, 0, 0, 1, 0.5930, 0, 0, 1
-1, 0.52, 0, 1, 0, 0.6440, 0, 1, 0
-1, 0.18, 0, 1, 0, 0.2860, 0, 1, 0
-1, 0.58, 1, 0, 0, 0.6620, 0, 0, 1
-1, 0.39, 0, 1, 0, 0.5510, 0, 1, 0
-1, 0.46, 1, 0, 0, 0.6290, 0, 1, 0
-1, 0.40, 0, 1, 0, 0.4620, 0, 1, 0
-1, 0.60, 1, 0, 0, 0.7270, 0, 0, 1
 1, 0.36, 0, 1, 0, 0.4070, 0, 0, 1
 1, 0.44, 1, 0, 0, 0.5230, 0, 1, 0
 1, 0.28, 1, 0, 0, 0.3130, 0, 0, 1
 1, 0.54, 0, 0, 1, 0.6260, 1, 0, 0

Test data:

# people_test.txt
#
-1, 0.51, 1, 0, 0, 0.6120, 0, 1, 0
-1, 0.32, 0, 1, 0, 0.4610, 0, 1, 0
 1, 0.55, 1, 0, 0, 0.6270, 1, 0, 0
 1, 0.25, 0, 0, 1, 0.2620, 0, 0, 1
 1, 0.33, 0, 0, 1, 0.3730, 0, 0, 1
-1, 0.29, 0, 1, 0, 0.4620, 1, 0, 0
 1, 0.65, 1, 0, 0, 0.7270, 1, 0, 0
-1, 0.43, 0, 1, 0, 0.5140, 0, 1, 0
-1, 0.54, 0, 1, 0, 0.6480, 0, 0, 1
 1, 0.61, 0, 1, 0, 0.7270, 1, 0, 0
 1, 0.52, 0, 1, 0, 0.6360, 1, 0, 0
 1, 0.30, 0, 1, 0, 0.3350, 0, 0, 1
 1, 0.29, 1, 0, 0, 0.3140, 0, 0, 1
-1, 0.47, 0, 0, 1, 0.5940, 0, 1, 0
 1, 0.39, 0, 1, 0, 0.4780, 0, 1, 0
 1, 0.47, 0, 0, 1, 0.5200, 0, 1, 0
-1, 0.49, 1, 0, 0, 0.5860, 0, 1, 0
-1, 0.63, 0, 0, 1, 0.6740, 0, 0, 1
-1, 0.30, 1, 0, 0, 0.3920, 1, 0, 0
-1, 0.61, 0, 0, 1, 0.6960, 0, 0, 1
-1, 0.47, 0, 0, 1, 0.5870, 0, 1, 0
 1, 0.30, 0, 0, 1, 0.3450, 0, 0, 1
-1, 0.51, 0, 0, 1, 0.5800, 0, 1, 0
-1, 0.24, 1, 0, 0, 0.3880, 0, 1, 0
-1, 0.49, 1, 0, 0, 0.6450, 0, 1, 0
 1, 0.66, 0, 0, 1, 0.7450, 1, 0, 0
-1, 0.65, 1, 0, 0, 0.7690, 1, 0, 0
-1, 0.46, 0, 1, 0, 0.5800, 1, 0, 0
-1, 0.45, 0, 0, 1, 0.5180, 0, 1, 0
-1, 0.47, 1, 0, 0, 0.6360, 1, 0, 0
-1, 0.29, 1, 0, 0, 0.4480, 1, 0, 0
-1, 0.57, 0, 0, 1, 0.6930, 0, 0, 1
-1, 0.20, 1, 0, 0, 0.2870, 0, 0, 1
-1, 0.35, 1, 0, 0, 0.4340, 0, 1, 0
-1, 0.61, 0, 0, 1, 0.6700, 0, 0, 1
-1, 0.31, 0, 0, 1, 0.3730, 0, 1, 0
 1, 0.18, 1, 0, 0, 0.2080, 0, 0, 1
 1, 0.26, 0, 0, 1, 0.2920, 0, 0, 1
-1, 0.28, 1, 0, 0, 0.3640, 0, 0, 1
-1, 0.59, 0, 0, 1, 0.6940, 0, 0, 1