Logistic Regression using Raw C#

It had been a while since I exercised my C# language skills, so I decided to refactor a Python example of logistic regression to C#. The goal is to predict the sex of a person (male = 0, female = 1) from age, state of residence (Michigan, Nebraska, Oklahoma), income, and political leaning (conservative, moderate, liberal).

I expected the refactoring of my Python program to C# to take just a few hours, but the process took much longer than I expected. The main reason was that Python has all kinds of built-in functions, especially for reading data from file, while in C# many of the helper functions have to be implemented from scratch.

After I finished my C# version, its results were nearly identical to those of my Python version. My original intention was to explain my C# demo program in this blog post, but the program is surprisingly long and complex. So I'll just present the C# source code here and explain it in chunks over several blog posts.

My first challenge was to write code to read training and test data into memory. In Python, this can be done using the NumPy loadtxt() function combined with array slicing syntax. The data for my demo looks like:

1   0.24   1   0   0   0.2950   0   0   1
0   0.39   0   0   1   0.5120   0   1   0
1   0.63   0   1   0   0.7580   1   0   0
0   0.36   1   0   0   0.4450   0   1   0
1   0.27   0   1   0   0.2860   0   0   1
1   0.50   0   1   0   0.5650   0   1   0
. . .

Each data item is a person. The fields are sex (male = 0, female = 1), age (divided by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), income (divided by $100,000) and political leaning (conservative = 100, moderate = 010, liberal = 001). After a few hours of work, I had written C# functions to read data from file, store it in a matrix, and extract the predictors and the variable to predict.
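For comparison, here is a sketch of the kind of Python/NumPy loading code being replaced. For this illustration the data is an in-memory copy of the first three data lines; in the actual program, a file name such as people_train.txt would be passed to loadtxt() instead:

```python
import io
import numpy as np

# first three tab-separated data lines from the sample shown above
raw = ("1\t0.24\t1\t0\t0\t0.2950\t0\t0\t1\n"
       "0\t0.39\t0\t0\t1\t0.5120\t0\t1\t0\n"
       "1\t0.63\t0\t1\t0\t0.7580\t1\t0\t0\n")

# np.loadtxt accepts a file name or any file-like object
all_train = np.loadtxt(io.StringIO(raw), delimiter="\t")
x_train = all_train[:, 1:9]   # predictors: age, state, income, politics
y_train = all_train[:, 0]     # variable to predict: sex
```

The slicing syntax `[:, 1:9]` and `[:, 0]` is the part that has no direct C# equivalent, which is why the C# version needs the MatLoad() and Extract() helper functions shown below.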

The data is synthetic. There are 200 training items and 40 test items. The data can be found at https://jamesmccaffreyblog.com/2022/09/23/binary-classification-using-pytorch-1-12-1-on-windows-10-11/ where I did binary classification using PyTorch.



For no particular reason, I did an Internet image search for “sci fi gender art”. The results were pretty variable and crazy (quite a few of them creepy), but here are three illustrations by anonymous artists that seem pretty nice. I suspect that artists constantly practice new styles in much the same way that machine learning engineers constantly upgrade their skills.


Demo code.

using System;
using System.IO;
// .NET (Core) 6

namespace PeopleLogisticRegression
{
  internal class PeopleLogRegProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nLogistic regression raw C# demo ");

      // 0. get ready
      Random rnd = new Random(0);

      // 1. load train data
      Console.WriteLine("\nLoading People data ");
      string fn = "..\\..\\..\\Data\\people_train.txt";
      int[] cols = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 };
      double[][] allTrain = MatLoad(fn, 200, cols, '\t');
      int[] xCols = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };
      double[][] xTrain = Extract(allTrain, xCols );
      int[] yCol = new int[] { 0 };
      double[][] yTrain = Extract(allTrain, yCol);

      // 1b. load test data
      fn = "..\\..\\..\\Data\\people_test.txt";
      cols = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 };
      double[][] allTest = MatLoad(fn, 40, cols, '\t');
      xCols = new int[] { 1, 2, 3, 4, 5, 6, 7, 8 };
      double[][] xTest = Extract(allTest, xCols);
      yCol = new int[] { 0 };
      double[][] yTest = Extract(allTest, yCol);

      // 2. create model
      Console.WriteLine("\nCreating logistic regression model ");
      double[] wts = new double[8];
      double lo = -0.01; double hi = 0.01;
      for (int i = 0; i < wts.Length; ++i)
        wts[i] = (hi - lo) * rnd.NextDouble() + lo;
      double bias = 0.0;

      // 3. train model
      double lrnRate = 0.005;
      int maxEpochs = 10000;
      Console.WriteLine("\nTraining using SGD with lrnRate = " +
        lrnRate);
      int[] indices = new int[200];
      for (int i = 0; i < indices.Length; ++i)
        indices[i] = i;
      for (int epoch = 0; epoch < maxEpochs; ++epoch)
      {
        Shuffle(indices, rnd);
        for (int ii = 0; ii < indices.Length; ++ii)
        {
          int i = indices[ii];
          double[] x = xTrain[i];
          double y = yTrain[i][0];  // target 0 or 1
          double p = ComputeOutput(wts, bias, x);
          // Console.WriteLine(p);
          //Console.ReadLine();

          // update all wts and the bias
          for (int j = 0; j < wts.Length; ++j)
            wts[j] += lrnRate * x[j] * (y - p);
          bias += lrnRate * (y - p);
        }


        if (epoch % 1000 == 0)
        {
          double loss = MSELoss(wts, bias, xTrain, yTrain);
          Console.WriteLine("epoch = " + 
            epoch.ToString().PadLeft(4) + " | " +
            " loss = " + loss.ToString("F4"));
        }
      } // for
      Console.WriteLine("Done ");

      // 4. evaluate model 
      Console.WriteLine("\nEvaluating trained model ");
      double accTrain = Accuracy(wts, bias, xTrain, yTrain);
      Console.WriteLine("Accuracy on train data: " + 
        accTrain.ToString("F4"));
      double accTest = Accuracy(wts, bias, xTest, yTest);
      Console.WriteLine("Accuracy on test data: " + 
        accTest.ToString("F4"));

      // 5. use model
      Console.WriteLine("\n[33, Nebraska, $50,000, moderate]: ");
      double[] inpt = 
        new double[] { 0.33, 0, 1, 0, 0.50000, 0, 1, 0 };
      double pVal = ComputeOutput(wts, bias, inpt);
      Console.WriteLine("p-val = " + pVal.ToString("F4"));
      if (pVal < 0.5)
        Console.WriteLine("class 0 (male) ");
      else
        Console.WriteLine("class 1 (female) ");

      // 6.TODO: save wts and bias to file

      Console.WriteLine("\nEnd logistic regression C# demo ");
      Console.ReadLine();
    } // Main

    public static void Shuffle(int[] arr, Random rnd)
    {
      // Fisher-Yates algorithm
      int n = arr.Length;
      for (int i = 0; i < n; ++i)
      {
        int ri = rnd.Next(i, n);  // random index
        int tmp = arr[ri];
        arr[ri] = arr[i];
        arr[i] = tmp;
      }
    }

    public static double ComputeOutput(double[] w, 
      double b, double[] x)
    {
      double z = 0.0;
      for (int i = 0; i < w.Length; ++i)
      {
        z += w[i] * x[i];
      }
      z += b;
      double p = 1.0 / (1.0 + Math.Exp(-z));
      return p;
    }

    public static double Accuracy(double[] w, double b,
      double[][] dataX, double[][] dataY)
    {
      int nCorrect = 0; int nWrong = 0;
      for (int i = 0; i < dataX.Length; ++i)
      {
        double[] x = dataX[i];
        int y = (int)dataY[i][0];
        double p = ComputeOutput(w, b, x);
        if ((y == 0 && p < 0.5) || 
          (y == 1 && p >= 0.5))
          nCorrect += 1;
        else
          nWrong += 1;
      }
      double acc = (nCorrect * 1.0) / (nCorrect + nWrong);
      return acc;
    }

    public static double MSELoss(double[] w, double b, 
      double[][] dataX, double[][] dataY)
    {
      double sum = 0.0;
      for (int i = 0; i < dataX.Length; ++i)
      {
        double[] x = dataX[i];
        double y = dataY[i][0];
        double p = ComputeOutput(w, b, x);
        sum += (y - p) * (y - p);
      }
      double mse = sum / dataX.Length;
      return mse;
    }

    public static double[][] MatLoad(string fn, int nRows,
      int[] cols, char sep)
    {
      int nCols = cols.Length;
      double[][] result = MatCreate(nRows, nCols);
      string line = "";
      string[] tokens = null;
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith("#") == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j < nCols; ++j)
        {
          int k = cols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    public static double[][] MatCreate(int rows, int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i < rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    public static void MatShow(double[][] mat, int dec,
      int wid)
    {
      int nRows = mat.Length;
      int nCols = mat[0].Length;
      for (int i = 0; i < nRows; ++i)
      {
        for (int j = 0; j < nCols; ++j)
        {
          double x = mat[i][j];
          Console.Write(x.ToString("F" + dec).PadLeft(wid));
        }
        Console.WriteLine("");
        // Utils.VecShow(mat[i], dec, wid);
      }
    }

    public static double[][] Extract(double[][] mat, 
      int[] cols)
    {
      int nRows = mat.Length;
      int nCols = cols.Length;
      double[][] result = MatCreate(nRows, nCols);
      for (int i = 0; i < nRows; ++i)
      {
        for (int j = 0; j < nCols; ++j)  // idx into src cols
        {
          int srcCol = cols[j];
          int destCol = j;
          result[i][destCol] = mat[i][srcCol];
        }
      }
      return result;
    }

  } // Program
} // ns
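For readers who want to cross-check the C# logic against the original Python, the heart of the program, the per-item SGD weight update with a sigmoid output, can be sketched in plain Python like this. The function names and default parameter values here are illustrative, not the actual code from my Python version:

```python
import math
import random

def compute_output(wts, bias, x):
    # logistic sigmoid applied to the weighted sum of inputs
    z = sum(w * xi for w, xi in zip(wts, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(x_data, y_data, lrn_rate=0.005, max_epochs=1000, seed=0):
    rnd = random.Random(seed)
    # small random weights in [-0.01, +0.01), zero bias
    wts = [(0.02 * rnd.random()) - 0.01 for _ in range(len(x_data[0]))]
    bias = 0.0
    indices = list(range(len(x_data)))
    for _ in range(max_epochs):
        rnd.shuffle(indices)  # visit training items in scrambled order
        for i in indices:
            p = compute_output(wts, bias, x_data[i])
            # same update rule as the C# loop: w += lr * x * (y - p)
            for j in range(len(wts)):
                wts[j] += lrn_rate * x_data[i][j] * (y_data[i] - p)
            bias += lrn_rate * (y_data[i] - p)
    return wts, bias
```

The update rule is identical to the inner loop of the C# Main() method, so on the same data and with the same item ordering the two versions produce essentially the same trained weights.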