The most fundamental machine learning technique is logistic regression (LR) binary classification. The goal of basic LR binary classification is to predict the value of a variable that can be one of just two possible discrete values. An example is predicting the sex of a person (0 = male, 1 = female) based on predictors/features such as age, annual income, political leaning (1 0 0 = conservative, 0 1 0 = moderate, 0 0 1 = liberal), years of education and so on.
Basic LR binary classification is relatively simple, and the results are somewhat interpretable (as opposed to some other machine learning classification techniques, notably neural networks). But basic LR binary classification has two key weaknesses. First, the technique only works well when the training data is simple, meaning mostly linearly separable. (Note: Weirdly, LR binary classification can perform extremely poorly when the training data is completely linearly separable — the reasons are quite mathematically complicated). Second, basic LR binary classification only works when you want to predict a binary result, as opposed to multi-class classification where you want to predict a variable that can take three or more discrete values.
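For context, the basic LR prediction computation is just a weighted sum of the predictor values plus a bias, passed through the logistic sigmoid. A minimal sketch in Python (the article's demo is in C#; the weight and bias values here are made up purely for illustration):

```python
import math

def log_sig(z):
    # logistic sigmoid: maps any real value into (0.0, 1.0)
    return 1.0 / (1.0 + math.exp(-z))

def basic_lr_predict(x, weights, bias):
    # p-value = sigmoid(w . x + b); prediction is class 0 if p < 0.5, else class 1
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return log_sig(z)

p = basic_lr_predict([0.57, 0.32], [1.5, -2.0], 0.4)  # hypothetical inputs
predicted_class = 0 if p < 0.5 else 1
```

The decision boundary of this computation is a straight line (a hyperplane in higher dimensions), which is exactly why basic LR struggles with data that is not linearly separable.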
There are two main extensions of basic LR that deal with these two weaknesses. The first is kernel logistic regression, which allows LR binary classification to deal with complex data that is not linearly separable. The second is multi-class logistic regression, which allows LR to predict a variable that can take on three or more discrete values.
I haven’t looked at kernel logistic regression in quite a while so I thought it’d be fun to code up a demo to refresh my memory. I decided to use raw (no libraries) C# but the technique can be used with any programming language.
First, I created some dummy training data shown in the graph below. There are 21 training items. There are two predictor variables, which you can think of as a person’s age and weight. The goal is to predict a class which you can think of as male = 0 or female = 1. The data isn’t linearly separable so basic LR won’t work very well.
Kernel logistic regression is much more complicated than basic LR. For each of the 21 training items, a weight (often called an alpha value) is computed. Then, to make a prediction for a new item, you compare the new item against each training item: you compute the sum, over all training items, of each alpha value times a kernel function applied to the item-to-predict and the corresponding training item. You add the bias alpha to the sum, then take the logistic sigmoid of the sum.
The result will be a p-value between 0.0 and 1.0. If the p-value is less than 0.5 the prediction is class 0, otherwise the prediction is class 1. There are many possible kernel functions and each possibility has one or more parameters — the choice of kernel and its parameters are hyperparameters that must be determined by trial and error.
static double ComputeOutput(double[] x, double[] alphas,
double sigma, double[][] trainX)
{
// x is item to predict
// bias is last cell of alphas[]
int n = trainX.Length; // number items
double sum = 0.0;
for (int i = 0; i < n; ++i)
sum += (alphas[i] * Kernel(x, trainX[i], sigma));
sum += alphas[n]; // add the bias
return LogSig(sum); // result is [0.0, 1.0]
}
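The Kernel() and LogSig() helper functions aren't shown in the snippet above. A common choice of kernel is the radial basis function (RBF) kernel, which is consistent with the sigma parameter in the snippet. Here's a sketch of the full prediction computation in Python; the training data, alpha values, and sigma value are hypothetical (real alpha values would come from training):

```python
import math

def rbf_kernel(v1, v2, sigma):
    # RBF kernel: exp( -||v1 - v2||^2 / (2 * sigma^2) )
    sq_dist = sum((a - b) ** 2 for a, b in zip(v1, v2))
    return math.exp(-sq_dist / (2.0 * sigma * sigma))

def log_sig(z):
    # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-z))

def compute_output(x, alphas, sigma, train_x):
    # x is the item to predict; the bias is the last cell of alphas
    n = len(train_x)
    s = sum(alphas[i] * rbf_kernel(x, train_x[i], sigma) for i in range(n))
    s += alphas[n]        # add the bias
    return log_sig(s)     # p-value in (0.0, 1.0)

# tiny made-up example: 3 training items, 2 predictors each
train_x = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]]
alphas = [0.8, -1.2, 0.5, 0.3]  # 3 alphas plus the bias
p = compute_output([0.4, 0.6], alphas, 0.5, train_x)
predicted_class = 0 if p < 0.5 else 1
```

Note that the RBF kernel of a vector with itself is 1.0, and it decays toward 0.0 as the two vectors get farther apart, so nearby training items dominate the prediction.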
Anyway, good fun. When I get some free time I'll tidy up my code, write up an explanation, and publish it in the online Visual Studio Magazine — my resource of choice for information about machine learning using Microsoft technologies.

Three images from an Internet search for “kernel art”. Left: A carved olive pit (kernel/seed). Center: Intricate cut paper sculpture. Right: I have no idea why this illustration is related to “kernel”, but it’s interesting.

