Last week, while I was walking my dogs, I was thinking about logistic regression (LR). LR is a technique to perform binary classification, such as predicting if a person is male or female based on their age, job type (mgmt, sale, tech), annual income, and job satisfaction (low, medium, high).
The key equation for LR looks like:
let z = (0.23)(age) + (-0.98)(type) + (1.37)(income) + (0.55)(satis) + 1.69
let p = 1.0 / (1.0 + exp(-z))
The p value will be between 0.0 and 1.0. A p value less than 0.5 means IsMale = 0 = false, and a p value greater than 0.5 means IsMale = 1 = true. The constants like 0.23 above are called weights. The trailing constant like 1.69 is called the bias. To find the values for the weights and the bias, you use a set of training data with known correct output values, along with one of many optimization algorithms. The equation for p is called the logistic sigmoid function.
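Here's a quick sketch of the prediction equation in Python. The weights and bias are the ones from the example equation above; the input values I pass in are hypothetical, and in a real system age and income would be normalized and job type and satisfaction would be encoded:

```python
import math

# weights and bias taken from the example equation in the post
def predict_is_male(age, job_type, income, satis):
    z = (0.23 * age) + (-0.98 * job_type) + (1.37 * income) + (0.55 * satis) + 1.69
    p = 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid
    return p

# hypothetical normalized/encoded input values
p = predict_is_male(0.30, 1, 0.5400, 2)
print(p)           # some value in (0.0, 1.0)
print(p >= 0.5)    # True means predicted IsMale = 1
```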
As it turns out, mathematically LR is a simplified form of a neural network with one hidden node and one output node. Quite remarkable. See my post at https://jamesmccaffreyblog.com/2018/07/07/why-a-neural-network-is-always-better-than-logistic-regression/.
There is a close math relationship between logistic sigmoid and tanh: logsig(z) = 1.0 / (1.0 + exp(-z)) = 1/2 + (1/2) * tanh(z/2).
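The identity is easy to verify numerically. A minimal Python check at a handful of z values:

```python
import math

def logsig(z):
    return 1.0 / (1.0 + math.exp(-z))

# verify logsig(z) = 1/2 + 1/2 * tanh(z/2) at several points
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    lhs = logsig(z)
    rhs = 0.5 + 0.5 * math.tanh(z / 2.0)
    assert abs(lhs - rhs) < 1.0e-10
```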
For neural networks with a single hidden layer, in the early days it was standard to use logistic sigmoid activation. But it was soon discovered that using tanh (hyperbolic tangent) activation usually gives better results. Putting these ideas together, I wondered if it’d be possible to use tanh instead of LR as the basis for a simple, non-neural binary classification system.
So I coded up a demo one morning and voilà, my idea of tanh regression for binary classification worked very well.
There were quite a few minor details to take care of. Instead of encoding the value to predict as 0 or 1, it was convenient to encode as -1 or +1 because tanh returns a value in the range (-1.0, +1.0). The training algorithm changed to use the derivative of tanh, which, expressed in terms of the output, is (1 - output) * (1 + output). And the accuracy and error functions changed slightly too.
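To make those details concrete, here's a minimal sketch of tanh regression trained with stochastic gradient descent. The tiny dataset, learning rate, and epoch count are invented for illustration; the key points from the post are the -1/+1 target encoding and the (1 - output) * (1 + output) derivative:

```python
import math

# tiny made-up dataset: (inputs, target), targets encoded as -1 or +1
train = [
    ([0.1, 0.9], -1), ([0.2, 0.8], -1),
    ([0.9, 0.2], +1), ([0.8, 0.1], +1),
]
w = [0.01, 0.01]  # weights
b = 0.0           # bias
lr = 0.10         # learning rate (arbitrary choice)

for epoch in range(100):
    for x, t in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        out = math.tanh(z)
        # derivative of tanh in terms of the output: (1 - out) * (1 + out)
        grad = (out - t) * (1.0 - out) * (1.0 + out)
        for i in range(len(w)):
            w[i] -= lr * grad * x[i]
        b -= lr * grad

# accuracy: predict +1 if tanh(z) > 0, else -1
correct = 0
for x, t in train:
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    pred = 1 if math.tanh(z) > 0.0 else -1
    if pred == t:
        correct += 1
print(correct / len(train))
```

Note that the decision threshold moves from p > 0.5 (logistic regression) to tanh(z) > 0.0, which is the same boundary z > 0 in both cases.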
I searched the Internet trying to find any information about this technique of tanh regression for binary classification but found nothing. I suspect that most of my machine learning colleagues accept standard techniques, such as logistic regression, as dogma and don't question whether ML techniques, even ones that are decades old, can be improved or modified.

Three posters by artist Jules Cheret (1836-1932). Cheret’s best-known works are in the style of art nouveau, which was very popular from 1890 to 1910. I admire artists who question dogma and try to create new styles.

Hello Mr McCaffrey, can you please give some intuition why there would be a significant difference in performance between tanh and logsig if the two functions are so “closely related” – pretty much the same except for the scale?
Well, just like with the hidden layer of a neural network, the difference between using logistic sigmoid and tanh varies quite a bit — sometimes tanh is better but sometimes not. In my little experiments with tanh regression binary classification, the same seemed to be true. So saying there's a significant difference in performance between logsig and tanh is a bit too strong — probably more accurate is something like, "Substituting tanh activation for logistic sigmoid activation in logistic regression classification sometimes gives good results." The intuition is that tanh has a wider function range, (-1.0, +1.0), than logsig's (0.0, 1.0), so there's more wiggle room between computed output and target output. See also https://stats.stackexchange.com/questions/101560/tanh-activation-function-vs-sigmoid-activation-function. Finally, I did several experiments with activation functions for neural networks at https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx.