Last week, while I was walking my dogs, I was thinking about logistic regression (LR). LR is a technique to perform binary classification, such as predicting if a person is male or female based on their age, job type (mgmt, sale, tech), annual income, and job satisfaction (low, medium, high).
The key equation for LR looks like:
let z = (0.23)(age) + (-0.98)(type) + (1.37)(income) + (0.55)(satis) + 1.69
let p = 1.0 / (1.0 + exp(-z))
The p value will be between 0.0 and 1.0. A p value less than 0.5 means IsMale = 0 = false, and a p value greater than 0.5 means IsMale = 1 = true. The constants like 0.23 above are called weights. The trailing constant like 1.69 is called the bias. To find the values for the weights and the bias, you use a set of training data with known correct output values, along with one of many optimization algorithms. The equation for p is called the logistic sigmoid function.
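Here's a quick sketch of the prediction equation in Python. The weights and bias are the ones from the example equation above; the input values I pass in are hypothetical, and in a real system age and income would be normalized and job type and satisfaction would be encoded:

```python
import math

# weights and bias taken from the example equation in the post
def predict_is_male(age, job_type, income, satis):
    z = (0.23 * age) + (-0.98 * job_type) + (1.37 * income) + (0.55 * satis) + 1.69
    p = 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid
    return p

# hypothetical normalized/encoded input values
p = predict_is_male(0.30, 1, 0.5400, 2)
print(p)           # some value in (0.0, 1.0)
print(p >= 0.5)    # True means predicted IsMale = 1
```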
As it turns out, mathematically LR is a simplified form of a neural network with one hidden node and one output node. Quite remarkable. See my post at https://jamesmccaffreyblog.com/2018/07/07/why-a-neural-network-is-always-better-than-logistic-regression/.
There is a close math relationship between logistic sigmoid and tanh: logsig(z) = 1.0 / (1.0 + exp(-z)) = 1/2 + (1/2) * tanh(z/2).
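The identity is easy to verify numerically. A minimal Python check at a handful of z values:

```python
import math

def logsig(z):
    return 1.0 / (1.0 + math.exp(-z))

# verify logsig(z) = 1/2 + 1/2 * tanh(z/2) at several points
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    lhs = logsig(z)
    rhs = 0.5 + 0.5 * math.tanh(z / 2.0)
    assert abs(lhs - rhs) < 1.0e-10
```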
For neural networks with a single hidden layer, in the early days it was standard to use logistic sigmoid activation. But it was soon discovered that using tanh (hyperbolic tangent) activation usually gives better results. Putting these ideas together, I wondered if it’d be possible to use tanh instead of LR as the basis for a simple, non-neural binary classification system.
So I coded up a demo one morning and voilà, my idea of tanh regression for binary classification worked very well.
There were quite a few minor details to take care of. Instead of encoding the value to predict as 0 or 1, it was convenient to encode as -1 or +1 because tanh returns a value in the range (-1.0, +1.0). The training algorithm changed to use the derivative of tanh, which, expressed in terms of the output, is (1 - output) * (1 + output). And the accuracy and error functions changed slightly too.
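To make those details concrete, here's a minimal sketch of tanh regression trained with stochastic gradient descent. The tiny dataset, learning rate, and epoch count are invented for illustration; the key points from the post are the -1/+1 target encoding and the (1 - output) * (1 + output) derivative:

```python
import math

# tiny made-up dataset: (inputs, target), targets encoded as -1 or +1
train = [
    ([0.1, 0.9], -1), ([0.2, 0.8], -1),
    ([0.9, 0.2], +1), ([0.8, 0.1], +1),
]
w = [0.01, 0.01]  # weights
b = 0.0           # bias
lr = 0.10         # learning rate (arbitrary choice)

for epoch in range(100):
    for x, t in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        out = math.tanh(z)
        # derivative of tanh in terms of the output: (1 - out) * (1 + out)
        grad = (out - t) * (1.0 - out) * (1.0 + out)
        for i in range(len(w)):
            w[i] -= lr * grad * x[i]
        b -= lr * grad

# accuracy: predict +1 if tanh(z) > 0, else -1
correct = 0
for x, t in train:
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    pred = 1 if math.tanh(z) > 0.0 else -1
    if pred == t:
        correct += 1
print(correct / len(train))
```

Note that the decision threshold moves from p > 0.5 (logistic regression) to tanh(z) > 0.0, which is the same boundary z > 0 in both cases.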
I searched the Internet trying to find any information about this technique of tanh regression for binary classification but found nothing. I suspect that most of my machine learning colleagues accept standard techniques, such as logistic regression, as dogma and don't question whether ML techniques, even ones that are decades old, can be improved or modified.

Three posters by artist Jules Cheret (1836-1932). Cheret’s best-known works are in the style of art nouveau, which was very popular from 1890 to 1910. I admire artists who question dogma and try to create new styles.

Hello Mr McCaffrey, can you please give some intuition why there would be a significant difference in performance between tanh and logsig if the two functions are so “closely related” – pretty much the same except for the scale?
Well, just like with the hidden layer of a neural network, the difference between using logistic sigmoid and tanh varies quite a bit — sometimes tanh is better but sometimes not. In my little experiments with tanh regression binary classification, the same seemed to be true. So saying there's a significant difference in performance between logsig and tanh is a bit too strong — probably more accurate is something like, "Substituting tanh activation for logistic sigmoid activation in logistic regression classification sometimes gives good results." The intuition is that tanh has a wider function range, (-1.0, +1.0), than logsig's (0.0, 1.0), so there's more wiggle room between computed output and target output. See also https://stats.stackexchange.com/questions/101560/tanh-activation-function-vs-sigmoid-activation-function. Finally, I did several experiments with activation functions for neural networks at https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx.