Annoying Vocabulary Issue of the Day: Logits and Logits

One of the big challenges for anyone learning about machine learning is the wildly inconsistent vocabulary usage. For example, in mathematics, if you have a value x between 0.0 and 1.0, then logit(x) is defined as ln( x / (1-x) ). So if x = 0.2 then logit(x) = ln(0.2/0.8) = ln(0.25) = -1.39, and if x = 0.6 then logit(x) = ln(0.6/0.4) = ln(1.5) = 0.41.
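The math definition is a one-liner in code. Here's a minimal sketch that reproduces the two examples above:

```python
import math

def logit(x):
    """Mathematical logit (log-odds) of a probability x in (0, 1)."""
    return math.log(x / (1.0 - x))

print(round(logit(0.2), 2))  # -1.39
print(round(logit(0.6), 2))  # 0.41
```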

But in machine learning, somewhere along the line, ML people stole the term “logits” and use it to mean a vector of arbitrary real values, typically the raw outputs of a neural network's final layer, before they are coerced so that they sum to 1.0 by applying softmax. For example, in ML, a set of logits could be v = (1.5, 0.5, 2.0). After applying softmax to v, the result is (0.33, 0.12, 0.55).
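Here's a small sketch of softmax that converts ML-style logits to probabilities, confirming the example values:

```python
import math

def softmax(v):
    """Convert arbitrary real values (ML-style 'logits') to probabilities."""
    m = max(v)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.5, 0.5, 2.0])
print([round(p, 2) for p in probs])  # [0.33, 0.12, 0.55]
print(sum(probs))                    # 1.0, up to floating point error
```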

Argh! This vocabulary chaos is annoying, but worse, it can lead to errors. For example, TensorFlow has a function named tf.nn.softmax_cross_entropy_with_logits(), which expects raw, unnormalized values. If you assumed the math definition of logit, or passed in probabilities that had already gone through softmax, you'd almost certainly use the function incorrectly.
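To see why the mix-up matters, here's a sketch of the pitfall using a hand-rolled cross-entropy rather than TensorFlow itself (the function names here are just for illustration). A "with_logits" function applies softmax internally, so feeding it already-softmaxed probabilities applies softmax twice and silently gives a different loss:

```python
import math

def softmax(v):
    # stable softmax: shift by the max before exponentiating
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy_with_logits(labels, logits):
    """Mimics the 'with_logits' pattern: softmax is applied internally."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs))

labels = [0.0, 0.0, 1.0]   # true class is the third one
logits = [1.5, 0.5, 2.0]   # raw model outputs (ML-style "logits")

correct = cross_entropy_with_logits(labels, logits)
# Wrong: passing probabilities means softmax gets applied a second time
wrong = cross_entropy_with_logits(labels, softmax(logits))

print(round(correct, 4))  # loss computed from raw logits
print(round(wrong, 4))    # larger loss caused by the double softmax
```

No error is raised in either case, which is exactly what makes the vocabulary confusion dangerous.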

The moral of the story is that a consistent vocabulary in ML would be nice, but I don’t think it’ll happen anytime soon. You just have to be very careful, and check and double-check ML vocabulary terms.



Computer-generated art by Frieder Nake, one of the early pioneers of computer art.
