The Jacobian and Machine Learning

I’m sometimes asked a question along the lines of, “I’m relatively new to machine learning. What math do I need to know to learn ML?”

Some of my colleagues answer this question by listing large categories of topics, such as “linear algebra” but I tend to think of the math skills needed for ML in more granular terms. One of the must-know math topics for ML, in my opinion, is the Jacobian.

The Jacobian of a set of functions is a matrix of partial derivatives of the functions. If you have just one function instead of a set of functions, the Jacobian is the gradient of the function.

The idea is best explained by example.

Here I have two functions (f1, f2) and there are three shared variables (x1 through x3) and eight separate constants (b0 through b7). The top row of the Jacobian is the set of partial derivatives of f1 with respect to its variables. The bottom row is the set of partial derivatives of the second function. I made functions f1 and f2 concrete and very simple, but the Jacobian can work with any number of functions, and any number of variables, shared or not (although in ML scenarios the variables are usually shared).
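To make the idea tangible in code, here is a minimal sketch that computes a Jacobian numerically with central finite differences. The functions f1 and f2 below, and the values of the constants, are made-up examples for illustration, not the ones from my figure:

```python
import numpy as np

# Hypothetical example: two functions of three shared variables
# (x1, x2, x3), with eight constants (b0 through b7).
b = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

def f1(x):
    return b[0] + b[1]*x[0] + b[2]*x[1] + b[3]*x[2]

def f2(x):
    return b[4] + b[5]*x[0] + b[6]*x[1] + b[7]*x[2]

def jacobian(funcs, x, h=1.0e-6):
    # One row per function, one column per variable.
    J = np.zeros((len(funcs), len(x)))
    for i, f in enumerate(funcs):
        for j in range(len(x)):
            xp = np.array(x, dtype=np.float64); xp[j] += h
            xm = np.array(x, dtype=np.float64); xm[j] -= h
            # Central-difference approximation of d f_i / d x_j.
            J[i, j] = (f(xp) - f(xm)) / (2 * h)
    return J

x = np.array([1.0, 2.0, 3.0])
J = jacobian([f1, f2], x)
print(J)  # a 2x3 matrix; row 0 is the gradient of f1, row 1 of f2
```

Because the example functions happen to be linear, each row of the result is just the corresponding constants (b1, b2, b3) and (b5, b6, b7), which is an easy way to sanity-check the finite-difference code.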

Notice that if there is just one function, say f1, then the Jacobian will have just one row of partial derivatives, and so is the gradient of the function.

The partial derivative of a function is one of the most important and common math concepts in ML. Among other things, the sign of a partial derivative gives you the direction, and its magnitude gives you the step size, for adjusting constants (the bi values in my example) in order to minimize some sort of error.
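Here is a minimal sketch of that idea (again a made-up example, not code from the post): a tiny linear model with two constants, where each training step moves each constant in the direction opposite its partial derivative of the squared error:

```python
import numpy as np

# Hypothetical data generated by y = 1 + 2x; the goal is to recover
# the constants b0 = 1 and b1 = 2 by gradient descent.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.0, 5.0, 7.0, 9.0])

b = np.array([0.0, 0.0])   # initial guess for (b0, b1)
lr = 0.05                  # learning rate

for _ in range(2000):
    pred = b[0] + b[1] * xs
    err = pred - ys
    # Partial derivatives of mean squared error w.r.t. b0 and b1.
    grad = np.array([2 * err.mean(), 2 * (err * xs).mean()])
    # Sign of each partial gives the direction to move; its
    # magnitude scales the size of the step.
    b -= lr * grad

print(b)  # close to [1.0, 2.0]
```

The same idea scales up: in a neural network the grad vector just has many more entries, one partial derivative per weight and bias.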

This entry was posted in Machine Learning.