Why is a Kernel Function Denoted by Both K(x, x’) and K(x – x’)?

One of the major challenges in machine learning is dealing with wildly varying vocabulary and math notation. For example, there are many kernel functions that measure the similarity of two vectors. The most common kernel function is the radial basis function (RBF), also called the Gaussian kernel. Another common kernel function is the polynomial kernel.

In research papers and Wikipedia articles and blog posts and textbooks, the symbol k or K is used to mean an arbitrary kernel function. For example:

The equation on the top is from the Wikipedia article on Kernel Regression. Notice the K(x – x’) using subtraction to indicate some kernel function. The equation on the bottom is from the Wikipedia article on Kernel Smoother. Notice the K(X0, Xi) using commas to indicate some kernel function. Why the difference?

By the way, notice that sometimes k or K has subscripts to indicate parameters for the kernel function, And sometimes the input x vectors are lower-case and sometimes upper-case, and sometimes they have subscripts and sometimes they don’t. More confusion. Gahhh! For anyone trying to study machine learning, these inconsistencies are a huge obstacle.

In a nutshell, the generic K(x, x’) form means the kernel function accepts two different vectors, and the K(x – x’) form means the kernel function accepts a single vector which is the vector difference of the two vectors.

I’ll illustrate with the radial basis function (RBF) aka Gaussian kernel. The RBF/Gaussian kernel function returns a measure of similarity between two vectors. It returns 1.0 if two vectors are the same, or returns a value close to 1.0 if the two vectors are close, or returns a value close to 0.0 if the two vectors are very different.

Here’s a definition that uses the two-vector with comma format:

def gaussian1(x0, x1, sigma):
  # x0 and x1 are vectors
  dim = len(x0)
  sum = 0.0
  for i in range(dim):
    sum += (x0[i] - x1[i]) * (x0[i] - x1[i])
  return np.exp(-1 * sum / (2 * sigma * sigma))

Calling this version looks like:

x0 = np.array([3.7, 2.3, 4.9])
x1 = np.array([3.0, 2.0, 4.0])
sigma = 0.5
g1 = gaussian1(x0, x1, sigma)
print("gaussian1 result = %0.4f " % g1)

Because the two vectors are relatively far apart, the result is 0.0620 which is close to 0.0.

Now here’s a definition that mimics the two-vector with minus sign format:

def vec_subtract(x0, x1):
  # x0 - x1
  dim = len(x0)
  result = np.zeros(dim)
  for i in range(dim):
    result[i] = x0[i] - x1[i]
  return result

def gaussian2(x, sigma):
  # x is x0 - x1
  dim = len(x0)
  sum = 0.0
  for i in range(dim):
    sum += x[i] * x[i]
  return np.exp(-1 * sum / (2 * sigma * sigma))

Calling this function looks like:

x_diff = vec_subtract(x0, x1)
g2 = gaussian2(x_diff, sigma)
print("gaussian2 result = %0.4f " % g2)

The result is the same, 0.0620.

The vec_subtract() function really isn’t necessary because the Python language supports vector subtraction directly, as in x0 – x1.

To summarize, K(x, x’) means the kernel function definition accepts two different vectors, and K(x – x’) is a notation that means the kernel function definition accepts a single vector which is the difference between two vectors.

The moral of the story is that machine learning and software development has an enormous amount of detail and wildly inconsistent notations that require a big effort to understand — it’s not a “big idea” or “big vision” occupation (although big ideas and visions are critically important).



I’ve always been interested in kernel functions. And I’ve always been interested in 1950s science fiction movies. A surprising number of these movies have a military Colonel character.

Left: In “The Beast from 20,000 Fathoms” (1953), a frozen dinosaur is awakened by atomic testing. Actor Kenneth Tobey plays Colonel Jack Evans. An excellent movie that was the direct inspiration “Godzilla” (1954). My grade for the movie = solid A.

Center: In “Rocketship X-M” (1950), a group of five people travel to Mars. This was the first U.S. major, realistic, science fiction movie. Actor Lloyd Bridges plays Colonel Floyd Garham. My grade for the movie = solid B.

Right: In “Target Earth” (1954), Earth is invaded by robots from Venus. Actor Steve Pendleton plays an unamed Colonel as a secondary character. This movie was filmed on a very low budget but it has a certain charm. My grade for the movie = C+.


Demo program:

# gaussian_explore.py
import numpy as np

def gaussian1(x0, x1, sigma):
  # x0 and x1 are vectors
  dim = len(x0)
  sum = 0.0
  for i in range(dim):
    sum += (x0[i] - x1[i]) * (x0[i] - x1[i])
  return np.exp(-1 * sum / (2 * sigma * sigma))

def vec_subtract(x0, x1):
  # x0 - x1
  dim = len(x0)
  result = np.zeros(dim)
  for i in range(dim):
    result[i] = x0[i] - x1[i]
  return result

def gaussian2(x, sigma):
  # x is x0 - x1
  dim = len(x0)
  sum = 0.0
  for i in range(dim):
    sum += x[i] * x[i]
  return np.exp(-1 * sum / (2 * sigma * sigma))

print("\nBegin Gaussian parameter format explore ")

x0 = np.array([3.7, 2.3, 4.9])
x1 = np.array([3.0, 2.0, 4.0])
sigma = 0.5
g1 = gaussian1(x0, x1, sigma)
print("\ngaussian1 result = %0.4f " % g1)

x_diff = vec_subtract(x0, x1)
g2 = gaussian2(x_diff, sigma)
print("\ngaussian2 result = %0.4f " % g2)

print("\nEnd demo ")
This entry was posted in Machine Learning. Bookmark the permalink.

Leave a Reply