The Difference Between Encoding, Embedding, and Latent Representation - in My World

Bottom line: In the machine learning projects I work on, an encoding converts categorical data to numeric data (ex: one-hot encoding where “red” = [0 1 0 0]), an embedding converts an integer word ID to a vector (ex: “the” = 4 = [-0.1234, 1.9876, . . . 3.4681]), and a latent representation is a vector that represents a condensed version of a data item (ex: an autoencoder represents [“male”, 28, $63,000.00] as [2.3456, -0.7654, . . . 0.9753]).
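To make the three meanings concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the color list, the table sizes, the weight values, and the way the data item is normalized are all hypothetical, and a single untrained linear layer stands in for a real autoencoder's encoder.

```python
import random

# Hypothetical color list, ordered so that "red" lands at index 1,
# which gives the [0 1 0 0] pattern used in the text.
COLORS = ["blue", "red", "green", "yellow"]

def one_hot(value, categories):
    """Encoding: categorical value -> vector of 0s with a single 1."""
    return [1 if c == value else 0 for c in categories]

def make_embedding_table(vocab_size, dim, seed=0):
    """Embedding: one vector per integer word ID.
    Values are random here; in a real model they are learned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]

def encode_latent(item, weights):
    """Latent representation: map a numeric item to a shorter vector.
    One linear layer with made-up weights, not a trained autoencoder."""
    return [sum(w * x for w, x in zip(row, item)) for row in weights]

red = one_hot("red", COLORS)          # [0, 1, 0, 0]

table = make_embedding_table(vocab_size=10, dim=3)
the_vec = table[4]                    # "the" = word ID 4 -> 3-value vector

# ["male", 28, $63,000.00] as numbers (assumed scheme): sex one-hot
# encoded as [1, 0], age divided by 100, income divided by 100,000.
item = [1.0, 0.0, 0.28, 0.63]
W = [[0.5, -0.5, 1.0, 2.0],           # hypothetical 2x4 weight matrix
     [1.0, 1.0, -1.0, 0.5]]
latent = encode_latent(item, W)       # 4 values condensed to 2

print(red)
print(the_vec)
print(latent)
```

Notice that the encoding is a fixed rule, the embedding is a table lookup, and the latent representation is computed from the whole data item.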

There are no completely standard terminology guides for machine learning. Each project, research paper, and blog post should explain what each term means. The terms “encoding”, “embedding”, and “latent representation” can be, and often are, used interchangeably.

In my world — meaning the projects I work on — my colleagues and I usually try to use the meanings I presented in the first paragraph of this blog post.

The most general term is “latent representation”. The five main unsupervised neural architectures that create a latent representation are 1.) ordinary autoencoder (AE), 2.) variational autoencoder (VAE), 3.) generative adversarial network (GAN), 4.) transformer architecture encoder, and 5.) contrastive loss network. But there are dozens of other architectures for latent representations and each of the five architectures I mentioned has dozens and dozens of variations. For example, the latent representation of an AE is a simple vector but the latent representation of a VAE is a pair of vectors that represent the mean and log-variance of a probability distribution over latent values for a data item.
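The AE versus VAE difference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a working model: the weight matrices are made up, both encoders are single untrained linear layers, and the function names are hypothetical. The sampling step uses the standard VAE reparameterization, z = mean + exp(0.5 * log-variance) * noise.

```python
import math
import random

def linear(x, weights):
    """One linear layer: a weighted sum of the inputs per output value."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def ae_encode(x, w):
    """AE: the latent representation is a single vector."""
    return linear(x, w)

def vae_encode(x, w_mu, w_logvar):
    """VAE: the latent representation is a pair of vectors,
    the mean and the log-variance."""
    return linear(x, w_mu), linear(x, w_logvar)

def sample_latent(mu, logvar, rng):
    """Draw one latent vector: z = mu + sigma * eps,
    where sigma = exp(0.5 * logvar) and eps is standard normal noise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

# Hypothetical normalized data item and made-up weights.
x = [1.0, 0.0, 0.28, 0.63]
W_MU = [[0.5, -0.5, 1.0, 2.0],
        [1.0, 1.0, -1.0, 0.5]]
W_LOGVAR = [[-1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, -1.0]]

ae_latent = ae_encode(x, W_MU)               # one vector
mu, logvar = vae_encode(x, W_MU, W_LOGVAR)   # two vectors
z = sample_latent(mu, logvar, random.Random(0))

print(ae_latent)
print(mu, logvar)
print(z)
```

Calling sample_latent() repeatedly with fresh noise gives a different z each time, while the AE gives the same vector every time for the same input.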

Terminology can help communicate ideas. But language is ambiguous and so it’s important to clearly define what is meant in any particular context.

A few years ago, neural networks were just one topic in machine learning, which was just one topic in computer science. I think maybe the moral of this blog post is that the topic of deep neural architecture is now so complex that it has become a separate field of study on par with topics such as mathematics, biochemistry, and physics. Put another way, I suspect colleges and universities will eventually offer a dedicated Bachelor’s degree in Machine Learning or Artificial Intelligence.

Three photos from a stock image search for “college classroom”. Left: This photo is baffling in so many ways but I especially like the mysterious vintage light bulbs. Center: A truly masterful composition of fruit, math, and non-optimal running shoes. Right: I doubt that plant DNA has ever been explained more clearly.