One-Shot Learning, Few-Shot Learning, Zero-Shot Learning, and Fine-Tuning

The terms one-shot learning, few-shot learning, zero-shot learning, and fine-tuning don’t have universally agreed-upon definitions. All four are kinds of “transfer learning,” where the goal is to start with an existing model and apply it to a new problem. That said, here are four brief, incomplete, but reasonable explanations.

One-Shot Learning – Usually applies to image classification. The goal is to classify an image when you only have one example of each class. For example, a bank may have just one example of each customer’s signature, and a new signature on a receipt must be classified as authentic or fraudulent. One common technique is to use a Siamese network architecture, where two inputs are passed through the same network and the two outputs are compared.
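Here’s a minimal sketch of the Siamese idea for the signature example. It is not a trained network: the WEIGHTS values, the four-number feature vectors, and the 0.25 threshold are all made up for illustration. The key point is that both inputs go through the same embed() function, and the decision is based on the distance between the two embeddings.

```python
import math

# Hypothetical fixed weights standing in for a trained embedding network.
WEIGHTS = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.1, 0.4, 0.2, -0.3],
]

def embed(features):
    """Map a raw feature vector to an embedding using the shared weights."""
    return [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]

def distance(a, b):
    """Euclidean distance between the embeddings of two inputs."""
    ea, eb = embed(a), embed(b)  # the same embed() is applied to both inputs
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))

def is_authentic(reference, candidate, threshold=0.25):
    """Accept the candidate if it lands close to the single reference example."""
    return distance(reference, candidate) < threshold

# Made-up feature vectors: one stored signature and two new ones.
reference_signature = [1.0, 0.5, 0.2, 0.8]
close_candidate = [1.0, 0.55, 0.2, 0.75]
far_candidate = [0.1, 1.5, 0.9, 0.0]

print(is_authentic(reference_signature, close_candidate))
print(is_authentic(reference_signature, far_candidate))
```

In a real system the embedding weights would be learned from many pairs of matching and non-matching signatures, so that only one stored example per customer is needed at prediction time.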

Few-Shot Learning – Usually applies to image classification. The goal is to classify an image when you only have a few (anywhere from two or three up to perhaps a dozen or so) labeled examples of each class. For example, you have a trained model that can classify hundreds of different classes of animals, but your system acquires just five images of a new, previously unseen type of animal, such as an axolotl. One of many possible techniques is to generate hundreds of variations of the five new images (by stretching, flipping horizontally, etc.), add them to the original training data, and then retrain the model. A more sophisticated approach for few-shot learning is called “model-agnostic meta-learning” (MAML), where a base model is trained specifically so that it can be adapted to new classes quickly and easily.
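The image-variation idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not a real augmentation pipeline: images are tiny nested lists of pixel values, and the function names (flip_horizontal, stretch_horizontal, augment) are my own. It produces only four variants per image; a real system would combine many more transforms to get hundreds.

```python
def flip_horizontal(image):
    """Mirror each row of pixels left to right."""
    return [row[::-1] for row in image]

def stretch_horizontal(image, factor=2):
    """Widen the image by repeating every pixel `factor` times in its row."""
    return [[pixel for pixel in row for _ in range(factor)] for row in image]

def augment(image):
    """Return the original image plus three simple variations of it."""
    flipped = flip_horizontal(image)
    return [
        image,
        flipped,
        stretch_horizontal(image),
        stretch_horizontal(flipped),
    ]

# Five made-up 2x2 "images" standing in for the five new axolotl photos.
new_images = [[[i, i + 1], [i + 2, i + 3]] for i in range(5)]

augmented = [variant for image in new_images for variant in augment(image)]
print(len(augmented))
```

The augmented images would then be added to the original training data, all with the new class label, before retraining.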

Zero-Shot Learning – Often applies to image classification. The goal is to classify an image using an existing model that has never been trained on any example of the image’s class. For example, you have a trained model that can classify hundreds of different classes of automobiles, but you need it to classify a new, previously unseen class, such as a Chrysler PT Cruiser. There’s no way to create new knowledge from nothing, so one common technique for zero-shot learning is kind of a cheat: you incorporate an auxiliary set of images along with text information such as “the 2001 PT Cruiser has a boxy look that resembles the 2006 Chevrolet HHR.” The auxiliary information acts as a different kind of labeling.
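One way to picture how auxiliary information can stand in for labels is an attribute-matching sketch. Everything here is an assumption for illustration: the three attributes (boxy, retro, sporty), the class descriptions, and the idea that an existing model outputs attribute scores for an image. The unseen class is described only by its attributes, and an image is assigned to whichever description its predicted attributes match best.

```python
import math

# Hypothetical attribute descriptions: [boxy, retro, sporty].
CLASS_ATTRIBUTES = {
    "sports coupe": [0.1, 0.0, 0.9],
    "pickup truck": [0.8, 0.1, 0.2],
    "PT Cruiser": [0.9, 0.9, 0.3],  # unseen class, described only by attributes
}

def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(predicted_attributes):
    """Pick the class whose description best matches the predicted attributes."""
    return max(
        CLASS_ATTRIBUTES,
        key=lambda name: cosine(predicted_attributes, CLASS_ATTRIBUTES[name]),
    )

# Made-up attribute scores that an existing model might produce for a new image.
predicted = [0.85, 0.8, 0.25]
print(classify(predicted))
```

No PT Cruiser image was ever used for training here; the class becomes reachable only because someone supplied a description of it.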

Fine-Tuning – Often applies to language models. The goal is to start with a pre-trained model such as GPT-3 that understands basic English and knows general facts from Wikipedia, and add specialized information such as the human resources policies of a company.

A not-so-good strategy is to prepare new specialized training data and then completely retrain the base model, changing all of the billions of parameters/weights. A better strategy is to prepare new specialized training data and then train in a way that changes only the parameters/weights of some of the last layers, or even better, use the specialized training data to create a new layer (or a few layers) that can be appended to the base model. This last approach allows you to have just one base model and many relatively small modules for specialized tasks.
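Here’s a toy sketch of the appended-layer approach. It’s a stand-in under heavy assumptions: the “base model” is a fixed two-unit layer with made-up weights (BASE_WEIGHTS), the specialized data is three invented examples, and the new module is a single linear layer trained with plain gradient descent. The point is only that training touches the small head and never the base weights.

```python
# Hypothetical frozen base model: one small layer with ReLU activation.
BASE_WEIGHTS = [[0.5, -0.3], [0.2, 0.8]]

def base_features(x):
    """Run the input through the frozen base model."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in BASE_WEIGHTS]

def predict(head, x):
    """Apply the small trainable head on top of the frozen base output."""
    return sum(h * f for h, f in zip(head, base_features(x)))

def mean_squared_error(head, data):
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)

def train_head(data, steps=500, lr=0.5):
    """Gradient descent that updates only the head weights."""
    head = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            feats = base_features(x)
            error = sum(h * f for h, f in zip(head, feats)) - y
            for i, f in enumerate(feats):
                grad[i] += 2 * error * f / len(data)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head

# Made-up specialized training data: (input, target) pairs.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]

head = train_head(data)
print(mean_squared_error([0.0, 0.0], data), mean_squared_error(head, data))
```

Because the base is never modified, several different heads could be trained on different specialized data sets and swapped in over the same base model.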

Fine-tuning is one of the most active areas of AI research. A relatively new development is called “continual agent learning,” where a software agent/module somehow learns to learn over its lifetime, so that it can eventually learn very quickly from its own experience.



I can fine-tune a machine learning model. But when I played guitar in a band, I had to use a tuning device instead of manual tuning.

Left: This amazing animation is from animusic.com and it shows an imaginary multi-stringed six-in-one instrument being played by automated wooden fingers. Wonderful! This screenshot doesn’t do it justice.

Right: Here’s an old photo of me playing with my band at the Disneyland Employees Summer Party, called the Banana Ball (because it originated with Jungle Cruise employees). I’m on the far right playing bass guitar. On the far left on rhythm guitar is my good friend Paul Ruiz. In the back on drums is my good friend Jeff Rhoads. The singer is George Trullinger, who went on to a long successful career as a professional singer in Las Vegas.

