Deep neural transformer architecture (TA) systems have revolutionized the field of natural language processing (NLP). Unfortunately, TA systems are incredibly complex and implementing such a system from scratch can take months.
Enter the Hugging Face code library. Terrible name, excellent code library.
I’ve been wading through the Hugging Face (HF) documentation examples. I take an example and then refactor it completely. Doing so forces me to understand every line of code. Over time, by repeating this process for many examples, I expect to gain a solid grasp of the HF library.
My latest refactoring was of a fill-in-the-blank example. I started with a sentence from Wikipedia:
“Machine learning (ML) is the study of computer
algorithms that can learn automatically through experience
and by the use of data.”
I erased the word “learn” to see if the demo program could find reasonable words to fill in the blank:
“Machine learning (ML) is the study of computer
algorithms that can (BLANK) automatically through experience
and by the use of data.”
To cut to the chase, the top five predicted words and their associated pseudo-probabilities were:
learn (0.3484), evolve (0.1901), operate (0.0978), work (0.0247), communicate (0.0224)
Quite impressive.
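The "pseudo-probabilities" come from applying softmax to the model's raw output scores (logits). Here is a minimal sketch of that step in plain Python. The five logit values are made up for illustration and are not actual model output; the real demo applies softmax across all 28,996 vocabulary entries, which is why the top five values shown above sum to about 0.68 rather than 1.

```python
import math

def softmax(logits):
    """Convert raw scores to values in (0, 1) that sum to 1."""
    biggest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - biggest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a tiny five-word vocabulary (not real model output)
words = ["learn", "evolve", "operate", "work", "communicate"]
logits = [9.1, 8.5, 7.8, 6.4, 6.3]

probs = softmax(logits)
for w, p in zip(words, probs):
    print(f"{w:12s} {p:.4f}")
```

Softmax preserves ordering, so the word with the largest logit also gets the largest pseudo-probability. The values sum to 1 but are not calibrated, so they are best read as relative confidence.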
Even though the documentation code was only about 20 lines, the code was extremely dense and it took me several hours of experimentation to get to the point where I felt I understood most of the key ideas.

Artists have to fill in the blank when the blank is an art canvas. Left: By Andre E. Marty (1882-1974). Center: By Georges Lepape (1887-1971). Right: By Rene Gruau (1909-2004). All three men lived through the era that ran from the beginning of flight to men landing on the moon. Amazing.
Code below.
# fill_blank_test.py
# refactored from Hugging Face documentation example
import numpy as np
import torch as T
from transformers import AutoModelForMaskedLM, AutoTokenizer
print("\nBegin fill-in-the-blank using TA ")
print("\nLoading (cached) DistilBERT language model into memory ")
toker = \
  AutoTokenizer.from_pretrained("distilbert-base-cased")
model = \
  AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")
sentence = "Machine learning (ML) is the study of computer \
algorithms that can (BLANK) automatically through experience \
and by the use of data."
print("\nThe target fill-in-the-blank sentence is: ")
print(sentence)
print("\nThe actual (BLANK) word from Wikipedia is \"learn\" ")
sentence = f"Machine learning (ML) is the study of computer \
algorithms that can {toker.mask_token} automatically through \
experience and by the use of data."
print("\nConverting sentence to token IDs ")
inpts = toker(sentence, return_tensors="pt")
# inpts["input_ids"]
# tensor([[ 101, 7792, 3776, 113, 150,
# 2162, 114, 1110, 1103, 2025,
# 1104, 2775, 14975, 1115, 1169,
# 103, 7743, 1194, 2541, 1105,
# 1118, 1103, 1329, 1104, 2233,
# 119, 102]])
# for i in range(27):
#   print(inpts["input_ids"][0][i])
print("\nComputing output for all 28,996 possibilities ")
blank_id = toker.mask_token_id # ID of blank = 103
blank_id_idx = T.where(inpts["input_ids"] == blank_id)[1] # 15
with T.no_grad():
  all_logits = model(**inpts).logits  # 3D: [1, 27, 28996]
pred_logits = all_logits[0, blank_id_idx, :]  # [1, 28996]
print("\nExtracting IDs of top five predicted words: ")
top_ids = T.topk(pred_logits, 5, dim=1).indices[0].tolist()
print(top_ids)
print("\nThe top five predictions as words: ")
for tok_id in top_ids:  # tok_id avoids shadowing the built-in id()
  print(toker.decode([tok_id]))
print("\nConverting raw logit outputs to probabilities ")
np.set_printoptions(precision=4, suppress=True)
pred_probs = T.softmax(pred_logits, dim=1).numpy()
pred_probs = np.sort(pred_probs[0])[::-1] # high p to low p
top_probs = pred_probs[0:5]
print("\nThe top five corresponding probabilities: ")
print(top_probs)
# [0.3484 0.1901 0.0978 0.0247 0.0224]
print("\nEnd fill-in-the-blank demo ")
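Two of the denser steps in the demo are locating the blank inside the tokenized sentence and keeping each predicted ID aligned with its probability. The sketch below shows both ideas in plain Python, without PyTorch. The token IDs are copied from the comment in the demo; the `top_k` helper and the eight-value `toy_probs` list are hypothetical, made up for illustration only.

```python
# Token IDs copied from the comment in the demo; 103 is the mask (blank) ID
input_ids = [101, 7792, 3776, 113, 150, 2162, 114, 1110, 1103, 2025,
             1104, 2775, 14975, 1115, 1169, 103, 7743, 1194, 2541, 1105,
             1118, 1103, 1329, 1104, 2233, 119, 102]
MASK_ID = 103

# Position of the blank within the 27-token sequence
blank_pos = input_ids.index(MASK_ID)
print(blank_pos)  # 15

def top_k(probs, k):
    """Return (index, prob) pairs for the k largest probs, highest first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical probabilities over a made-up eight-entry vocabulary
toy_probs = [0.02, 0.35, 0.01, 0.19, 0.10, 0.03, 0.25, 0.05]

pairs = top_k(toy_probs, 3)
for idx, p in pairs:
    print(idx, p)
```

The demo gets its IDs from `T.topk` on the logits and its probabilities from a separate sort. The two lists line up because softmax preserves ordering. Carrying (index, probability) pairs, as in the sketch, makes that alignment explicit; in PyTorch, the result of `T.topk` has a `values` field alongside `indices`, so calling it on the softmax output should give the same pairing.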
