I’ve been taking a deep dive into the Hugging Face (HF) open-source code library for natural language processing (NLP) with transformer architecture (TA) models.
In previous explorations, I fine-tuned a pretrained HF DistilBERT model (about 66 million parameters) to classify movie reviews as 0 (negative) or 1 (positive), and I also wrote a function to compute the classification accuracy of the tuned model.
Today I coded up a demo that uses the tuned model to predict the sentiment of an arbitrary new movie review. To do so I had to specify a review as raw text (“This was a GREAT waste of my time.”), convert the review text to token IDs (integers, like “this” = 2023), feed the tokenized review to the tuned model and fetch the results, and then interpret the results.
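As a minimal sketch, the tokenization step can be run in isolation (this assumes the transformers library is installed; the checkpoint name matches the full demo below):

from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained(
  'distilbert-base-uncased')
enc = tokenizer(["This was a GREAT waste of my time."])
print(enc['input_ids'])
# [[101, 2023, 2001, 1037, 2307, 5949, 1997, 2026, 2051, 1012, 102]]
# 101 = [CLS] marker, 102 = [SEP] marker, 2023 = "this"

Notice that IDs 101 and 102 are special begin/end marker tokens that the tokenizer adds automatically.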
The pretrained model knows a lot about the English language, such as the fact that the words “movie” and “flick” mean the same thing when the context is cinema, but that “flick” can mean a sudden sharp movement in other contexts. However, the pretrained model doesn’t know anything about movie review sentiment, so it must be fine-tuned to understand things like “flop” meaning a bad movie.
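One way to see this in code: when the generic pretrained checkpoint is loaded into a sequence classification wrapper, the transformer layers get pretrained weights but the final classification head starts with random values (the transformers library normally prints a warning that the classifier weights are newly initialized and should be trained). A quick sketch, assuming the same checkpoint as the demo:

from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
  'distilbert-base-uncased')  # num_labels defaults to 2
print(model.classifier)
# Linear(in_features=768, out_features=2, bias=True) -- untrained head

Fine-tuning is what gives that final Linear layer sensible values for the negative/positive movie review classes.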
Each of the steps was conceptually simple but had many technical details to deal with. Even so, after a few hours of work I got a demo up and running. Many of the technical problems that I ran into caused me less trouble than expected because I’d seen similar problems while working with literally hundreds of PyTorch models over the past four years. This is one of the main values of experience: you can solve problems much more quickly.
It was a very interesting exploration and I can say that I have a good grasp of using a fine-tuned HF classification model. My next set of experiments will try to create an autoencoder model based on a pretrained HF model. I have no idea where to start but I’m sure I’ll figure things out . . . eventually.

Many of the biggest movie box office money losers have been science fiction films. Here are three such movies that collectively lost hundreds of millions of dollars. In my opinion, all three are OK but not quite good — they needed fine-tuning.
Left: “John Carter” (2012) lost over $200 million. The actor came across as an idiot, the actress came across as an annoying harpy, the plot was hard to follow, and the dialogue/sound was nearly impossible to understand without subtitles.
Center: “A Sound of Thunder” (2005) lost about $100 million. The production abruptly ran out of money and the editing suffered greatly.
Right: “Valerian and the City of a Thousand Planets” (2017) lost about $100 million. Incomprehensible choice of lead actor and actress. The lead actor came across as an effeminate wimp and the actress came across as a masculine bully.
Demo code:
# imdb_hf_03_use.py
# use tuned HF model for IMDB sentiment analysis prediction
# zipped raw data at:
# https://ai.stanford.edu/~amaas/data/sentiment/

import numpy as np  # for reproducible seeding
from transformers import DistilBertTokenizerFast
import torch
from transformers import DistilBertForSequenceClassification
from transformers import logging  # to suppress warnings

device = torch.device('cpu')

def main():
  # 0. get ready
  print("\nBegin use IMDB HF model demo ")
  logging.set_verbosity_error()  # suppress wordy warnings
  torch.manual_seed(1)
  np.random.seed(1)

  # 1. load pretrained model
  print("\nLoading untuned DistilBERT model ")
  model = DistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased')
  model.to(device)
  print("Done ")

  # 2. load tuned model wts and biases
  print("\nLoading tuned model wts and biases ")
  model.load_state_dict(torch.load(
    ".\\Models\\imdb_state.pt"))
  model.eval()  # set eval mode -- no dropout
  print("Done ")

  # 3. set up input review
  review_text = ["This was a GREAT waste of my time."]
  print("\nreview_text = ")
  print(review_text)

  tokenizer = DistilBertTokenizerFast.from_pretrained(
    'distilbert-base-uncased')
  review_tokenized = \
    tokenizer(review_text, truncation=True, padding=True)
  print("\nreview_tokenized = ")
  print(review_tokenized)
  # {'input_ids': [[101, 2023, 2001, 1037, 2307, 5949,
  #   1997, 2026, 2051, 1012, 102]],
  #  'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}

  input_ids = review_tokenized['input_ids']
  print("\nTokens: ")
  for tok_id in input_ids[0]:
    tok = tokenizer.decode(tok_id)
    print("%6d %s " % (tok_id, tok))

  input_ids = torch.tensor(input_ids).to(device)
  mask = torch.tensor(review_tokenized['attention_mask']).to(device)
  dummy_label = torch.tensor([0]).to(device)  # needed only so model returns a loss

  # 4. feed review to model, fetch result
  with torch.no_grad():
    outputs = model(input_ids,
      attention_mask=mask, labels=dummy_label)
  print("\noutputs = ")
  print(outputs)
  # SequenceClassifierOutput(
  #   loss=tensor(0.1055),
  #   logits=tensor([[ 0.9256, -1.2700]]),
  #   hidden_states=None,
  #   attentions=None)

  # 5. interpret result
  logits = outputs[1]  # [0] is the loss, [1] is the logits
  print("\nlogits = ")
  print(logits)
  pred_class = torch.argmax(logits, dim=1)  # 0 = negative, 1 = positive
  print("\npred_class = ")
  print(pred_class)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()
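The two logit values aren’t probabilities. If you want pseudo-probabilities for the negative and positive classes, one common approach is to apply the softmax function to the logits. A short sketch, using the logits values from the run above:

import torch

logits = torch.tensor([[0.9256, -1.2700]])  # from the demo output
probs = torch.softmax(logits, dim=1)
print(probs)
# approximately tensor([[0.9000, 0.1000]]) -- about 90% class 0

So the tuned model is quite confident that the sarcastic “This was a GREAT waste of my time.” review is negative.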
