Example of Fine-Tuning a Text Classification Language Model Using the HuggingFace Libraries

The goal of a text classification model is to predict a single integer value from text input. One example is predicting the sentiment (0 = negative, 1 = positive) of a movie review (“This Disney movie is woke garbage”). Another example is predicting the rating (1-5) of a restaurant based on a user summary (“El Dumpo has great tacos”).

Creating a text classification model using raw PyTorch code with a TransformerEncoder module is possible but very difficult. The HuggingFace libraries are wrappers over PyTorch code, and are significantly easier to use than raw PyTorch.

I created a demo for the Twitter Financial News Topic dataset. The goal is to classify a finance-related Twitter message into one of 20 classes:

"LABEL_0": "Analyst Update",
"LABEL_1": "Fed | Central Banks",
"LABEL_2": "Company | Product News",
"LABEL_3": "Treasuries | Corporate Debt",
"LABEL_4": "Dividend",
"LABEL_5": "Earnings",
"LABEL_6": "Energy | Oil",
"LABEL_7": "Financials",
"LABEL_8": "Currencies",
"LABEL_9": "General News | Opinion",
"LABEL_10": "Gold | Metals | Materials",
"LABEL_11": "IPO",
"LABEL_12": "Legal | Regulation",
"LABEL_13": "M&A | Investments",
"LABEL_14": "Macro",
"LABEL_15": "Markets",
"LABEL_16": "Politics",
"LABEL_17": "Personnel Change",
"LABEL_18": "Stock Commentary",
"LABEL_19": "Stock Movement",

For example, the message “The ACME Corp. announced a new CEO today” would likely be classified as LABEL_17: Personnel Change. The dataset has 16,990 training messages and 4,118 test/validation messages.
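Because the model reports generic names such as LABEL_17, a small lookup table makes the predictions easier to read. The following is a minimal sketch that builds one from the topic list above; the names TOPIC_NAMES, LABEL_TO_TOPIC and topic_for are my own hypothetical helpers, not part of the dataset or of any HuggingFace library.

```python
# Topic names copied from the label list above, in class-ID order.
# TOPIC_NAMES, LABEL_TO_TOPIC and topic_for are hypothetical helper names.
TOPIC_NAMES = [
    "Analyst Update", "Fed | Central Banks", "Company | Product News",
    "Treasuries | Corporate Debt", "Dividend", "Earnings", "Energy | Oil",
    "Financials", "Currencies", "General News | Opinion",
    "Gold | Metals | Materials", "IPO", "Legal | Regulation",
    "M&A | Investments", "Macro", "Markets", "Politics",
    "Personnel Change", "Stock Commentary", "Stock Movement",
]

# Build {"LABEL_0": "Analyst Update", ...} from the list positions.
LABEL_TO_TOPIC = {f"LABEL_{i}": name for i, name in enumerate(TOPIC_NAMES)}

def topic_for(label_id):
    """Translate an integer class ID (0-19) into its topic name."""
    return LABEL_TO_TOPIC[f"LABEL_{label_id}"]

print(len(LABEL_TO_TOPIC))        # 20
print(topic_for(17))              # Personnel Change
print(LABEL_TO_TOPIC["LABEL_2"])  # Company | Product News
```

The same dictionary can be indexed directly with the 'label' string that a text-classification pipeline returns.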

Briefly, I started with a base BERT general language model, used it to create a base BERT classification20 model, and then trained/fine-tuned the classification20 model using the Twitter training data.

To use the HuggingFace libraries and datasets, I needed a machine that had Python and PyTorch installed. Then I had to install four key HuggingFace libraries via the command:

pip install transformers datasets evaluate accelerate
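After the install, it can be useful to confirm which versions ended up on the machine, because the HuggingFace APIs change fairly often. This is a small sketch using only the Python standard library; the function name installed_versions is hypothetical, and I am assuming the pip distribution names match the names used in the install command (plus torch for PyTorch).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used with pip; "torch" is the PyTorch package.
PACKAGES = ["torch", "transformers", "datasets", "evaluate", "accelerate"]

def installed_versions(names):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(name.ljust(14), ver if ver is not None else "NOT INSTALLED")
```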

Preparing the training and test data by tokenizing it was non-trivial. First, I selected a tiny subset (50 training, 10 test) of the data, and then tokenized it using these statements:

import numpy as np
import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer

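# base model, 20-class classifier and tokenizer (see full demo code)
sm = "google-bert/bert-base-cased"
the_model = \
  AutoModelForSequenceClassification.from_pretrained(sm,
  num_labels=20)
the_tokenizer = AutoTokenizer.from_pretrained(sm)

# raw train and validation data
datasets = \
  load_dataset("zeroshot/twitter-financial-news-topic")

# shuffle, then keep a tiny subset
tiny_train_dataset = \
  datasets["train"].shuffle(seed=7).select(range(50))
tiny_test_dataset = \
  datasets["validation"].shuffle(seed=7).select(range(10))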

def tokenize_function(examples):
  return the_tokenizer(examples["text"], \
    padding="max_length", truncation=True)

tiny_train_tokenized_dataset = \
  tiny_train_dataset.map(tokenize_function, batched=True)
tiny_test_tokenized_dataset = \
  tiny_test_dataset.map(tokenize_function, batched=True)
print("Done ")
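The padding="max_length" and truncation=True arguments make every tokenized message the same length: short messages are filled out with a padding token and long ones are cut off. The following is a simplified pure-Python sketch of that idea, not the tokenizer's actual implementation. The function name pad_or_truncate and the token IDs are made up for illustration, pad_id=0 is an assumption, and a real tokenizer also takes care to keep special tokens intact when it truncates, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a list of token IDs to exactly max_length items.

    Returns the padded/truncated IDs plus an attention mask where
    1 marks a real token and 0 marks padding.
    """
    ids = list(token_ids[:max_length])   # truncate if too long
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)        # pad if too short
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": mask + [0] * n_pad,
    }

# Hypothetical token IDs for a short and a long message.
short_msg = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_msg = pad_or_truncate([101, 1, 2, 3, 4, 5, 6, 102], max_length=6)

print(short_msg)
# {'input_ids': [101, 7, 8, 102, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0]}
print(long_msg)
# {'input_ids': [101, 1, 2, 3, 4, 5], 'attention_mask': [1, 1, 1, 1, 1, 1]}
```

Padding every message to the model's maximum length is simple but wasteful for short tweets, which is likely one reason training is slow.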

The HuggingFace transformers library contains a Trainer class that handles the training loop. I set the number of training epochs to an artificially low value of 2, just to experiment, because training is very slow.

metric = evaluate.load("accuracy")
def metrics_function(eval_pred):
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  return metric.compute(predictions=predictions, \
    references=labels)

# note: newer transformers releases rename evaluation_strategy
# to eval_strategy
training_args = TrainingArguments(output_dir="test_trainer",
  num_train_epochs=2, evaluation_strategy="epoch")
trainer = Trainer(
  model=the_model,
  args=training_args,
  train_dataset=tiny_train_tokenized_dataset,
  eval_dataset=tiny_test_tokenized_dataset,
  compute_metrics=metrics_function
)

print("Start training/fine-tuning base classifier model ")
trainer.train()
print("Done ")
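To make the metrics_function logic concrete, here is a minimal sketch of the same computation using only NumPy. The logits and labels are invented toy values for a 3-class problem, and accuracy_from_logits is a hypothetical name. The evaluate "accuracy" metric returns a dictionary rather than a bare float, so this is an illustration of the arithmetic, not a drop-in replacement.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose largest logit is at the correct class index."""
    predictions = np.argmax(logits, axis=-1)   # predicted class per row
    return float(np.mean(predictions == labels))

# Toy values: 4 items, 3 classes (hypothetical, not from the dataset).
logits = np.array([
    [0.1, 2.0, -1.0],   # predicts class 1
    [1.5, 0.2, 0.3],    # predicts class 0
    [0.0, 0.1, 0.9],    # predicts class 2
    [0.4, 0.3, 0.2],    # predicts class 0
])
labels = np.array([1, 0, 1, 0])

print(accuracy_from_logits(logits, labels))   # 0.75
```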

The training progress messages show 0.00 accuracy on the test data, which indicates that I didn’t use nearly enough training data or enough training epochs.
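A bit of arithmetic shows how small this experiment is. The sketch below assumes a batch size of 8 (which I believe is the Trainer default per device) on a single device, and it assumes a model that guesses uniformly at random among equally likely classes, which is only a rough stand-in for a barely trained classifier.

```python
import math

n_train, n_test, n_classes = 50, 10, 20
epochs = 2
batch_size = 8   # assumption: default per-device batch size, one device

# Number of weight updates the model receives in total.
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 7 14

# Chance of getting all 10 test items wrong by uniform random guessing.
p_correct = 1 / n_classes
p_all_wrong = (1 - p_correct) ** n_test
print("%0.4f" % p_all_wrong)          # 0.5987
```

Under those assumptions, an accuracy of 0.00 on 10 test items is roughly what random guessing would produce more than half the time, so the result says very little about the model.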

I used the trained model to make a prediction:

classifier = pipeline("text-classification", model=the_model,
  tokenizer=the_tokenizer)
print("Using fine-tuned model to classify: " + \
  "Acme Corp announces new CEO")
result = classifier("Acme Corp announces new CEO")
print("Result: ")
print(result) 

# Result:
# [{'label': 'LABEL_2', 'score': 0.17667075991630554}]

The prediction of LABEL_2 = “Company | Product News” was incorrect but plausible.
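The score of about 0.18 is worth a closer look. As far as I know, the text-classification pipeline applies softmax to the model's 20 output logits and reports the largest resulting probability. The sketch below reproduces that idea with NumPy; the logits are invented, and softmax and top_label are hypothetical helper names, not the pipeline's actual code.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def top_label(logits):
    """Return the most probable class in the pipeline's output style."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    return {"label": f"LABEL_{idx}", "score": float(probs[idx])}

# Hypothetical logits: class 2 is only slightly preferred over the rest.
logits = np.zeros(20)
logits[2] = 1.0
result = top_label(logits)
print(result)

# With 20 identical logits every class gets probability 1/20.
uniform_score = top_label(np.zeros(20))["score"]
print("%0.2f" % uniform_score)   # 0.05
```

A completely uninformed model would score 0.05 for every class, so a top score of 0.18 means the model is only weakly committed to its answer.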

Interesting stuff.

AI generated art is quite amazing. But I don’t think that AI generated art can create illustrations that are as good as those created by the best human artists. AI can fine-tune an image to photorealism but AI has difficulty de-tuning images to a level of abstraction that is appealing to the human eye (well, my eyes anyway). Here are three AI generated images for “abstract portrait” that are nice, but not quite as nice as the best human-generated art I’ve seen.

Demo code.

# financial_topic.py

# requires pip install torch (PyTorch)
# pip install transformers datasets evaluate accelerate 

import numpy as np

# suppress most messages - don't do this for non-demos
import warnings
warnings.filterwarnings('ignore')
from transformers.utils import logging
logging.set_verbosity(50)
import datasets
datasets.disable_progress_bar()

import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer

print("\nBegin fine-tuning demo ")

sm = "google-bert/bert-base-cased"  # string ID for base model
print("\nBase model: ")
print(sm)

print("\nApplying base to: " + \
  "The woman worked as a [MASK] at night.")
completer = pipeline('fill-mask', model=sm, top_k=10)
results = \
  completer("The woman worked as a [MASK] at night.")
for i in range(len(results)):
  print(results[i]['token_str'].ljust(14), end="")
  print("%8d" % results[i]['token'], end="")
  print("%12.4f" % results[i]['score'])

# waitress         15098      0.2799
# nurse             7439      0.1808
# maid             13487      0.1578
# housekeeper      26458      0.0627
# bartender        18343      0.0276
# secretary         4848      0.0229
# servant           8108      0.0225
# prostitute       21803      0.0217
# cook              9834      0.0169
# cleaner          23722      0.0104

print("\nCreating classifier from base model for 20 labels ")
the_model = \
  AutoModelForSequenceClassification.from_pretrained(sm,
  num_labels=20)
the_tokenizer = AutoTokenizer.from_pretrained(sm)

print("\nLoading financial topic raw train and test data ")
datasets = \
  load_dataset("zeroshot/twitter-financial-news-topic")
print("Done")

# topics = {
#  "LABEL_0": "Analyst Update",
#  "LABEL_1": "Fed | Central Banks",
#  "LABEL_2": "Company | Product News",
#  "LABEL_3": "Treasuries | Corporate Debt",
#  "LABEL_4": "Dividend",
#  "LABEL_5": "Earnings",
#  "LABEL_6": "Energy | Oil",
#  "LABEL_7": "Financials",
#  "LABEL_8": "Currencies",
#  "LABEL_9": "General News | Opinion",
#  "LABEL_10": "Gold | Metals | Materials",
#  "LABEL_11": "IPO",
#  "LABEL_12": "Legal | Regulation",
#  "LABEL_13": "M&A | Investments",
#  "LABEL_14": "Macro",
#  "LABEL_15": "Markets",
#  "LABEL_16": "Politics",
#  "LABEL_17": "Personnel Change",
#  "LABEL_18": "Stock Commentary",
#  "LABEL_19": "Stock Movement",
# }

print("\nSelecting 50 shuffled training messages, " + \
  "10 shuffled test messages ")
tiny_train_dataset = \
  datasets["train"].shuffle(seed=7).select(range(50))
tiny_test_dataset = \
  datasets["validation"].shuffle(seed=7).select(range(10))
print("Done")

message_25 = tiny_train_dataset[25]
print("\nTraining message [25]: ")
print(message_25)
# {'text': 'Bank Of Korea Raises Key Interest Rate 
# To 2.25% From 1.75%', 'label': 1}

print("\nTokenizing training and test data ")

def tokenize_function(examples):
  return the_tokenizer(examples["text"], \
    padding="max_length", truncation=True)

tiny_train_tokenized_dataset = \
  tiny_train_dataset.map(tokenize_function, batched=True)
tiny_test_tokenized_dataset = \
  tiny_test_dataset.map(tokenize_function, batched=True)
print("Done ")

print("\nPreparing to train base classifier model ")

metric = evaluate.load("accuracy")
def metrics_function(eval_pred):
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  return metric.compute(predictions=predictions, \
    references=labels)

# note: newer transformers releases rename evaluation_strategy
# to eval_strategy
training_args = TrainingArguments(output_dir="test_trainer",
  num_train_epochs=2, evaluation_strategy="epoch")
trainer = Trainer(
  model=the_model,
  args=training_args,
  train_dataset=tiny_train_tokenized_dataset,
  eval_dataset=tiny_test_tokenized_dataset,
  compute_metrics=metrics_function
)

print("\nStart training/fine-tuning base classifier model ")
trainer.train()
print("Done ")

# use the fine-tuned model
classifier = pipeline("text-classification", model=the_model,
  tokenizer=the_tokenizer)
print("\nUsing fine-tuned model to classify: " + \
  "Acme Corp announces new CEO")
result = classifier("Acme Corp announces new CEO")
print("\nResult: ")
print(result) 

# Result:
# [{'label': 'LABEL_2', 'score': 0.17667075991630554}]

print("\nEnd demo ")