One recent morning before work, I figured I’d explore text summarization using the HuggingFace (HF) large language model libraries. Bottom line(s): The HF libraries are incredibly easy to use, but their documentation is somewhat of a mess.
I was able to zap out a demo relatively quickly. I found an arbitrary news article that is essentially an interview with Warren Buffett. The article text is:
Unlike most billionaires, Berkshire Hathaway Chairman and CEO Warren Buffett has always been a vocal advocate for working class Americans. He famously suggested raising taxes on wealthy individuals like himself and recently claimed that no American would have to pay "a dime of federal taxes" if other corporations paid their fair share. "We always hope at Berkshire to pay substantial federal income taxes," he said at the company's annual meeting. With that in mind, some of Buffett's more unconventional thoughts on wealth inequality are probably worth closer inspection. "No conspiracy lies behind this depressing fact: The poor are most definitely not poor because the rich are rich. Nor are the rich undeserving. Most of them have contributed brilliant innovations or managerial expertise to America's well-being," the famous investor wrote in a 2015 Wall Street Journal op-ed. "Instead, this widening gap is an inevitable consequence of an advanced market-based economy." Here's a closer look at Buffett's argument. Buffett believes the market economy has become more and more "specialized" with "economic rewards flowing to people with specialized talents." This, he says, has caused the wealth gap with many people barely getting by while others thrive. "It was an agrarian economy a couple hundred years ago," he said in an interview with CNN. "Very hard, you know, to get 20 times the wealth of the next guy because you were a little bit better farmer. But if you're better at some skills now, you can become incredibly wealthy at a very young age … You get to capitalize [the] value of an idea. And so the wealth moves big time, even on an anticipatory basis." Now, he says, there's a "mismatch" between the requirements of attractive jobs and the skills of the early American labor force, which is "simply a consequence of an economic engine that constantly requires more high-order talents while reducing the need for commodity-like tasks." 
The brutal truth, he says, is that "a great many people" will be left behind in an advanced economic system.
I wrote a tiny Python language program to summarize the article. The result:
[{'summary_text': ' Warren Buffett has always been a vocal
advocate for working class Americans . Buffett believes the
market economy has become more and more "specialized" with
"economic rewards flowing to people with specialized
talents" This, he says, has caused the wealth gap with many
people barely getting by .'}]
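As the output shows, the pipeline returns a Python list with one dictionary per input text, each holding a 'summary_text' entry. A minimal sketch of pulling out just the summary string (the literal below is abbreviated from the run above):

```python
# The summarization pipeline returns a list of dictionaries,
# one per input text, each with a 'summary_text' key.
result = [{'summary_text': ' Warren Buffett has always been a '
  'vocal advocate for working class Americans .'}]

summary = result[0]['summary_text'].strip()
print(summary)
```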
But I had many unanswered questions in my mind. Here’s the program I wrote:
# hf_summarization_demo.py
# Anaconda 2023.09-0 Python 3.11.5
# transformers 4.32.1
from transformers import pipeline
print("\nBegin text summarization demo ")
print("Using HF pipeline with pretrained model approach ")
article = '''Unlike most billionaires, Berkshire Hathaway
Chairman and . . . (see above) . . . will be left
behind in an advanced economic system.'''
print("\nSource article: \n")
print(article)
model_id = "sshleifer/distilbart-cnn-12-6"
# rev_id = "a4f8f3e"
print("\nUsing pretrained model: " + model_id)
# summarizer = pipeline("summarization", model=model_id,
#   revision=rev_id)
# apparently, if you don't specify a revision ID, HF
# will use the latest revision
summarizer = pipeline("summarization", model=model_id)

summary_text = summarizer(article, max_length=100,
  min_length=20, do_sample=False)  # do_sample ?
print("\nSummary: \n")
print(summary_text)
print("\nEnd demo ")
The HF documentation is quite disorganized. This often happens when a technology is new and changing rapidly, so the chaos wasn’t unexpected.
I knew from previous explorations that HF has a very high-level “pipeline” object that hides almost all details, and is the easiest way to get started with a language task. I found several examples of text summarization using an HF pipeline, all of them significantly different from one another and contradictory to some extent.
My first attempt instantiated a pipeline like so:
summarizer = pipeline("summarization")
This gave a warning message that a pipeline should be instantiated using a specific pretrained model and revision, and that the default model is “sshleifer/distilbart-cnn-12-6”, revision “a4f8f3e”. Because this was my first use of the pretrained model, it was downloaded from the HF servers to my machine and cached for subsequent program runs.
It appears that the pretrained model was created by user sshleifer, using the DistilBART base language model, fine-tune trained on CNN news articles (my guess is that the 12-6 refers to the number of encoder and decoder layers).
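The model ID seems to encode that information positionally. A tiny sketch of pulling the pieces apart, under my guessed informal naming convention of “user/basemodel-dataset-numbers” (an assumption on my part; not every HF model ID follows this pattern):

```python
# Decompose an HF model ID, assuming the informal convention
# "user/basemodel-dataset-<numbers>" (an assumption, not a rule).
model_id = "sshleifer/distilbart-cnn-12-6"

user, model_name = model_id.split("/")
base, dataset, *rest = model_name.split("-")

print(user)     # sshleifer
print(base)     # distilbart
print(dataset)  # cnn
print(rest)     # ['12', '6']
```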
For my second attempt I instantiated as:
model_id = "sshleifer/distilbart-cnn-12-6"
rev_id = "a4f8f3e"
summarizer = pipeline("summarization", model=model_id,
revision=rev_id)
And everything was hunky-dory. But the next question was: how would I have known which pretrained model to specify if the warning message hadn’t told me? So I went to the HF pretrained models page. There were 738,805 pretrained models! Yikes. I filtered for summarization models and saw that there were 1,830 of them. Yikes Part II. I figured I’d want to zero in on pretrained summarization models that were trained on some sort of news article dataset, but I couldn’t find an easy way to search for such models.

Top: There were 1,830 pretrained text summarization models but no easy way to search them by type of training dataset. Bottom: The page for the pretrained model I used, which is the default for text summarization if a model isn’t specified.
I was able to search models by name, specifying “sshleifer”, and found the page for that pretrained model. It had links to the two datasets that were used to fine-tune train the model. This is OK, but it’d be nice if there were a way to search for pretrained models that were trained on a particular type of dataset (such as news sources). Maybe there is a way, but it wasn’t obvious to me.
The idea here is that if you want to do text summarization, you can use an HF pipeline object with a pretrained model created by someone else and saved on HF, or you can start with a base LLM like DistilBART or GPT-3 and fine-tune train it yourself using data that’s relevant to your scenario. This second approach is a lot of work.
I noticed that when I ran my demo program several times, sometimes I’d get the same summarization result, but sometimes different results. Another minor mystery to explore.
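My guess about the run-to-run variation (an assumption, not something I verified): when sampling is enabled, the model draws each next token from its probability distribution rather than always taking the most likely one, so repeated runs can diverge, while do_sample=False should give a deterministic greedy choice. A toy stdlib sketch of the two decoding styles:

```python
import random

# Toy next-token probability distribution (hypothetical values)
tokens = ["wealth", "gap", "economy", "talent"]
probs  = [0.50, 0.30, 0.15, 0.05]

# Greedy decoding (like do_sample=False): always pick the argmax,
# so every run chooses the same token.
greedy = tokens[probs.index(max(probs))]
print(greedy)  # wealth

# Sampled decoding (like do_sample=True): draw from the
# distribution, so different runs can choose different tokens.
sampled = random.choices(tokens, weights=probs, k=1)[0]
print(sampled)  # varies from run to run
```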
Well, at this point, it was time to go to work, so off I went.

I have a fairly good understanding of large language models. I do not have a good understanding of fashion models. I get the idea that fashion models are supposed to have neutral facial expressions so that they don’t detract from the clothes they are modeling, but some models seem to go actively hostile.
Left: If I saw her headed my way, I wouldn’t make eye contact in case she was carrying a weapon of some sort.
Center: She may be justified in her angry look because she’s wearing an ill-advised noose necklace.
Right: This is the famous Chinese model with just one leg. Her name is not I-Leen.
