Bottom line: The scikit Diabetes Dataset was compiled by Stanford professor Brad Efron while he was helping Dr. T. McLaughlin analyze data for a medical study that was later published in “Differentiation Between Obesity and Insulin Resistance in the Association with C-Reactive Protein” (2002).
The scikit-learn library has a nice collection of datasets for experiments. One of the most popular datasets for regression is the Diabetes Dataset. See scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html.
Note: The scikit Diabetes Dataset is different from the Pima Indians Diabetes Dataset.
The Diabetes Dataset has 442 items. Each item represents a patient and has 10 predictor values followed by a target value to predict. The data looks like:
59, 2, 32.1, 101.00, 157, 93.2, 38, 4.00, 4.8598, 87, 151
48, 1, 21.6, 87.00, 183, 103.2, 70, 3.00, 3.8918, 69, 75
72, 2, 30.5, 93.00, 156, 93.6, 41, 4.00, 4.6728, 85, 141
. . .
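The 10 predictor variables are age, sex, body mass index, average blood pressure, and six blood serum measurements: total serum cholesterol, low-density lipoproteins, high-density lipoproteins, the ratio of total cholesterol to HDL, (possibly) the log of the serum triglycerides level, and blood sugar level. The target value in the last column is a quantitative measure of diabetes progression one year after baseline.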
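To make the layout concrete, here is a minimal sketch that splits the three sample rows shown above into 10 named predictors and a target. The short column names (age, sex, bmi, bp, s1 through s6) follow the feature names scikit uses; the parse_row helper is hypothetical and exists only for this illustration.

```python
# Short names follow scikit-learn's feature_names for this dataset.
COLUMNS = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]

# The three sample rows quoted in the post.
RAW_ROWS = """\
59, 2, 32.1, 101.00, 157, 93.2, 38, 4.00, 4.8598, 87, 151
48, 1, 21.6, 87.00, 183, 103.2, 70, 3.00, 3.8918, 69, 75
72, 2, 30.5, 93.00, 156, 93.6, 41, 4.00, 4.6728, 85, 141
"""


def parse_row(line):
    """Hypothetical helper: split one row into (predictors dict, target)."""
    values = [float(tok) for tok in line.split(",")]
    if len(values) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} values, got {len(values)}")
    predictors = dict(zip(COLUMNS, values[:-1]))  # first 10 values
    target = values[-1]                           # last value is the target
    return predictors, target


items = [parse_row(line) for line in RAW_ROWS.strip().splitlines()]

for predictors, target in items:
    print(predictors["age"], predictors["bmi"], "->", target)
```

The same split (all columns but the last as predictors, the last column as the target) applies to every one of the 442 rows in the raw file.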
Note: The sex encoding isn’t explained, but I suspect male = 1, female = 2, because there are 235 1 values and 207 2 values.
The scikit page points to a Web page by Dennis Boos (NCSU) that has a link to the raw data. Boos’ page states that the data came from a Web page owned by Trevor Hastie (Stanford), but the link is dead. Boos’ Web page states that the Diabetes Data was first referenced in a research paper “Least Angle Regression” (2004), by B. Efron, T. Hastie, I. Johnstone, R. Tibshirani.
I enjoy tracking down original sources, so I wrote an email message to Efron, Hastie, Johnstone and Tibshirani (all are at Stanford) asking for information. Brad Efron graciously replied very quickly. He explained that the Diabetes Dataset came from data analysis that he was helping Dr. T. McLaughlin (Stanford) with. The results appeared in the research paper “Differentiation Between Obesity and Insulin Resistance in the Association with C-Reactive Protein” (2002), by T. McLaughlin, F. Abbasi, C. Lamendola, L. Liang, G. Reaven, P. Schaaf, P. Reaven. That paper has the comment, “The authors express their appreciation to Bradley Efron, PhD, for statistical assistance with the manuscript.”
To summarize, the scikit Diabetes Dataset was originally generated by Dr. T. McLaughlin and colleagues. Then B. Efron, who was helping with the analysis, compiled 442 data items (it's not clear if this was a subset or the entire research dataset). Then the raw data was posted on a Web site by Efron’s colleague T. Hastie, and shortly afterward the data was also posted on a Web site by D. Boos. The Hastie page vanished at some point, leaving the Boos page as the primary source of the data. Later, most likely in 2010, the scikit library pulled the diabetes data from Boos’ page, and the data is now widely available through scikit.
An interesting investigation!
Note: I discovered that the default target to predict, the diabetes score in the last column of the dataset, cannot be predicted with meaningful accuracy. But the variables in columns [4], [5], [6], [7], and [8] can be predicted nicely.

I enjoy history of all kinds, but especially the history of computer science and the history of early science fiction movies. Here are three of my favorite science fiction movies from the 1950s that feature alien flying saucers. All three movies have fascinating histories.
Left: “The Atomic Submarine” (1959) – In the near future, an alien flying saucer is under the sea in the Arctic, destroying cargo submarines. The USS Tigerfish submarine manages to destroy the evil alien saucer. Very scary scenes inside the saucer. Innovative electronic sound effects.
Center: “Earth vs. the Flying Saucers” (1956) – The title pretty much says it all. Impressive stop-motion animation of the flying saucers by the famous Ray Harryhausen. I could never quite figure out if the aliens only retaliated after being attacked by the Earth military, or if they were evil from the beginning.
Right: “Invaders from Mars” (1953) – A young boy thinks he sees a flying saucer land in the sand pit in a field behind his house. This movie scared the heck out of me and I had nightmares about the path up the hill for years. Brilliantly directed by William Cameron Menzies.
