I wrote an article titled “Data Prep for Machine Learning: Outliers” in the July 2020 edition of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2020/07/14/ml-data-prep-outliers.aspx.
The article explains how to programmatically identify and deal with outlier data. Suppose you have a data file of loan applications. Examples of outlier data include a person’s age of 99 (either a very old applicant or possibly a placeholder value that was never changed) and a person’s country of “Cannada” (probably a transcription error).
When the source data file is small, about 500 lines or fewer, you can usually find and deal with outlier data manually. But in almost all realistic scenarios with large datasets, you must handle outlier data programmatically.
The article explains how to find numeric data outliers by computing z-scores, and how to find categorical data outliers by computing frequency counts.
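To make the two ideas concrete, here is a minimal Python sketch. It is not the code from the article. The function names, the tiny sample data, the z-score cutoff of 2.0, and the "appears only once" rule are all assumptions I made for illustration. In practice you'd tune the cutoffs to your own data.

```python
from collections import Counter
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for items whose |z-score| exceeds threshold.

    z = (x - mean) / standard deviation. The population standard
    deviation is used here; that choice is an assumption of this sketch.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # all values identical, so nothing can be an outlier
    result = []
    for i, x in enumerate(values):
        z = (x - mu) / sd
        if abs(z) > threshold:
            result.append((i, x, z))
    return result

def rare_categories(values, max_count=1):
    """Return {category: count} for categories seen max_count times or fewer."""
    counts = Counter(values)
    return {cat: n for cat, n in counts.items() if n <= max_count}

# Hypothetical loan-application columns, made up for this example.
ages = [25, 31, 38, 42, 29, 35, 47, 33, 40, 99]
countries = ["USA", "USA", "Canada", "Canada", "USA",
             "Cannada", "Canada", "USA", "Canada", "USA"]

# The cutoff of 2.0 is an assumption; with only 10 items a single
# extreme value cannot reach a z-score above 3.0.
for idx, age, z in zscore_outliers(ages, threshold=2.0):
    print(f"line {idx}: age {age} has z-score {z:.2f}")

for cat, n in rare_categories(countries).items():
    print(f"country '{cat}' appears only {n} time(s)")
```

With this sample data, the age of 99 is the only flagged numeric value and “Cannada” is the only flagged category. A flagged item isn't necessarily wrong. You still have to decide whether to correct it, delete the line, or leave it alone.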
Data preparation is an umbrella term for many different activities. Data preparation is always tedious and much more time-consuming than expected. There’s nothing conceptually difficult about data preparation, but there are many steps and each step has many small details to attend to.
There’s no umbrella term for weird umbrellas. Left: Goth umbrella in outer space. Center: My dogs love to chase squirrels but I don’t know what they’d do if they saw this one. Right: Japan. That’s all you need to know.


