I wrote an article titled “Data Prep for Machine Learning: Encoding” in the August 2020 edition of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2020/08/12/ml-data-prep-encoding.aspx.
The article is one of a series where I walk through the entire process of programmatically preparing data for use by a deep neural network model. In artificial scenarios where there isn’t much data, it’s often possible to prepare data manually, using a text editor or spreadsheet program. But in realistic scenarios with very large data files, you’ve got to prepare the data programmatically. This process is very tricky and time-consuming.
Encoding is the process of transforming non-numeric data, such as “blue”, into a numeric form, such as (0, 1, 0, 0). There are dozens of kinds of encoding. The most common are one-hot encoding, zero-one encoding, minus-one-plus-one encoding, and ordinal encoding. Encoding is conceptually easy but tricky in practice, and the major challenge is knowing which type of encoding to use in a particular situation.
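To make the four common encodings concrete, here is a minimal Python sketch. It is not the code from the magazine article. The category lists (COLORS, SIZES, ANSWERS) and the function names are hypothetical, and the order of COLORS is an assumption chosen so that “blue” comes out as (0, 1, 0, 0).

```python
# Hypothetical category lists. The ordering is an assumption: it decides
# which position (or integer) each value is mapped to.
COLORS = ["red", "blue", "green", "yellow"]   # unordered categories
SIZES = ["small", "medium", "large"]          # naturally ordered levels
ANSWERS = ["no", "yes"]                       # a binary variable


def one_hot(value, categories):
    """Return a list with a 1 at the position of value and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def ordinal(value, ordered_levels):
    """Return the zero-based rank of value among the ordered levels."""
    return ordered_levels.index(value)  # raises ValueError if unknown


def zero_one(value, pair):
    """Encode a binary variable as 0 (first item) or 1 (second item)."""
    if len(pair) != 2:
        raise ValueError("zero-one encoding needs exactly two categories")
    return pair.index(value)


def minus_one_plus_one(value, pair):
    """Encode a binary variable as -1 (first item) or +1 (second item)."""
    return -1 if zero_one(value, pair) == 0 else 1


if __name__ == "__main__":
    print(one_hot("blue", COLORS))              # [0, 1, 0, 0]
    print(ordinal("medium", SIZES))             # 1
    print(zero_one("yes", ANSWERS))             # 1
    print(minus_one_plus_one("no", ANSWERS))    # -1
```

The functions are the easy part. Deciding which one to apply to a given column (for example, whether a binary variable is a predictor or the value to predict) is where the real difficulty lies.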
I previously published articles on dealing with missing data, dealing with outlier data, and normalizing data. I’m working on articles that cover splitting data files (typically into training and test sets — a surprisingly tricky task) and serving up batches of training data (also surprisingly tricky). When I’m done with all six articles, I’ll probably put together one mega-example that does all of the transformations from ugly raw data to beautiful ready-for-ML data.
Many movies feature the “ugly duckling turns to swan” transformation. Left: In “The Princess Diaries” (2001) actress Anne Hathaway plays dorky high school student Mia Thermopolis who turns out to be a princess of Genovia. Center: In “My Fair Lady” (1964) actress Audrey Hepburn plays uncultured street girl Eliza Doolittle who is transformed into an English lady on a bet. Right: In “Miss Congeniality” (2000) actress Sandra Bullock plays crusty FBI agent Gracie Hart who must go undercover in a beauty pageant.



