"Data Dimensionality Reduction Using a Neural Autoencoder with C#" in Visual Studio Magazine

I wrote an article titled “Data Dimensionality Reduction Using a Neural Autoencoder with C#” in the July 2024 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/Articles/2024/07/15/data-dimensionality-reduction.aspx.

Suppose you have a dataset that has many columns (dimensions). In some scenarios it’s useful to create an approximation of the dataset that has fewer columns. This is called dimensionality reduction. The two most common techniques for dimensionality reduction are PCA (principal component analysis) and a neural autoencoder. My article explains how to perform dimensionality reduction using a neural autoencoder implemented with the C# language.

My demo uses a synthetic dataset. The raw data looks like:

F  24  michigan  29500.00  lib
M  39  oklahoma  51200.00  mod
F  63  nebraska  75800.00  con
M  36  michigan  44500.00  mod
F  27  nebraska  28600.00  lib
. . .

The fields are sex, age, state of residence, annual income, political leaning.

Because neural networks accept only numeric data, the source data must be normalized and encoded:

 1.0  0.2400  1.0  0.0  0.0  0.2950  0.0  0.0  1.0
-1.0  0.3900  0.0  0.0  1.0  0.5120  0.0  1.0  0.0
 1.0  0.6300  0.0  1.0  0.0  0.7580  1.0  0.0  0.0
-1.0  0.3600  1.0  0.0  0.0  0.4450  0.0  1.0  0.0
 1.0  0.2700  0.0  1.0  0.0  0.2860  0.0  0.0  1.0
. . .

Sex is encoded as M = -1 and F = 1. Age is normalized by dividing by 100. State is one-hot encoded as Michigan = 100, Nebraska = 010, Oklahoma = 001. Income is normalized by dividing by 100,000. Political leaning is one-hot encoded as conservative = 100, moderate = 010, liberal = 001.
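
The encoding rules above can be sketched in a few lines. This is a minimal illustration in Python rather than the C# used in the article, and the names encode_row, one_hot, STATES and POLITICS are hypothetical helpers invented for this sketch, not taken from the demo program. It assumes each raw line is whitespace-delimited in the field order shown.

```python
# Category orders follow the one-hot layout described in the text:
# Michigan = 100, Nebraska = 010, Oklahoma = 001
# conservative = 100, moderate = 010, liberal = 001
STATES = ["michigan", "nebraska", "oklahoma"]
POLITICS = ["con", "mod", "lib"]

def one_hot(value, categories):
    """Return a list with 1.0 at the position of value and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode_row(line):
    """Convert one raw text line into nine numeric values."""
    sex, age, state, income, politics = line.split()
    row = [-1.0 if sex == "M" else 1.0]      # M = -1, F = 1
    row.append(float(age) / 100.0)           # age divided by 100
    row += one_hot(state, STATES)            # three columns
    row.append(float(income) / 100_000.0)    # income divided by 100,000
    row += one_hot(politics, POLITICS)       # three columns
    return row

print(encode_row("F  24  michigan  29500.00  lib"))
```

Five raw fields become nine numeric columns (1 + 1 + 3 + 1 + 3), which is why the autoencoder in the demo has nine inputs.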

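The demo instantiates a 9-6-9 neural autoencoder, trains it on the encoded data, and then uses the six hidden-node values as the reduced form of each row. The reduced data looks like: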

 0.0102   0.2991  -0.0517   0.0154  -0.8028   0.9672
-0.2268   0.8857   0.0029  -0.2421   0.7477  -0.9319
 0.0697  -0.9168   0.2438   0.9212   0.4091   0.2533
-0.0505   0.2831   0.5931  -0.9208   0.6399  -0.2666
 0.5075   0.1818   0.0889   0.9078  -0.8808   0.3985
. . .
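
To make the idea concrete, here is a rough sketch of a 9-6-9 autoencoder: nine inputs, six hidden nodes, nine outputs, trained to reproduce its own input so that the six hidden values act as the reduced row. It is written in Python for brevity and is not the C# implementation from the article. The tanh hidden activation, identity output activation, plain stochastic gradient descent, learning rate, epoch count, seed, and the AutoEncoder class itself are all assumptions made for this illustration, so the values it produces will not match the numbers shown above.

```python
import math
import random

class AutoEncoder:
    """Tiny autoencoder: tanh hidden layer, identity output layer (assumed)."""

    def __init__(self, n_in, n_hid, seed=0):
        rnd = random.Random(seed)
        self.n_in, self.n_hid = n_in, n_hid
        # small random weights, zero biases
        self.w1 = [[rnd.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_in)]
        self.b1 = [0.0] * n_hid
        self.w2 = [[rnd.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        """Input row -> hidden values (the reduced representation)."""
        return [math.tanh(sum(x[i] * self.w1[i][j] for i in range(self.n_in)) + self.b1[j])
                for j in range(self.n_hid)]

    def decode(self, h):
        """Hidden values -> reconstruction of the input row."""
        return [sum(h[j] * self.w2[j][k] for j in range(self.n_hid)) + self.b2[k]
                for k in range(self.n_in)]

    def train_step(self, x, lr):
        """One SGD update on squared reconstruction error for a single row."""
        h = self.encode(x)
        y = self.decode(h)
        d_out = [y[k] - x[k] for k in range(self.n_in)]
        # back-propagate through tanh (derivative is 1 - h^2) before changing w2
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.w2[j][k] for k in range(self.n_in))
                 for j in range(self.n_hid)]
        for j in range(self.n_hid):
            for k in range(self.n_in):
                self.w2[j][k] -= lr * h[j] * d_out[k]
        for k in range(self.n_in):
            self.b2[k] -= lr * d_out[k]
        for i in range(self.n_in):
            for j in range(self.n_hid):
                self.w1[i][j] -= lr * x[i] * d_hid[j]
        for j in range(self.n_hid):
            self.b1[j] -= lr * d_hid[j]

def mean_error(ae, rows):
    """Mean squared reconstruction error over all rows."""
    total = 0.0
    for r in rows:
        y = ae.decode(ae.encode(r))
        total += sum((a - b) ** 2 for a, b in zip(y, r)) / len(r)
    return total / len(rows)

# the five encoded rows shown earlier in this post
data = [
    [ 1.0, 0.24, 1.0, 0.0, 0.0, 0.2950, 0.0, 0.0, 1.0],
    [-1.0, 0.39, 0.0, 0.0, 1.0, 0.5120, 0.0, 1.0, 0.0],
    [ 1.0, 0.63, 0.0, 1.0, 0.0, 0.7580, 1.0, 0.0, 0.0],
    [-1.0, 0.36, 1.0, 0.0, 0.0, 0.4450, 0.0, 1.0, 0.0],
    [ 1.0, 0.27, 0.0, 1.0, 0.0, 0.2860, 0.0, 0.0, 1.0],
]

ae = AutoEncoder(n_in=9, n_hid=6)
err_before = mean_error(ae, data)
for epoch in range(500):          # epoch count and learning rate are arbitrary choices
    for row in data:
        ae.train_step(row, lr=0.05)
err_after = mean_error(ae, data)

reduced = [ae.encode(row) for row in data]   # five rows, six columns each
print("error before:", err_before, "after:", err_after)
for r in reduced:
    print(["%.4f" % v for v in r])
```

Because the hidden layer in this sketch uses tanh, every reduced value falls between -1 and +1. The decode method is only needed for training; once the network is trained, dimensionality reduction uses encode alone.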

The reduced data can be used as a surrogate for the original data. Common use cases include data visualization in a 2D graph (if the data is reduced to just two columns instead of the six columns in the demo), use in machine learning algorithms that only work with numeric data (such as k-means clustering), use in machine learning algorithms that can only handle a relatively small number of columns (such as those that compute a matrix inverse), and use in data cleaning (because the reduction tends to filter out statistical noise).

Software systems have a certain kind of aesthetic that is quite subjective. I find neural autoencoder dimensionality reduction beautiful. When I was a young man I built many scale models of airplanes. Most of the models I built were U.S. planes that I found to be beautiful. But adversaries of the U.S. built some beautiful planes too.

Left: The German Albatros D.V was first flown in 1917, near the end of World War I. About 2,500 were built. I like the streamlined fuselage and nose cone, compared to most WWI plane designs that looked kind of clunky.

Center: The Italian Reggiane Re.2005 Sagittario was introduced in 1942 at the height of World War II. It was as good as any plane flying at the time, but alas, only 48 were produced by Italy. Classic Italy — beautiful design but poorly executed production.

Right: The Soviet MiG-21 looks fast, functional, and lethal, and it was/is. It first flew in 1955 and is still in service in many countries. Over 11,000 were built. Typical Soviet technology with emphasis on quantity rather than quality.

1 Response to “Data Dimensionality Reduction Using a Neural Autoencoder with C#” in Visual Studio Magazine

Thorsten Kleppe says:

July 25, 2024 at 4:36 am

According to DeCaf, we have to build deep:

https://x.com/ProfTomYeh/status/1816225776054026508

The paper shows better differentiation for deeper architectures, much better than other techniques could do.

But it can be tough to find good use cases for autoencoders. Improving predictions by putting an autoencoder in front of a neural network has not been successful so far.
