Why You Shouldn’t Use Drop-First Encoding for Neural Network Categorical Predictor Variables

Bottom line: For a neural network regressor, if you have categorical predictor data, you should use standard one-hot encoding rather than the drop-first encoding that is usually needed for a linear regression model. Drop-first encoding works for a neural network, but it has no advantage over one-hot, and in practice drop-first is almost never used with neural networks.

Suppose you have a categorical predictor variable, such as color with possible values red, blue, green. If you are creating a neural network regression model, you should use one-hot encoding: red = 1 0 0, blue = 0 1 0, green = 0 0 1. But if you are creating a linear regression model, and you intend to use closed form training (via Moore-Penrose pseudo-inverse, or left pseudo-inverse via normal equations), you should use drop-first encoding: red = 0 0, blue = 1 0, green = 0 1. The reason is that with one-hot encoding, the encoded columns for a variable always sum to 1, which makes them collinear with the constant (bias) column, and so the matrix inverse operation will likely fail.
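To see the collinearity concretely, here is a minimal sketch using NumPy. It assumes the linear model includes a constant (bias) column, which is the usual setup. The variable names and the tiny six-row dataset are made up for illustration.

```python
import numpy as np

# Illustrative data: six items, colors coded 0 = red, 1 = blue, 2 = green.
colors = np.array([0, 1, 2, 0, 1, 2])

one_hot = np.eye(3)[colors]      # red = 1 0 0, blue = 0 1 0, green = 0 0 1
drop_first = one_hot[:, 1:]      # red = 0 0,  blue = 1 0,  green = 0 1

bias = np.ones((len(colors), 1))       # assumed constant (bias) column
X_oh = np.hstack([bias, one_hot])      # 4 columns
X_df = np.hstack([bias, drop_first])   # 3 columns

# The one-hot columns sum to the bias column, so X_oh does not have
# full column rank and X_oh.T @ X_oh is singular.
rank_oh = np.linalg.matrix_rank(X_oh)
rank_df = np.linalg.matrix_rank(X_df)
print("one-hot:    columns =", X_oh.shape[1], " rank =", rank_oh)
print("drop-first: columns =", X_df.shape[1], " rank =", rank_df)
```

With one-hot encoding the rank is smaller than the number of columns, which is the situation that breaks a plain matrix inverse of the normal equations. With drop-first encoding the two numbers match.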

(If you intend to use stochastic gradient descent (SGD) for linear regression, you can use either one-hot or drop-first encoding, but drop-first encoding is recommended because it works regardless of the training algorithm used.)
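A rough sketch of why iterative training is not bothered by the redundant one-hot column is shown here. To keep the example deterministic, it uses plain full-batch gradient descent as a stand-in for SGD. The train_gd() helper, the data, and the learning rate are all made up for illustration.

```python
import numpy as np

# Illustrative data: 3 colors, two rows each; the target depends only on color.
colors = np.array([0, 1, 2, 0, 1, 2])
y = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])

one_hot = np.eye(3)[colors]      # red = 1 0 0, blue = 0 1 0, green = 0 0 1
drop_first = one_hot[:, 1:]      # red = 0 0,  blue = 1 0,  green = 0 1

def train_gd(X, y, lrn_rate=0.1, max_iter=2000):
  # hypothetical helper: full-batch gradient descent on mean squared error
  n = len(X)
  Xb = np.hstack([np.ones((n, 1)), X])   # prepend a bias column
  w = np.zeros(Xb.shape[1])
  for _ in range(max_iter):
    err = Xb @ w - y
    w -= lrn_rate * (2.0 / n) * (Xb.T @ err)
  return np.mean((Xb @ w - y) ** 2)      # final training MSE

mse_one_hot = train_gd(one_hot, y)
mse_drop_first = train_gd(drop_first, y)
print("MSE one-hot:    %0.8f" % mse_one_hot)
print("MSE drop-first: %0.8f" % mse_drop_first)
```

Both encodings reach a near-zero training error on this toy data, because gradient descent never has to invert a matrix.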

I had never seen an example where drop-first encoding was used for a neural network. I put together a demo. I wasn’t sure what to expect, but the drop-first encoding worked slightly worse than standard one-hot encoding. One example isn’t conclusive, but the results suggest there’s no reason to even consider drop-first encoding for a neural network regressor.

For my demo, I used one of my standard synthetic datasets. The raw data looks like:

F, 24, Michigan, 29,500.00, liberal
M, 39, Oklahoma, 51,200.00, moderate
F, 63, Nebraska, 75,800.00, conservative

The fields are sex (M, F), age, State (only Michigan, Nebraska, Oklahoma), income, politics (conservative, moderate, liberal). The goal is to predict income from sex, age, state, and political leaning. There are 200 training items and 40 test items.

The one-hot normalized and encoded data looks like:

1, 0.24, 1,0,0, 0.2950, 0,0,1
0, 0.39, 0,0,1, 0.5120, 0,1,0
1, 0.63, 0,1,0, 0.7580, 1,0,0
. . .

The drop-first normalized and encoded data looks like:

1, 0.24, 0,0, 0.2950, 0,1
0, 0.39, 0,1, 0.5120, 1,0
1, 0.63, 1,0, 0.7580, 0,0
. . .
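The mapping from a raw item to the two encoded forms can be sketched as follows. The category orders and the divisors (age by 100, income by 100,000) come from the comments in the data files; the encode_cat() and encode_item() helper names are hypothetical and are not part of the demo program.

```python
# Category orders are taken from the data file comments.
STATES = ["michigan", "nebraska", "oklahoma"]
POLITICS = ["conservative", "moderate", "liberal"]

def encode_cat(value, categories, drop_first=False):
  # one-hot vector for value; optionally drop the first position
  vec = [1 if value == c else 0 for c in categories]
  return vec[1:] if drop_first else vec

def encode_item(sex, age, state, income, politics, drop_first=False):
  # hypothetical helper: returns one normalized and encoded row as a flat list
  row = [1 if sex == "F" else 0, age / 100]
  row += encode_cat(state.lower(), STATES, drop_first)
  row.append(income / 100_000)
  row += encode_cat(politics.lower(), POLITICS, drop_first)
  return row

raw = ("F", 24, "Michigan", 29500.00, "liberal")
row_one_hot = encode_item(*raw)
row_drop_first = encode_item(*raw, drop_first=True)
print(row_one_hot)
print(row_drop_first)
```

The first category of each variable (Michigan, conservative) becomes all zeros under drop-first encoding, which is why the first drop-first data line has 0,0 for the state.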

I used the scikit-learn MLPRegressor (“multi-layer perceptron regressor”) class. It has a large number of parameters, but I tried to keep things as simple as possible by using a single hidden layer of 100 nodes, tanh hidden activation, and SGD training with a batch size of 10, a constant learning rate of 0.01, and a maximum of 200 iterations.

The output of my demo is:

Scikit NN regression (one-hot)
Predict income from sex, age, State, politics

Loading one-hot data into memory

Training X data:
[[1.0000 0.2400 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000]
 [0.0000 0.3900 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000]
 [1.0000 0.6300 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000]]
. . .

Training y data:
[0.2950 0.5120 0.7580]
. . .

Creating 8-(100)-1 tanh NN regressor

Training with bat sz = 10 lrn rate = 0.01 max_iter = 200
Done

Accuracy train (within 0.10): 0.9200
Accuracy test (within 0.10): 0.9200

MSE train: 0.000692
MSE test: 0.000692

=====================================

Loading drop-first data into memory

Training X data:
[[1.0000 0.2400 0.0000 0.0000 0.0000 1.0000]
 [0.0000 0.3900 0.0000 1.0000 1.0000 0.0000]
 [1.0000 0.6300 1.0000 0.0000 0.0000 0.0000]]
. . .

Training y data:
[0.2950 0.5120 0.7580]
. . .

Creating 6-(100)-1 tanh NN regressor

Starting training
Done

Accuracy train (within 0.10): 0.8900
Accuracy test (within 0.10): 0.8900

MSE train: 0.000741
MSE test: 0.000741

End scikit NN one-hot demo

The conclusion is that when using neural network regression with categorical predictor data, standard one-hot encoding is preferable to drop-first encoding: in this demo drop-first had no advantage over one-hot, and one-hot encoding is the standard choice for neural networks. An interesting experiment.



I’ve always been fascinated by the competition between ideas in machine learning. And I’ve always been fascinated by military competitions too.

Left column: The Lockheed XF-90 jet (top) lost a competition to the McDonnell XF-88 (bottom) in 1948. The XF-90 was very beautiful but had underpowered engines.

Right column: The Boeing X-32 jet (top) lost a competition to the Lockheed X-35 jet (bottom) in 2001. The X-35 was superior to the X-32 in almost every way, including looks. The X-32 was an incredibly ugly plane featuring a huge gaping mouth. The X-32 was nicknamed “Monica” in reference to Monica Lewinsky who gave frequent mouth service to U.S. President Bill Clinton from 1995-1997.


Demo program.

# people_income_nn.py

# predict income 
# from sex, age, state, politics
# standard one-hot encoding

# sex  age    state    income   politics
#  0   0.27   0  1  0   0.7610   0  0  1
#  1   0.19   0  0  1   0.6550   1  0  0
# state: michigan = 100, nebraska = 010, oklahoma = 001
# politics: conservative, moderate, liberal

# Anaconda3-2025.12.1  Python 3.13.9  scikit 1.7.2
# Windows 11

import numpy as np
from sklearn.neural_network import MLPRegressor

# import warnings
# warnings.filterwarnings('ignore')  # early-stop warnings

# -----------------------------------------------------------
# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  # fraction of predictions within pct_close of the true value
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)[0]

    if np.abs(y - y_pred) < np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  # print("Correct = " + str(n_correct))
  # print("Wrong   = " + str(n_wrong))
  return n_correct / (n_correct + n_wrong)

# -----------------------------------------------------------

def MSE(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)[0]
    sum += (y - y_pred) * (y - y_pred)

  return sum / n

# -----------------------------------------------------------
# -----------------------------------------------------------

def main():
  # 0. get ready
  print("\nScikit NN regression (one-hot) ")
  print("Predict income from sex, age, State, politics ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

  # 1. load data
  print("\nLoading one-hot data into memory ")
  train_file = ".\\Data\\people_train_one_hot.txt"
  train_xy = np.loadtxt(train_file, 
    usecols=[0,1,2,3,4,5,6,7,8], delimiter=",",
    comments="#",  dtype=np.float32) 
  train_X = train_xy[:,[0,1,2,3,4,6,7,8]]
  train_y = train_xy[:,5]

  test_file = ".\\Data\\people_test_one_hot.txt"
  test_xy = np.loadtxt(test_file,
    usecols=[0,1,2,3,4,5,6,7,8], delimiter=",",
    comments="#",  dtype=np.float32) 
  test_X = test_xy[:,[0,1,2,3,4,6,7,8]]
  test_y = test_xy[:,5]

  print("\nTraining X data:")
  print(train_X[0:3])
  print(". . . ")
  print("\nTraining y data: ")
  print(train_y[0:3])
  print(". . . ")

# -----------------------------------------------------------

  # 2. create network 
  # sklearn.neural_network.MLPRegressor(loss='squared_error',
  # hidden_layer_sizes=(100,), activation='relu', *, 
  # solver='adam', alpha=0.0001, batch_size='auto',
  # learning_rate='constant', learning_rate_init=0.001,
  # power_t=0.5, max_iter=200, shuffle=True, 
  # random_state=None, tol=0.0001, verbose=False, 
  # warm_start=False, momentum=0.9, nesterovs_momentum=True,
  # early_stopping=False, validation_fraction=0.1,
  # beta_1=0.9, beta_2=0.999, epsilon=1e-08, 
  # n_iter_no_change=10, max_fun=15000)

  params = { 'hidden_layer_sizes' : [100],
    'activation' : 'tanh',
    'solver' : 'sgd',
    'alpha' : 0.001,
    'batch_size' : 10,
    'random_state' : 0,
    'tol' : 0.0001,
    'nesterovs_momentum' : False,
    'early_stopping' : False,
    'learning_rate' : 'constant',
    'learning_rate_init' : 0.01,
    'max_iter' : 200,
    'shuffle' : True,
    'n_iter_no_change' : 50,
    'verbose' : False }
       
  print("\nCreating 8-(100)-1 tanh NN regressor ")
  net = MLPRegressor(**params)

# -----------------------------------------------------------

  # 3. train
  print("\nTraining with bat sz = " + \
    str(params['batch_size']) + " lrn rate = " + \
    str(params['learning_rate_init']) + \
    " max_iter = " + str(params['max_iter']))
  net.fit(train_X, train_y)
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  acc_train = accuracy(net, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(net, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(net, train_X, train_y)
  print("\nMSE train: %0.6f " % mse_train)
  mse_test = MSE(net, test_X, test_y)
  print("MSE test: %0.6f " % mse_test)

# -----------------------------------------------------------
# drop-first version
# -----------------------------------------------------------

# sex  age   state   income   politics
#  0   0.27   1  0   0.7610   0  1
#  1   0.19   0  1   0.6550   0  0

  print("\n===================================== ")

  print("\nLoading drop-first data into memory ")
  train_file = ".\\Data\\people_train_drop_first.txt"
  train_xy = np.loadtxt(train_file, 
    usecols=[0,1,2,3,4,5,6], delimiter=",",
    comments="#",  dtype=np.float32) 
  train_X = train_xy[:,[0,1,2,3,5,6]]
  train_y = train_xy[:,4]

  test_file = ".\\Data\\people_test_drop_first.txt"
  test_xy = np.loadtxt(test_file,
    usecols=[0,1,2,3,4,5,6], delimiter=",",
    comments="#",  dtype=np.float32) 
  test_X = test_xy[:,[0,1,2,3,5,6]]
  test_y = test_xy[:,4]

  print("\nTraining X data:")
  print(train_X[0:3])
  print(". . . ")
  print("\nTraining y data: ")
  print(train_y[0:3])
  print(". . . ")

  print("\nCreating 6-(100)-1 tanh NN regressor ")
  net = MLPRegressor(**params)

  print("\nStarting training ")
  net.fit(train_X, train_y)
  print("Done ")

  acc_train = accuracy(net, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(net, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(net, train_X, train_y)
  print("\nMSE train: %0.6f " % mse_train)
  mse_test = MSE(net, test_X, test_y)
  print("MSE test: %0.6f " % mse_test)

  print("\nEnd scikit NN one-hot demo ")

if __name__ == "__main__":
  main()
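The accuracy() and MSE() functions in the demo call predict() once per item, which is easy to follow but slow on larger datasets. A possible vectorized sketch is shown below. The accuracy_vec() and mse_vec() names and the MeanModel stand-in are hypothetical; the sketch assumes predict() returns a one-dimensional array of predictions, which is the case for a single-output regressor.

```python
import numpy as np

def accuracy_vec(model, data_X, data_y, pct_close):
  # fraction of predictions within pct_close of the true value
  y_pred = model.predict(data_X)          # one call for all rows
  close = np.abs(data_y - y_pred) < np.abs(data_y * pct_close)
  return np.mean(close)

def mse_vec(model, data_X, data_y):
  # mean squared error over all rows
  y_pred = model.predict(data_X)
  return np.mean((data_y - y_pred) ** 2)

class MeanModel:
  # hypothetical stand-in with a predict() method, for illustration only
  def predict(self, X):
    return X.mean(axis=1)

X = np.array([[0.2, 0.4], [0.5, 0.7], [0.1, 0.9]])
y = np.array([0.30, 0.60, 0.80])
stub = MeanModel()
acc = accuracy_vec(stub, X, y, 0.10)
mse = mse_vec(stub, X, y)
print("accuracy = %0.4f" % acc)
print("MSE = %0.6f" % mse)
```

In the demo, the trained net object could be passed in place of the stand-in model.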

One-hot training data:

# people_train_one_hot.txt
# sex (0 = male, 1 = female),
# age (div 100),
# state (michigan = 100, nebraska = 010, oklahoma = 001),
# income (div $100,000) - dependent variable,
# politics type (conservative = 100, moderate = 010, liberal = 001)
#
1,0.24,1,0,0,0.2950,0,0,1
0,0.39,0,0,1,0.5120,0,1,0
1,0.63,0,1,0,0.7580,1,0,0
0,0.36,1,0,0,0.4450,0,1,0
1,0.27,0,1,0,0.2860,0,0,1
1,0.50,0,1,0,0.5650,0,1,0
1,0.50,0,0,1,0.5500,0,1,0
0,0.19,0,0,1,0.3270,1,0,0
1,0.22,0,1,0,0.2770,0,1,0
0,0.39,0,0,1,0.4710,0,0,1
1,0.34,1,0,0,0.3940,0,1,0
0,0.22,1,0,0,0.3350,1,0,0
1,0.35,0,0,1,0.3520,0,0,1
0,0.33,0,1,0,0.4640,0,1,0
1,0.45,0,1,0,0.5410,0,1,0
1,0.42,0,1,0,0.5070,0,1,0
0,0.33,0,1,0,0.4680,0,1,0
1,0.25,0,0,1,0.3000,0,1,0
0,0.31,0,1,0,0.4640,1,0,0
1,0.27,1,0,0,0.3250,0,0,1
1,0.48,1,0,0,0.5400,0,1,0
0,0.64,0,1,0,0.7130,0,0,1
1,0.61,0,1,0,0.7240,1,0,0
1,0.54,0,0,1,0.6100,1,0,0
1,0.29,1,0,0,0.3630,1,0,0
1,0.50,0,0,1,0.5500,0,1,0
1,0.55,0,0,1,0.6250,1,0,0
1,0.40,1,0,0,0.5240,1,0,0
1,0.22,1,0,0,0.2360,0,0,1
1,0.68,0,1,0,0.7840,1,0,0
0,0.60,1,0,0,0.7170,0,0,1
0,0.34,0,0,1,0.4650,0,1,0
0,0.25,0,0,1,0.3710,1,0,0
0,0.31,0,1,0,0.4890,0,1,0
1,0.43,0,0,1,0.4800,0,1,0
1,0.58,0,1,0,0.6540,0,0,1
0,0.55,0,1,0,0.6070,0,0,1
0,0.43,0,1,0,0.5110,0,1,0
0,0.43,0,0,1,0.5320,0,1,0
0,0.21,1,0,0,0.3720,1,0,0
1,0.55,0,0,1,0.6460,1,0,0
1,0.64,0,1,0,0.7480,1,0,0
0,0.41,1,0,0,0.5880,0,1,0
1,0.64,0,0,1,0.7270,1,0,0
0,0.56,0,0,1,0.6660,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
0,0.65,0,0,1,0.7010,0,0,1
1,0.55,0,0,1,0.6430,1,0,0
0,0.25,1,0,0,0.4030,1,0,0
1,0.46,0,0,1,0.5100,0,1,0
0,0.36,1,0,0,0.5350,1,0,0
1,0.52,0,1,0,0.5810,0,1,0
1,0.61,0,0,1,0.6790,1,0,0
1,0.57,0,0,1,0.6570,1,0,0
0,0.46,0,1,0,0.5260,0,1,0
0,0.62,1,0,0,0.6680,0,0,1
1,0.55,0,0,1,0.6270,1,0,0
0,0.22,0,0,1,0.2770,0,1,0
0,0.50,1,0,0,0.6290,1,0,0
0,0.32,0,1,0,0.4180,0,1,0
0,0.21,0,0,1,0.3560,1,0,0
1,0.44,0,1,0,0.5200,0,1,0
1,0.46,0,1,0,0.5170,0,1,0
1,0.62,0,1,0,0.6970,1,0,0
1,0.57,0,1,0,0.6640,1,0,0
0,0.67,0,0,1,0.7580,0,0,1
1,0.29,1,0,0,0.3430,0,0,1
1,0.53,1,0,0,0.6010,1,0,0
0,0.44,1,0,0,0.5480,0,1,0
1,0.46,0,1,0,0.5230,0,1,0
0,0.20,0,1,0,0.3010,0,1,0
0,0.38,1,0,0,0.5350,0,1,0
1,0.50,0,1,0,0.5860,0,1,0
1,0.33,0,1,0,0.4250,0,1,0
0,0.33,0,1,0,0.3930,0,1,0
1,0.26,0,1,0,0.4040,1,0,0
1,0.58,1,0,0,0.7070,1,0,0
1,0.43,0,0,1,0.4800,0,1,0
0,0.46,1,0,0,0.6440,1,0,0
1,0.60,1,0,0,0.7170,1,0,0
0,0.42,1,0,0,0.4890,0,1,0
0,0.56,0,0,1,0.5640,0,0,1
0,0.62,0,1,0,0.6630,0,0,1
0,0.50,1,0,0,0.6480,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.67,0,1,0,0.8040,0,0,1
0,0.40,0,0,1,0.5040,0,1,0
1,0.42,0,1,0,0.4840,0,1,0
1,0.64,1,0,0,0.7200,1,0,0
0,0.47,1,0,0,0.5870,0,0,1
1,0.45,0,1,0,0.5280,0,1,0
0,0.25,0,0,1,0.4090,1,0,0
1,0.38,1,0,0,0.4840,1,0,0
1,0.55,0,0,1,0.6000,0,1,0
0,0.44,1,0,0,0.6060,0,1,0
1,0.33,1,0,0,0.4100,0,1,0
1,0.34,0,0,1,0.3900,0,1,0
1,0.27,0,1,0,0.3370,0,0,1
1,0.32,0,1,0,0.4070,0,1,0
1,0.42,0,0,1,0.4700,0,1,0
0,0.24,0,0,1,0.4030,1,0,0
1,0.42,0,1,0,0.5030,0,1,0
1,0.25,0,0,1,0.2800,0,0,1
1,0.51,0,1,0,0.5800,0,1,0
0,0.55,0,1,0,0.6350,0,0,1
1,0.44,1,0,0,0.4780,0,0,1
0,0.18,1,0,0,0.3980,1,0,0
0,0.67,0,1,0,0.7160,0,0,1
1,0.45,0,0,1,0.5000,0,1,0
1,0.48,1,0,0,0.5580,0,1,0
0,0.25,0,1,0,0.3900,0,1,0
0,0.67,1,0,0,0.7830,0,1,0
1,0.37,0,0,1,0.4200,0,1,0
0,0.32,1,0,0,0.4270,0,1,0
1,0.48,1,0,0,0.5700,0,1,0
0,0.66,0,0,1,0.7500,0,0,1
1,0.61,1,0,0,0.7000,1,0,0
0,0.58,0,0,1,0.6890,0,1,0
1,0.19,1,0,0,0.2400,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.27,1,0,0,0.3640,0,1,0
1,0.42,1,0,0,0.4800,0,1,0
1,0.60,1,0,0,0.7130,1,0,0
0,0.27,0,0,1,0.3480,1,0,0
1,0.29,0,1,0,0.3710,1,0,0
0,0.43,1,0,0,0.5670,0,1,0
1,0.48,1,0,0,0.5670,0,1,0
1,0.27,0,0,1,0.2940,0,0,1
0,0.44,1,0,0,0.5520,1,0,0
1,0.23,0,1,0,0.2630,0,0,1
0,0.36,0,1,0,0.5300,0,0,1
1,0.64,0,0,1,0.7250,1,0,0
1,0.29,0,0,1,0.3000,0,0,1
0,0.33,1,0,0,0.4930,0,1,0
0,0.66,0,1,0,0.7500,0,0,1
0,0.21,0,0,1,0.3430,1,0,0
1,0.27,1,0,0,0.3270,0,0,1
1,0.29,1,0,0,0.3180,0,0,1
0,0.31,1,0,0,0.4860,0,1,0
1,0.36,0,0,1,0.4100,0,1,0
1,0.49,0,1,0,0.5570,0,1,0
0,0.28,1,0,0,0.3840,1,0,0
0,0.43,0,0,1,0.5660,0,1,0
0,0.46,0,1,0,0.5880,0,1,0
1,0.57,1,0,0,0.6980,1,0,0
0,0.52,0,0,1,0.5940,0,1,0
0,0.31,0,0,1,0.4350,0,1,0
0,0.55,1,0,0,0.6200,0,0,1
1,0.50,1,0,0,0.5640,0,1,0
1,0.48,0,1,0,0.5590,0,1,0
0,0.22,0,0,1,0.3450,1,0,0
1,0.59,0,0,1,0.6670,1,0,0
1,0.34,1,0,0,0.4280,0,0,1
0,0.64,1,0,0,0.7720,0,0,1
1,0.29,0,0,1,0.3350,0,0,1
0,0.34,0,1,0,0.4320,0,1,0
0,0.61,1,0,0,0.7500,0,0,1
1,0.64,0,0,1,0.7110,1,0,0
0,0.29,1,0,0,0.4130,1,0,0
1,0.63,0,1,0,0.7060,1,0,0
0,0.29,0,1,0,0.4000,1,0,0
0,0.51,1,0,0,0.6270,0,1,0
0,0.24,0,0,1,0.3770,1,0,0
1,0.48,0,1,0,0.5750,0,1,0
1,0.18,1,0,0,0.2740,1,0,0
1,0.18,1,0,0,0.2030,0,0,1
1,0.33,0,1,0,0.3820,0,0,1
0,0.20,0,0,1,0.3480,1,0,0
1,0.29,0,0,1,0.3300,0,0,1
0,0.44,0,0,1,0.6300,1,0,0
0,0.65,0,0,1,0.8180,1,0,0
0,0.56,1,0,0,0.6370,0,0,1
0,0.52,0,0,1,0.5840,0,1,0
0,0.29,0,1,0,0.4860,1,0,0
0,0.47,0,1,0,0.5890,0,1,0
1,0.68,1,0,0,0.7260,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
1,0.61,0,1,0,0.6250,0,0,1
1,0.19,0,1,0,0.2150,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.26,1,0,0,0.4230,1,0,0
1,0.61,0,1,0,0.6740,1,0,0
1,0.40,1,0,0,0.4650,0,1,0
0,0.49,1,0,0,0.6520,0,1,0
1,0.56,1,0,0,0.6750,1,0,0
0,0.48,0,1,0,0.6600,0,1,0
1,0.52,1,0,0,0.5630,0,0,1
0,0.18,1,0,0,0.2980,1,0,0
0,0.56,0,0,1,0.5930,0,0,1
0,0.52,0,1,0,0.6440,0,1,0
0,0.18,0,1,0,0.2860,0,1,0
0,0.58,1,0,0,0.6620,0,0,1
0,0.39,0,1,0,0.5510,0,1,0
0,0.46,1,0,0,0.6290,0,1,0
0,0.40,0,1,0,0.4620,0,1,0
0,0.60,1,0,0,0.7270,0,0,1
1,0.36,0,1,0,0.4070,0,0,1
1,0.44,1,0,0,0.5230,0,1,0
1,0.28,1,0,0,0.3130,0,0,1
1,0.54,0,0,1,0.6260,1,0,0

One-hot test data:

# people_test_one_hot.txt
#
0,0.51,1,0,0,0.6120,0,1,0
0,0.32,0,1,0,0.4610,0,1,0
1,0.55,1,0,0,0.6270,1,0,0
1,0.25,0,0,1,0.2620,0,0,1
1,0.33,0,0,1,0.3730,0,0,1
0,0.29,0,1,0,0.4620,1,0,0
1,0.65,1,0,0,0.7270,1,0,0
0,0.43,0,1,0,0.5140,0,1,0
0,0.54,0,1,0,0.6480,0,0,1
1,0.61,0,1,0,0.7270,1,0,0
1,0.52,0,1,0,0.6360,1,0,0
1,0.30,0,1,0,0.3350,0,0,1
1,0.29,1,0,0,0.3140,0,0,1
0,0.47,0,0,1,0.5940,0,1,0
1,0.39,0,1,0,0.4780,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.49,1,0,0,0.5860,0,1,0
0,0.63,0,0,1,0.6740,0,0,1
0,0.30,1,0,0,0.3920,1,0,0
0,0.61,0,0,1,0.6960,0,0,1
0,0.47,0,0,1,0.5870,0,1,0
1,0.30,0,0,1,0.3450,0,0,1
0,0.51,0,0,1,0.5800,0,1,0
0,0.24,1,0,0,0.3880,0,1,0
0,0.49,1,0,0,0.6450,0,1,0
1,0.66,0,0,1,0.7450,1,0,0
0,0.65,1,0,0,0.7690,1,0,0
0,0.46,0,1,0,0.5800,1,0,0
0,0.45,0,0,1,0.5180,0,1,0
0,0.47,1,0,0,0.6360,1,0,0
0,0.29,1,0,0,0.4480,1,0,0
0,0.57,0,0,1,0.6930,0,0,1
0,0.20,1,0,0,0.2870,0,0,1
0,0.35,1,0,0,0.4340,0,1,0
0,0.61,0,0,1,0.6700,0,0,1
0,0.31,0,0,1,0.3730,0,1,0
1,0.18,1,0,0,0.2080,0,0,1
1,0.26,0,0,1,0.2920,0,0,1
0,0.28,1,0,0,0.3640,0,0,1
0,0.59,0,0,1,0.6940,0,0,1

Drop-first training data:

# people_train_drop_first.txt
# sex (0 = male, 1 = female),
# age (div 100),
# state (michigan = 00, nebraska = 10, oklahoma = 01),
# income (div $100,000) - dependent variable,
# politics type (conservative = 00, moderate = 10, liberal = 01)
#
1,0.24,0,0,0.2950,0,1
0,0.39,0,1,0.5120,1,0
1,0.63,1,0,0.7580,0,0
0,0.36,0,0,0.4450,1,0
1,0.27,1,0,0.2860,0,1
1,0.50,1,0,0.5650,1,0
1,0.50,0,1,0.5500,1,0
0,0.19,0,1,0.3270,0,0
1,0.22,1,0,0.2770,1,0
0,0.39,0,1,0.4710,0,1
1,0.34,0,0,0.3940,1,0
0,0.22,0,0,0.3350,0,0
1,0.35,0,1,0.3520,0,1
0,0.33,1,0,0.4640,1,0
1,0.45,1,0,0.5410,1,0
1,0.42,1,0,0.5070,1,0
0,0.33,1,0,0.4680,1,0
1,0.25,0,1,0.3000,1,0
0,0.31,1,0,0.4640,0,0
1,0.27,0,0,0.3250,0,1
1,0.48,0,0,0.5400,1,0
0,0.64,1,0,0.7130,0,1
1,0.61,1,0,0.7240,0,0
1,0.54,0,1,0.6100,0,0
1,0.29,0,0,0.3630,0,0
1,0.50,0,1,0.5500,1,0
1,0.55,0,1,0.6250,0,0
1,0.40,0,0,0.5240,0,0
1,0.22,0,0,0.2360,0,1
1,0.68,1,0,0.7840,0,0
0,0.60,0,0,0.7170,0,1
0,0.34,0,1,0.4650,1,0
0,0.25,0,1,0.3710,0,0
0,0.31,1,0,0.4890,1,0
1,0.43,0,1,0.4800,1,0
1,0.58,1,0,0.6540,0,1
0,0.55,1,0,0.6070,0,1
0,0.43,1,0,0.5110,1,0
0,0.43,0,1,0.5320,1,0
0,0.21,0,0,0.3720,0,0
1,0.55,0,1,0.6460,0,0
1,0.64,1,0,0.7480,0,0
0,0.41,0,0,0.5880,1,0
1,0.64,0,1,0.7270,0,0
0,0.56,0,1,0.6660,0,1
1,0.31,0,1,0.3600,1,0
0,0.65,0,1,0.7010,0,1
1,0.55,0,1,0.6430,0,0
0,0.25,0,0,0.4030,0,0
1,0.46,0,1,0.5100,1,0
0,0.36,0,0,0.5350,0,0
1,0.52,1,0,0.5810,1,0
1,0.61,0,1,0.6790,0,0
1,0.57,0,1,0.6570,0,0
0,0.46,1,0,0.5260,1,0
0,0.62,0,0,0.6680,0,1
1,0.55,0,1,0.6270,0,0
0,0.22,0,1,0.2770,1,0
0,0.50,0,0,0.6290,0,0
0,0.32,1,0,0.4180,1,0
0,0.21,0,1,0.3560,0,0
1,0.44,1,0,0.5200,1,0
1,0.46,1,0,0.5170,1,0
1,0.62,1,0,0.6970,0,0
1,0.57,1,0,0.6640,0,0
0,0.67,0,1,0.7580,0,1
1,0.29,0,0,0.3430,0,1
1,0.53,0,0,0.6010,0,0
0,0.44,0,0,0.5480,1,0
1,0.46,1,0,0.5230,1,0
0,0.20,1,0,0.3010,1,0
0,0.38,0,0,0.5350,1,0
1,0.50,1,0,0.5860,1,0
1,0.33,1,0,0.4250,1,0
0,0.33,1,0,0.3930,1,0
1,0.26,1,0,0.4040,0,0
1,0.58,0,0,0.7070,0,0
1,0.43,0,1,0.4800,1,0
0,0.46,0,0,0.6440,0,0
1,0.60,0,0,0.7170,0,0
0,0.42,0,0,0.4890,1,0
0,0.56,0,1,0.5640,0,1
0,0.62,1,0,0.6630,0,1
0,0.50,0,0,0.6480,1,0
1,0.47,0,1,0.5200,1,0
0,0.67,1,0,0.8040,0,1
0,0.40,0,1,0.5040,1,0
1,0.42,1,0,0.4840,1,0
1,0.64,0,0,0.7200,0,0
0,0.47,0,0,0.5870,0,1
1,0.45,1,0,0.5280,1,0
0,0.25,0,1,0.4090,0,0
1,0.38,0,0,0.4840,0,0
1,0.55,0,1,0.6000,1,0
0,0.44,0,0,0.6060,1,0
1,0.33,0,0,0.4100,1,0
1,0.34,0,1,0.3900,1,0
1,0.27,1,0,0.3370,0,1
1,0.32,1,0,0.4070,1,0
1,0.42,0,1,0.4700,1,0
0,0.24,0,1,0.4030,0,0
1,0.42,1,0,0.5030,1,0
1,0.25,0,1,0.2800,0,1
1,0.51,1,0,0.5800,1,0
0,0.55,1,0,0.6350,0,1
1,0.44,0,0,0.4780,0,1
0,0.18,0,0,0.3980,0,0
0,0.67,1,0,0.7160,0,1
1,0.45,0,1,0.5000,1,0
1,0.48,0,0,0.5580,1,0
0,0.25,1,0,0.3900,1,0
0,0.67,0,0,0.7830,1,0
1,0.37,0,1,0.4200,1,0
0,0.32,0,0,0.4270,1,0
1,0.48,0,0,0.5700,1,0
0,0.66,0,1,0.7500,0,1
1,0.61,0,0,0.7000,0,0
0,0.58,0,1,0.6890,1,0
1,0.19,0,0,0.2400,0,1
1,0.38,0,1,0.4300,1,0
0,0.27,0,0,0.3640,1,0
1,0.42,0,0,0.4800,1,0
1,0.60,0,0,0.7130,0,0
0,0.27,0,1,0.3480,0,0
1,0.29,1,0,0.3710,0,0
0,0.43,0,0,0.5670,1,0
1,0.48,0,0,0.5670,1,0
1,0.27,0,1,0.2940,0,1
0,0.44,0,0,0.5520,0,0
1,0.23,1,0,0.2630,0,1
0,0.36,1,0,0.5300,0,1
1,0.64,0,1,0.7250,0,0
1,0.29,0,1,0.3000,0,1
0,0.33,0,0,0.4930,1,0
0,0.66,1,0,0.7500,0,1
0,0.21,0,1,0.3430,0,0
1,0.27,0,0,0.3270,0,1
1,0.29,0,0,0.3180,0,1
0,0.31,0,0,0.4860,1,0
1,0.36,0,1,0.4100,1,0
1,0.49,1,0,0.5570,1,0
0,0.28,0,0,0.3840,0,0
0,0.43,0,1,0.5660,1,0
0,0.46,1,0,0.5880,1,0
1,0.57,0,0,0.6980,0,0
0,0.52,0,1,0.5940,1,0
0,0.31,0,1,0.4350,1,0
0,0.55,0,0,0.6200,0,1
1,0.50,0,0,0.5640,1,0
1,0.48,1,0,0.5590,1,0
0,0.22,0,1,0.3450,0,0
1,0.59,0,1,0.6670,0,0
1,0.34,0,0,0.4280,0,1
0,0.64,0,0,0.7720,0,1
1,0.29,0,1,0.3350,0,1
0,0.34,1,0,0.4320,1,0
0,0.61,0,0,0.7500,0,1
1,0.64,0,1,0.7110,0,0
0,0.29,0,0,0.4130,0,0
1,0.63,1,0,0.7060,0,0
0,0.29,1,0,0.4000,0,0
0,0.51,0,0,0.6270,1,0
0,0.24,0,1,0.3770,0,0
1,0.48,1,0,0.5750,1,0
1,0.18,0,0,0.2740,0,0
1,0.18,0,0,0.2030,0,1
1,0.33,1,0,0.3820,0,1
0,0.20,0,1,0.3480,0,0
1,0.29,0,1,0.3300,0,1
0,0.44,0,1,0.6300,0,0
0,0.65,0,1,0.8180,0,0
0,0.56,0,0,0.6370,0,1
0,0.52,0,1,0.5840,1,0
0,0.29,1,0,0.4860,0,0
0,0.47,1,0,0.5890,1,0
1,0.68,0,0,0.7260,0,1
1,0.31,0,1,0.3600,1,0
1,0.61,1,0,0.6250,0,1
1,0.19,1,0,0.2150,0,1
1,0.38,0,1,0.4300,1,0
0,0.26,0,0,0.4230,0,0
1,0.61,1,0,0.6740,0,0
1,0.40,0,0,0.4650,1,0
0,0.49,0,0,0.6520,1,0
1,0.56,0,0,0.6750,0,0
0,0.48,1,0,0.6600,1,0
1,0.52,0,0,0.5630,0,1
0,0.18,0,0,0.2980,0,0
0,0.56,0,1,0.5930,0,1
0,0.52,1,0,0.6440,1,0
0,0.18,1,0,0.2860,1,0
0,0.58,0,0,0.6620,0,1
0,0.39,1,0,0.5510,1,0
0,0.46,0,0,0.6290,1,0
0,0.40,1,0,0.4620,1,0
0,0.60,0,0,0.7270,0,1
1,0.36,1,0,0.4070,0,1
1,0.44,0,0,0.5230,1,0
1,0.28,0,0,0.3130,0,1
1,0.54,0,1,0.6260,0,0

Drop-first test data:

# people_test_drop_first.txt
#
0,0.51,0,0,0.6120,1,0
0,0.32,1,0,0.4610,1,0
1,0.55,0,0,0.6270,0,0
1,0.25,0,1,0.2620,0,1
1,0.33,0,1,0.3730,0,1
0,0.29,1,0,0.4620,0,0
1,0.65,0,0,0.7270,0,0
0,0.43,1,0,0.5140,1,0
0,0.54,1,0,0.6480,0,1
1,0.61,1,0,0.7270,0,0
1,0.52,1,0,0.6360,0,0
1,0.30,1,0,0.3350,0,1
1,0.29,0,0,0.3140,0,1
0,0.47,0,1,0.5940,1,0
1,0.39,1,0,0.4780,1,0
1,0.47,0,1,0.5200,1,0
0,0.49,0,0,0.5860,1,0
0,0.63,0,1,0.6740,0,1
0,0.30,0,0,0.3920,0,0
0,0.61,0,1,0.6960,0,1
0,0.47,0,1,0.5870,1,0
1,0.30,0,1,0.3450,0,1
0,0.51,0,1,0.5800,1,0
0,0.24,0,0,0.3880,1,0
0,0.49,0,0,0.6450,1,0
1,0.66,0,1,0.7450,0,0
0,0.65,0,0,0.7690,0,0
0,0.46,1,0,0.5800,0,0
0,0.45,0,1,0.5180,1,0
0,0.47,0,0,0.6360,0,0
0,0.29,0,0,0.4480,0,0
0,0.57,0,1,0.6930,0,1
0,0.20,0,0,0.2870,0,1
0,0.35,0,0,0.4340,1,0
0,0.61,0,1,0.6700,0,1
0,0.31,0,1,0.3730,1,0
1,0.18,0,0,0.2080,0,1
1,0.26,0,1,0.2920,0,1
0,0.28,0,0,0.3640,0,1
0,0.59,0,1,0.6940,0,1


											