A Demonstration that AdaBoost Classification Training is Deterministic

One afternoon at work, while getting coffee in our kitchen area, a colleague and I were discussing the AdaBoost binary classification algorithm. My colleague asked me whether the order of the training data makes any difference. Put another way, is AdaBoost training deterministic?

The answer is yes: AdaBoost training is deterministic. This is a very nice characteristic that some machine learning classification algorithms do not have.

To demonstrate this determinism, I put together a demo. First, I created a demo dataset that looks like this:

 1, 24, 0, 29500, 2
-1, 39, 2, 51200, 1
 1, 63, 1, 75800, 0
-1, 36, 0, 44500, 1
 1, 27, 1, 28600, 2
. . .

Each line of data represents a person. The fields are sex (M = -1, F = 1), age, State (Michigan = 0, Nebraska = 1, Oklahoma = 2), income, political leaning (conservative = 0, moderate = 1, liberal = 2). The goal is to predict sex from the other variables.

I created a second dataset where the order of the columns is switched from (sex, age, State, income, politics) to (sex, income, politics, age, State):

 1, 29500, 2, 24, 0
-1, 51200, 1, 39, 2
 1, 75800, 0, 63, 1
-1, 44500, 1, 36, 0
 1, 28600, 2, 27, 1
. . .
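Rearranging the columns like this is easy to do programmatically. A short sketch (the literal data lines are from the listing above; the code itself is illustrative, not the demo's source):

```python
import numpy as np

# parse a few comma-delimited lines in the original column order:
# sex, age, State, income, politics
lines = [" 1, 24, 0, 29500, 2",
         "-1, 39, 2, 51200, 1",
         " 1, 63, 1, 75800, 0"]
data = np.array([[float(tok) for tok in ln.split(",")] for ln in lines])

# permute to the second order: sex, income, politics, age, State
data2 = data[:, [0, 3, 4, 1, 2]]
print(data2[0].tolist())  # prints [1.0, 29500.0, 2.0, 24.0, 0.0]
```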

I ran an AdaBoost demo on both datasets, using an artificially small number of weak learners (just 5) to keep things simple. The resulting models were identical. The first model:

Begin AdaBoost classification demo

Loading People Dataset train (200) and test (40)

Order: sex_age_state_income_politics

Creating AdaBoost model with nLearners = 5
Done

Starting training
Done

alpha weights:
0.1409 0.1925 0.1636 0.1472 0.1462

Computing model accuracy
Accuracy on training data = 0.6150
Accuracy on test data = 0.4500

End demo

and the second model:

Begin AdaBoost classification demo

Loading People Dataset train (200) and test (40)

Order: sex_income_politics_age_state

Creating AdaBoost model with nLearners = 5
Done

Starting training
Done

alpha weights:
0.1409 0.1925 0.1636 0.1472 0.1462

Computing model accuracy
Accuracy on training data = 0.6150
Accuracy on test data = 0.4500

End demo

Additionally, I switched the order of the rows of the training data and also got an identical model.

I walked through my AdaBoost source code to make sure I understood exactly why AdaBoost training is deterministic. It would take too long to explain all the details, but briefly: each of the model's weak learner stumps is constructed based on the error (the sum of the internal weights of the incorrectly predicted items) over all of the training data, and so the order in which the training data (both rows and columns) is processed doesn't matter. (This crude explanation leaves out a lot of important detail.)
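To make the mechanism concrete, here is a minimal from-scratch sketch in Python (my own illustrative code, not the demo's source, and it omits many details of a real implementation). Each boosting round scores every candidate stump by a weighted error summed over all rows, so permuting the rows or the columns of the training data leaves the selected stumps, and therefore the alpha values, essentially unchanged:

```python
import numpy as np

def best_stump(X, y, w):
    # Score every (feature, threshold, polarity) stump by its weighted
    # error -- a sum over ALL rows, so row order is irrelevant. Exact
    # ties are broken by (threshold, polarity), which does not depend
    # on column order either.
    best_key, best = (np.inf, np.inf, 0), None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):       # same sorted set under any row order
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0.0, 1, -1)
                err = np.sum(w[pred != y])
                key = (err, t, pol)
                if key < best_key:
                    best_key, best = key, (f, t, pol, err)
    return best

def adaboost_alphas(X, y, n_learners=5):
    w = np.full(len(y), 1.0 / len(y))      # uniform initial item weights
    alphas = []
    for _ in range(n_learners):
        f, t, pol, err = best_stump(X, y, w)
        err = max(err, 1e-10)              # guard against log of zero
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(pol * (X[:, f] - t) >= 0.0, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified items
        w = w / np.sum(w)
        alphas.append(alpha)
    return np.array(alphas)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))               # synthetic continuous predictors
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0.0, 1, -1)

a1 = adaboost_alphas(X, y)                    # original order
a2 = adaboost_alphas(X[:, [2, 0, 3, 1]], y)   # columns permuted
perm = rng.permutation(len(y))
a3 = adaboost_alphas(X[perm], y[perm])        # rows permuted
print(np.allclose(a1, a2), np.allclose(a1, a3))
```

Note that the tie-breaking rule matters: if two candidate stumps had exactly the same weighted error and selection depended on the scan order over features, a column permutation could pick a different stump. Scoring over all rows, with order-independent tie-breaking, is what makes the training invariant to data ordering.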

Training determinism is somewhat rare in ML. For example, neural network classification, decision tree classification, and logistic regression (when trained with stochastic gradient descent) are all quite sensitive to the order of the training data.

An interesting exploration.



Left: The Saturn V rocket brought the first men to the moon in Apollo 11 in 1969. The booster stage was powered by five massive F1 engines. Right: Some of the men who achieved this amazing feat. I think that’s Wernher von Braun in the lower right (longish hair). It looks like, in those days, there was no room for irrational hiring policies. I’m absolutely certain that everyone in that photo was hired in a deterministic way, on the basis of ability.


This entry was posted in Machine Learning.

2 Responses to A Demonstration that AdaBoost Classification Training is Deterministic

  1. Fernando F says:

    Out of curiosity I’ve tried to find out whether XGBoost and LightGBM were deterministic.

    It appears that XGBoost is deterministic if you use the default parameter subsample = 1, but can be non-deterministic if you specify a different value.
    https://xgboost.readthedocs.io/en/stable/parameter.html

    LightGBM seems to be non-deterministic by default and can be switched to deterministic as per the parameters deterministic=true, force_col_wise=true, force_row_wise=true
    https://lightgbm.readthedocs.io/en/latest/Parameters.html

    I appreciate your articles as always!

  2. Thank you for your comment — interesting about XGBoost. And I love the picture of your dog — very cool.
