This post describes the intuition behind the Out of Bag (OOB) score in Random Forest, how it is calculated, and where it is useful. In applications that require good interpretability of the model, Decision Trees (DTs) work very well, especially if they are of small depth. However, DTs trained on real-world datasets can grow to large depths. Deeper DTs are more prone to overfitting and thus lead to higher variance in the model. The Random Forest model addresses this shortcoming. In the Random Forest model, the original training data is randomly sampled with replacement to generate several resampled datasets (see the image below), known as bootstrap samples. These bootstrap samples are then fed as training data to many DTs of large depth, and each DT is trained separately on its own bootstrap sample. This collection of DTs is called the Random Forest ensemble. The final result of the ensemble is determined by a majority vote over all the DTs. This concept is known as Bagging or Bootstrap Aggregation. Since each DT takes a different set of training data as input, fluctuations in the original training dataset have much less impact on the final result obtained from the aggregation of DTs. Therefore, bagging reduces the variance of the complete ensemble while leaving its bias largely unchanged.
One of the most interpretable models used for supervised learning is the Decision Tree, where the algorithm makes decisions and predicts values using if-else conditions, as shown in the example.
Although decision trees are easy to understand and interpret, they have one major issue: deep trees tend to overfit the training data, which results in high variance.
Hence, to get the best of both worlds, that is, lower variance while retaining some of the interpretability of trees, the Random Forest algorithm was introduced. Random Forests, or Random Decision Forests, are ensemble learning methods for classification and regression problems. They construct a multitude of independent decision trees (using bootstrapping) at training time and output the majority prediction of all the trees as the final output (for regression, the average prediction). Constructing many decision trees helps the model generalize the data pattern rather than memorize the training data, and therefore reduces the variance (reduces overfitting).
But how do we select a training set for every new decision tree made in a Random Forest? This is where Bootstrapping kicks in!
New training sets for multiple decision trees in Random Forest are made using the concept of Bootstrapping, which is basically random sampling with replacement. Let us look at an example to understand how bootstrapping works:
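Here, the main training dataset consists of five animals, and we now draw several different samples from this one main training set.
Note: Random forest samples both data points and features while building its decision trees. Rows are bootstrapped (drawn with replacement) for each tree, and a random subset of the features is typically considered when splitting.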
In the example above, we can observe that some animals are repeated within a sample, while other animals do not occur even once in that sample.
Here, Sample 1 contains neither Rat nor Cow, whereas Sample 3 contains every animal in the main training set. While making the samples, data points were chosen randomly and with replacement, and the data points that did not end up in a particular sample are known as the Out-of-Bag points for that sample.
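The sampling step described above can be sketched in a few lines of NumPy. This is only an illustration: the text names Rat, Cow, and Cat, so the other two animal names, the helper `bootstrap_sample`, and the seed are assumptions made for the example, and the samples it prints will not match the ones in the figure.

```python
import numpy as np

# Hypothetical five-animal training set. Only Rat, Cow and Cat appear in the
# text; "Dog" and "Horse" are placeholder names for this sketch.
animals = np.array(["Dog", "Cat", "Rat", "Cow", "Horse"])


def bootstrap_sample(items, rng):
    """Draw len(items) items with replacement.

    Returns the bootstrap sample and the out-of-bag (never drawn) items.
    """
    idx = rng.integers(0, len(items), size=len(items))
    oob_mask = np.ones(len(items), dtype=bool)
    oob_mask[idx] = False  # every index that was drawn is "in the bag"
    return items[idx], items[oob_mask]


rng = np.random.default_rng(seed=0)
for i in range(1, 4):
    sample, oob = bootstrap_sample(animals, rng)
    print(f"Sample {i}: {list(sample)} | OOB: {list(oob)}")
```

Each sample has the same size as the original set, so any repeated animal necessarily pushes another animal out of the bag.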
So, where does oob_score come into the picture? oob_score is a validation technique specific to bagged ensembles such as Random Forest: it estimates how well the model generalizes using only the rows that each tree did not train on.
Note: With oob_score, each row is scored only by the trees that never saw it during training, so there is no leakage between training and validation data. Unlike k-fold cross-validation, it needs no separate held-out folds and no repeated retraining, which makes oob_score a convenient way to validate a Random Forest.
Let’s understand oob_score through an example. Here, we have a training set with 5 rows and a classification target variable of whether the animals are domestic/pet.
Out of multiple decision trees built in the random forest, a bootstrapped sample for one particular decision tree, say DT_1 is shown below:
Here, the Rat and Cat rows have been left out. Since Rat and Cat are OOB for DT_1, we would predict their values using DT_1. (Note: DT_1 has not seen the Rat and Cat data while training.)
Just like DT_1, there would be many more decision trees where either Rat or Cat, or both, were left out. Say the 3rd, 7th, and 100th decision trees also had Rat as an OOB data point, which means none of them saw the “Rat” data before predicting its value. So, we record the predicted values for “Rat” from the trees DT_1, DT_3, DT_7, and DT_100, and check whether the aggregated (majority) prediction is the same as the actual value for “Rat”. Note that none of these trees had seen this data point before, and they still predicted its value correctly.
Similarly, every data point is passed for prediction to the trees for which it is OOB, and an aggregated prediction is recorded for each row.
The OOB Score is computed as the fraction of rows whose aggregated out-of-bag prediction is correct.
The OOB Error is the fraction of rows whose aggregated out-of-bag prediction is wrong, that is, 1 minus the OOB Score.
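To make the procedure concrete, here is a minimal from-scratch sketch of the OOB score for a bagged set of decision trees. The function name `manual_oob_score`, the synthetic dataset, and the number of trees are assumptions chosen for illustration; scikit-learn's own Random Forest computes its OOB estimate internally and may differ in details, so treat this as a sketch of the idea rather than the library's method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def manual_oob_score(X, y, n_trees=50, seed=0):
    """Bag decision trees and score each row using only trees that did not see it."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classes = np.unique(y)
    votes = np.zeros((n, len(classes)), dtype=int)  # OOB votes per row and class

    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)        # bootstrap row indices
        oob = np.setdiff1d(np.arange(n), idx)   # rows this tree never saw
        if oob.size == 0:
            continue
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        pred = tree.predict(X[oob])
        votes[oob, np.searchsorted(classes, pred)] += 1  # one vote per OOB row

    has_vote = votes.sum(axis=1) > 0             # rows that were OOB at least once
    majority = classes[votes.argmax(axis=1)]     # aggregated (majority) prediction
    score = float(np.mean(majority[has_vote] == y[has_vote]))
    return score, 1.0 - score


# Synthetic data, used here only to have something to run on.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
oob_score, oob_error = manual_oob_score(X, y)
print("OOB score:", oob_score, "| OOB error:", oob_error)
```

Rows that were never out of bag for any tree receive no votes and are skipped; with more than a handful of trees this is rare.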
What is the Out of Bag score in Random Forests?
The Out of Bag (OOB) score is a way of validating the Random Forest model. Below is a simple intuition of how it is calculated, followed by a description of how it differs from the validation score and where it is advantageous.
For the description of OOB score calculation, let’s assume there are five DTs in the random forest ensemble labeled from 1 to 5. For simplicity, suppose we have a simple original training data set as below.
Suppose the first bootstrap sample is made of the first three rows of this data set, as shown in the green box below. This bootstrap sample will be used as the training data for DT 1.
Then the last row, which is “left out” of the original data (see the red box in the image below), is known as the Out of Bag sample. This row will not be used as training data for DT 1. Please note that in reality several such rows will be left out as Out of Bag; only one is shown here for simplicity.
After the DT models have been trained, this leftover row, the OOB sample, is given as unseen data to DT 1, which predicts its outcome. Suppose DT 1 predicts this row correctly as “YES”. Similarly, this row is passed through all the DTs that did not contain it in their bootstrap training data. Let’s assume that apart from DT 1, DT 3 and DT 5 also did not have this row in their bootstrap training data. The predictions of this row by DT 1, 3, and 5 are summarized in the table below.
We see that by a majority vote of 2 “YES” vs 1 “NO”, the prediction for this row is “YES”. This majority-vote prediction is correct, since the “Play Tennis” column of this row in the original data is also “YES”.
Similarly, each OOB sample row is passed through every DT that did not contain it in its bootstrap training data, and a majority prediction is noted for each row. Lastly, the OOB score is computed as the fraction of out-of-bag rows that are predicted correctly.
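The vote for this single row can be written out as a tiny sketch. The text fixes DT 1 as “YES” and the overall 2-vs-1 split; which of DT 3 and DT 5 voted “NO” is an assumption made here for illustration.

```python
from collections import Counter

# Predictions for the OOB row from the trees that did not train on it.
# DT 1 -> "YES" comes from the text; assigning "NO" to DT 3 is an assumption.
oob_predictions = {"DT 1": "YES", "DT 3": "NO", "DT 5": "YES"}
actual = "YES"  # value in the "Play Tennis" column for this row

majority_label, n_votes = Counter(oob_predictions.values()).most_common(1)[0]
print("Majority prediction:", majority_label, "with", n_votes, "votes")
print("Correct:", majority_label == actual)
```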
What is the difference between the OOB score and validation score?
Now that we have seen how the OOB score is estimated, let’s see how it differs from the validation score. The OOB score is computed on rows that belong to the training dataset but happened to be left out of the bootstrap sample of a given DT. For the validation score, by contrast, a part of the original dataset is set aside before training the models. Additionally, the OOB prediction for a row uses only the subset of DTs whose bootstrap training data did not contain that row, while the validation score is calculated using all the DTs of the ensemble.
Where can OOB score be useful?
As noted above, only a subset of DTs is used for determining the OOB score. This reduces the overall aggregation effect in bagging. Thus, in general, validation on the full ensemble of DTs is better than on a subset of DTs for estimating the score. However, sometimes the dataset is not big enough, and setting aside a part of it for validation is unaffordable. In such cases, where we want to use all of the data for training, the OOB score provides a good trade-off. Nonetheless, the validation score and the OOB score are computed in different ways and should not be compared directly. For a large training set, about 36.8% of the rows are expected to be OOB for any given DT. This can be shown as follows.
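If there are N rows in the training data set, the probability that a given row is not picked in a single random draw is
$$1 - \frac{1}{N}$$
With sampling with replacement, the probability that this row is not picked in any of the N draws is
$$\left(1 - \frac{1}{N}\right)^{N}$$
which in the limit of large N becomes
$$\lim_{N \to \infty} \left(1 - \frac{1}{N}\right)^{N} = e^{-1} \approx 0.368$$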
Therefore, about 36.8% of the total training data is available as OOB samples for each DT, and hence these rows can be used for evaluating or validating the random forest model.
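The 36.8% figure is easy to check numerically. The sketch below compares the closed-form probability with a simple simulation of bootstrap draws; the function names, trial count, and seed are assumptions made for the example.

```python
import math

import numpy as np


def oob_fraction_theory(n):
    """Probability that a given row is never picked in n draws with replacement."""
    return (1 - 1 / n) ** n


def oob_fraction_simulated(n, trials=200, seed=0):
    """Average fraction of rows left out of a bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(trials):
        idx = rng.integers(0, n, size=n)
        fractions.append(1 - len(np.unique(idx)) / n)
    return float(np.mean(fractions))


for n in (5, 100, 10_000):
    print(n, round(oob_fraction_theory(n), 4), round(oob_fraction_simulated(n), 4))
print("1/e =", round(math.exp(-1), 4))
```

For small N the fraction is somewhat lower than 1/e, and it approaches 1/e as N grows.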
The OOB score lets us validate a Random Forest without setting any data aside, which makes it especially useful when data is limited. Computing it takes a bit more time, but the extra validation signal is usually worth the time spent training the random forest model with the oob_score parameter set to True.
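In scikit-learn, this corresponds to passing `oob_score=True` to `RandomForestClassifier` and reading the fitted `oob_score_` attribute. The snippet below is a minimal sketch on a synthetic dataset; the dataset and the hyperparameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True is required for OOB scoring, since OOB rows only exist
# when each tree is trained on a bootstrap sample.
forest = RandomForestClassifier(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X, y)

print("OOB score:", forest.oob_score_)      # accuracy estimated from OOB rows
print("OOB error:", 1 - forest.oob_score_)
```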