
Introduction to Ensemble Techniques: Bagging and Boosting

Ensemble techniques are machine learning methods that combine the predictions of several models to produce a better overall model. In Bagging and Boosting, the individual models are typically trained with the same base algorithm. Combining several models usually increases accuracy and improves robustness over a single model, and it helps reduce errors arising from bias and variance, i.e., it mitigates underfitting and overfitting.

In this article, we will look at two ensemble techniques: Bagging and Boosting. Before moving on to them, let's review Bootstrapping, which underpins both.

Bootstrapping

Bootstrapping is the random sampling of subsets of data from a dataset with replacement. Each observation is selected with equal probability, so a single observation may appear more than once in a sample. The statistics of these resamples give a better understanding of the mean, variance, and standard deviation of the dataset; for example, the mean of the dataset can be estimated by averaging the means of all the bootstrap samples.
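As a rough illustration, here is a minimal NumPy sketch of bootstrapping the mean of a synthetic one-dimensional dataset. The data, number of resamples, and random seed are arbitrary choices for the example, and each resample is drawn the same size as the dataset (the classical choice):

```python
import numpy as np

rng = np.random.default_rng(42)

# A small synthetic "dataset": 200 observations of a single feature.
data = rng.normal(loc=50, scale=10, size=200)

# Draw many bootstrap samples (sampling with replacement) and
# record the mean of each one.
n_resamples = 1000
boot_means = [
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_resamples)
]

# The average of the bootstrap means estimates the dataset mean,
# and their spread estimates the uncertainty of that estimate.
print("Dataset mean:          ", data.mean())
print("Bootstrap estimate:    ", np.mean(boot_means))
print("Std. error of the mean:", np.std(boot_means))
```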

Bagging

Bagging is short for Bootstrap Aggregation. It is generally applied to high-variance models (such as decision trees) and is used to reduce the variance of the overall model. This is done by drawing several bootstrap samples from the training data with replacement and training a separate model on each of them.

So, given a dataset with some observations and features, bootstrapping is done first. A model is trained on each bootstrap sample of the observations; variants such as Random Forest also restrict each model to a random subset of the features, from which the best splits are chosen. The same process is repeated to create several models. These models are trained in parallel, and their predictions are aggregated, by averaging for regression or majority vote for classification.
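A minimal scikit-learn sketch of this idea, assuming the bundled Iris dataset; the number of estimators, test split, and random seed are arbitrary choices for the example:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 50 base learners (decision trees by default), each fit on a
# bootstrap sample of the training data; class predictions are
# aggregated by majority vote.
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```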

Boosting

Boosting is used primarily to reduce the bias error of a model. This ensemble technique converts weak learners into a strong learner by training them sequentially, with each learner improving on the previous one's classifications, and combining their predictions through weighted averaging.

In each iteration, the weights of the data points that were wrongly classified or predicted are increased, so the next learner in the sequence concentrates on getting those points right. In this way, Boosting builds its training samples sequentially, each one shaped by the errors of the previous round.

There are several types of Boosting algorithms. Some of them are:

  • Gradient Boosting
  • AdaBoost
  • XGBoost
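As a rough sketch of one of these, the snippet below fits scikit-learn's AdaBoostClassifier on the bundled Iris data; the estimator count, learning rate, test split, and random seed are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Weak learners (shallow decision trees by default) are added one at a
# time; after each round, misclassified points receive larger weights
# so the next learner focuses on them.
boosting = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boosting.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting.score(X_test, y_test))
```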

Bagging vs. Boosting

The choice between Bagging and Boosting depends on the data and the problem. If a single model is overfitting (high variance), Bagging is usually the better choice; if the model suffers from high bias (underfitting), Boosting can help. Both techniques lead to better stability and robustness compared to a single base model.

Summary

In this article, we looked at the ensemble techniques – Bagging and Boosting. In the next article, we will focus on Outlier Detection using Isolation Forests.

Pallavi Pandey
