DSPython

Ensemble Learning

Combine multiple simple models to build powerful and robust algorithms like Random Forest and Gradient Boosting.

Machine Learning · Intermediate · 75 min

Topic 1: What is Ensemble Learning?

Ensemble Learning is based on the "wisdom of the crowd" principle. Instead of relying on a single, complex model, an ensemble combines the predictions from several "weaker" (simpler) models to make a final, more accurate and robust prediction.



A single model (like a deep Decision Tree) can be a brilliant expert who overfits. An ensemble is like a committee of diverse experts: even if some of them are wrong, the majority vote (for classification) or the averaged opinion (for regression) is usually closer to the truth than any single expert.
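As a toy sketch of that committee idea (the arrays below are made-up predictions, not the output of any real models), a majority vote settles classification and an average settles regression:

import numpy as np

# Made-up class predictions from three simple models for five samples
votes = np.array([
    [1, 0, 1, 1, 0],   # "expert" A
    [1, 1, 1, 0, 0],   # "expert" B
    [0, 0, 1, 1, 1],   # "expert" C
])

# Classification: the label chosen by at least 2 of the 3 experts wins
majority = (votes.sum(axis=0) >= 2).astype(int)
print(majority)                # [1 0 1 1 0]

# Regression: average the numeric predictions instead of voting
estimates = np.array([[2.1, 3.0], [1.9, 3.4], [2.0, 2.9]])
print(estimates.mean(axis=0))  # roughly [2.0, 3.1]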

Why use an Ensemble?

  • Higher Accuracy: A well-built ensemble usually outperforms any of its individual member models.
  • More Robust: They are less sensitive to noise and outliers in the data.
  • Reduces Overfitting: By combining many models, they "average out" the individual models' tendencies to overfit.

The Two Main Strategies:

  1. Bagging (Parallel): Build many models independently on different subsets of data. (e.g., Random Forest)
  2. Boosting (Sequential): Build models one after another, where each new model learns from the *mistakes* of the previous one. (e.g., Gradient Boosting)

Topic 2: Bagging & Random Forest

Bagging stands for Bootstrap Aggregating.

  1. Bootstrap: Create many random samples (e.g., 100) of your training data. These samples are drawn *with replacement*, so one row of data might appear 3 times in one sample and 0 times in another.
  2. Aggregate: Train one model (e.g., a Decision Tree) on each of the 100 samples.
  3. Vote: To make a new prediction, you ask all 100 trees for their opinion. The final answer is the one that gets the most "votes."
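A minimal sketch of those three steps with scikit-learn's BaggingClassifier, applied to the Iris data purely for illustration (its default base model is a decision tree):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bootstrap + Aggregate: 100 decision trees, each fit on a resampled
# (with replacement) copy of the training data
bagging = BaggingClassifier(
    n_estimators=100,
    bootstrap=True,
    random_state=42,
)

# Vote: predict() takes the majority vote of all 100 trees under the hood
print(cross_val_score(bagging, X, y, cv=5).mean())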

Random Forest: Bagging + Feature Randomness

A **Random Forest** is the most popular Bagging model. It's an ensemble of Decision Trees with one extra trick:

When each tree is being built, at every split-point, it is only allowed to consider a *random subset of features*. For example, instead of checking all 4 Iris features, it might be forced to choose the best split from only 2 random features.

This "double randomness" (random samples + random features) makes the trees in the forest very different from each other. This **diversity** is the key to reducing overfitting and making the ensemble powerful.

Key Concept: Reduces Variance

Bagging and Random Forests are fantastic at reducing **variance** (overfitting). A single Decision Tree will change wildly if you change a few data points. A Random Forest of 500 trees will barely change at all, making it very stable.
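One way to see that stability, sketched below on the breast-cancer dataset (chosen here only as an example), is to compare the spread of cross-validation scores for a single tree versus a 500-tree forest; exact numbers will vary, but the forest's scores are typically higher and less spread out:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for name, model in [
    ("single tree", DecisionTreeClassifier(random_state=42)),
    ("forest (500)", RandomForestClassifier(n_estimators=500, random_state=42)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, "mean:", scores.mean().round(3), "std:", scores.std().round(3))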


Topic 3: Boosting (AdaBoost & Gradient Boosting)

Boosting is a sequential process. Instead of models working in parallel, they work in a chain, with each model learning from the last one's failures.

Analogy: A Team of Specialists

Imagine you're training for a test.

  1. You build a "weak" model (Model 1). It gets 70% of the questions right, but struggles with a specific topic.
  2. You build Model 2, but you tell it: "Pay *extra attention* to the questions Model 1 got wrong."
  3. Model 2 becomes a specialist in that topic.
  4. You build Model 3, telling it to focus on the questions that *both* Model 1 and Model 2 *still* get wrong.

The final prediction is a "weighted vote" that gives more power to the models that performed best.

Key Types of Boosting:

  • AdaBoost (Adaptive Boosting): The classic example. It re-weights the data points at each step, making the "hard" (misclassified) points "heavier" so the next model has to focus on them.
  • Gradient Boosting (GBM, XGBoost): A more modern and powerful version. Instead of re-weighting data, it trains the next model to predict the "residual error" (the amount by which the previous model was wrong). This is the foundation for state-of-the-art models like XGBoost and LightGBM.
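To make "predict the residual error" concrete, here is a hand-rolled sketch of gradient boosting for regression with squared error. It is a simplification of what GradientBoostingRegressor does internally; the synthetic sine data and the 0.1 shrinkage factor (which plays the role of learning_rate below) are just for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# Step 0: start from a constant prediction (the mean of y)
pred = np.full_like(y, y.mean())
learning_rate = 0.1

# Each round, fit a small tree to the residuals (what we still get wrong)
# and add a scaled-down version of its predictions to the running total.
for _ in range(100):
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    pred += learning_rate * tree.predict(X)

print(np.mean((y - pred) ** 2))  # training error shrinks round after round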

Key Concept: Reduces Bias

Boosting is fantastic at reducing **bias** (underfitting). By sequentially focusing on errors, it can build an incredibly strong, accurate "super-model" even from very simple "weak learners" (like trees with a depth of 1!).
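For instance, scikit-learn's AdaBoostClassifier uses depth-1 trees ("decision stumps") as its default weak learner; a short sketch on the breast-cancer data (again just an example dataset):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 200 sequentially re-weighted stumps; no single stump does well on its own,
# but their weighted vote usually scores far higher.
booster = AdaBoostClassifier(n_estimators=200, random_state=42)
print(cross_val_score(booster, X, y, cv=5).mean())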


Topic 4: Key Parameters in `sklearn`

When you use ensemble models, you'll see these parameters most often:

# For Random Forest (Bagging)
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    # The number of trees in the forest.
    # More is usually better, but takes longer. 100 is a good start.
    n_estimators=100,

    # Pruning parameters that control the individual trees (like max_depth)
    # help prevent overfitting of the *ensemble* itself.
    max_depth=10,

    # Fixes the random seed for reproducible results.
    random_state=42
)

# For Gradient Boosting (Boosting)
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    # The number of sequential models to build.
    n_estimators=100,

    # This is the most important tuning parameter!
    # It controls how much each new model *corrects* the previous one.
    # A small value (e.g., 0.01-0.1) needs more estimators but is more robust.
    learning_rate=0.1,

    # Controls the depth of the weak learners.
    max_depth=3,

    random_state=42
)
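A quick usage sketch for either configuration, here fitting the boosting model on Iris with a 25% hold-out split (both choices are just for illustration):

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out 25%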