The following slides have been generated by Google Gemini
They are intended to give a rough overview
The formulas presented are just for reference
Overfitting in decision trees can be limited with pre-pruning hyperparameters (max_depth, min_samples_leaf) or post-pruning techniques (Cost-Complexity Pruning, Reduced Error Pruning); a minimal sketch of both follows the classifier comparison table below.

Algorithm | Primary Mechanism | Focus/Benefit | Multicollinearity Handling | When to Use |
---|---|---|---|---|
Ordinary Least Squares (OLS) | Minimizes RSS (L2 Loss) | BLUE (Best Linear Unbiased) Estimates, Maximum Interpretability | Poor, highly sensitive to collinearity | Data is small, linear assumptions hold, interpretation of coefficients is critical. |
Lasso Regression | L1 Regularization (Absolute Sum) | Feature Selection (induces sparsity) | Poor, arbitrarily selects one correlated feature | High-dimensional data where few features are relevant; seeking a sparse model. |
Elastic Net | L1 + L2 Combination | Stability and Feature Grouping | Excellent, groups correlated features | Datasets with high dimensions and significant multicollinearity. |
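
To make the regularization differences concrete, the sketch below fits all three models on synthetic data with two nearly collinear features. It assumes scikit-learn and NumPy are available; the data and the alpha/l1_ratio values are illustrative placeholders, not recommended settings.

```python
# Minimal sketch: OLS vs. Lasso vs. Elastic Net on correlated features.
# Assumes scikit-learn and NumPy; data and hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)               # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

models = {
    "OLS": LinearRegression(),                           # coefficients unstable under collinearity
    "Lasso": Lasso(alpha=0.1),                           # tends to keep one of x1/x2, zero the other
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),   # tends to spread weight across x1 and x2
}

for name, model in models.items():
    model.fit(X, y)
    print(f"{name:10s} coefficients: {np.round(model.coef_, 3)}")
```

With near-duplicate features, OLS coefficients typically become unstable, Lasso tends to keep one of the pair and zero the other, and Elastic Net tends to spread the weight across both, mirroring the "Multicollinearity Handling" column above.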
Algorithm | Core Principle | Primary Error Focus | Key Trade-off | Applicability Context |
---|---|---|---|---|
Support Vector Machines (SVM) | Maximum Margin Hyperplane (Kernel Trick) | Reduction of Error / Optimization | High Training Complexity ($O(n^2)$) vs. High Accuracy | Medium-sized, high-dimensional data; superior classification when clear margin separation exists. |
Nearest Neighbors (k-NN) | Classification based on $k$ closest neighbors | Local Data Structure is Predictive | Zero Training Cost vs. Slow Prediction Time ($O(n)$ per query) | Simple, low-latency problems where updating the model is frequent; small datasets. |
Naive Bayes (NB) | Computes posterior probability via Bayes' Theorem | Speed and Efficiency | Strong Independence Assumption vs. Extremely Fast Performance | Text classification, spam filtering, and high-dimensional categorical/discrete data. |
Decision Tree (DT) | Recursive Feature Space Partitioning | High Variance / Overfitting | High Interpretability (White-Box) vs. Low Predictive Stability | When model interpretability is the paramount constraint; use as base learner for ensembles. |
Random Forest (RF) | Bagging (Parallel Ensemble) | Reduction of Variance | High Stability/Robustness vs. Reduced Interpretability; Fast Training | Default, robust choice for high-performance and stable results. |
Gradient Boosting (GB) | Boosting (Sequential Ensemble) | Reduction of Bias | Maximum Accuracy vs. High Computational Cost and Tuning Sensitivity | When the absolute highest predictive accuracy is required from structured data. |
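
As noted before the first table, decision-tree overfitting is usually addressed by pre-pruning or post-pruning. The sketch below shows both, assuming scikit-learn (which exposes max_depth, min_samples_leaf, and ccp_alpha on DecisionTreeClassifier); the dataset and the choice of ccp_alpha are placeholders for illustration.

```python
# Minimal sketch: pre-pruning vs. cost-complexity post-pruning of a decision tree.
# Assumes scikit-learn; dataset and parameter values are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: cap tree growth while the tree is being built.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow a full tree, then prune back with Cost-Complexity Pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # illustrative choice; tune in practice
post_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
post_pruned.fit(X_train, y_train)

for name, tree in [("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(f"{name}: depth={tree.get_depth()}, test accuracy={tree.score(X_test, y_test):.3f}")
```

Both approaches trade a little training-set fit for lower variance, which is exactly the weakness listed for single decision trees in the table above.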
The choice of algorithm depends on balancing core constraints: accuracy, interpretability, data size, and computational feasibility.
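
One pragmatic way to apply this balancing act is to benchmark several of the listed algorithms under cross-validation before committing to one. The sketch below, assuming scikit-learn and a synthetic dataset, compares default-configured versions of the classifiers from the table; the defaults and the dataset are illustrative only.

```python
# Minimal sketch: cross-validated comparison of several of the listed classifiers.
# Assumes scikit-learn; dataset and default settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:18s} mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

In practice, the accuracy ranking from such a run still has to be weighed against the training cost, prediction latency, and interpretability constraints summarized in the table above.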