Meta algorithms
Bagging, boosting, and stacking are all so-called "meta-algorithms": approaches that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictive power (stacking, also called ensembling).
- Bagging (short for Bootstrap Aggregating, a parallel ensemble) decreases the variance of your prediction by generating additional training data from your original dataset using combinations with repetitions, producing multisets of the same cardinality/size as the original data (samples are drawn with replacement). Increasing the size of your training set this way cannot improve the model's predictive power, but it decreases the variance and helps avoid overfitting, i.e., narrowly tuning the prediction to the expected outcome.
- example: random forest
- Boosting (a sequential ensemble) is a two-step approach, where one first uses subsets of the original data to produce a series of moderately performing models and then "boosts" their performance by combining them using a particular cost function (e.g., a majority vote). Unlike bagging, in classical boosting the subset creation is not random but depends on the performance of the previous models: every new subset contains the elements that were (likely to be) misclassified by the previous models.
- example: AdaBoost
- Stacking is similar to boosting: you also apply several models to your original data. The difference, however, is that you do not have just an empirical formula for your weight function; instead, you introduce a meta-level and use another model/approach that takes the input together with the outputs of every model to estimate the weights, in other words, to determine which models perform well and which perform badly given these input data (a minimal code sketch follows below).
[1] https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
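To make the meta-level idea concrete, here is a minimal stacking sketch in Python, assuming scikit-learn is available; the dataset (load_breast_cancer), the two base models, and the logistic-regression meta-model are illustrative choices, not taken from the quoted answer.
```python
# Minimal stacking sketch (dataset and model choices are illustrative assumptions).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 ("base") models applied to the original data.
base_models = [DecisionTreeClassifier(random_state=0),
               KNeighborsClassifier()]

# Meta-level: out-of-fold predictions of the base models become the input
# features of a second-stage model, which learns how much to trust each
# base model instead of using a fixed weighting formula.
meta_features_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models])
meta_model = LogisticRegression().fit(meta_features_train, y_train)

# At prediction time the base models are refit on the full training set
# and their outputs are stacked in the same way.
meta_features_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models])
print("stacked accuracy:", meta_model.score(meta_features_test, y_test))
```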
Bagging
Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n′, by sampling from D uniformly and with replacement. By sampling with replacement, some observations may be repeated in each D_i. If n′ = n, then for large n the set D_i is expected to contain the fraction (1 − 1/e) (≈ 63.2%) of the unique examples of D, the rest being duplicates. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
[1] https://en.wikipedia.org/wiki/Bootstrap_aggregating
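A small from-scratch sketch of this procedure (bootstrap samples drawn with replacement, then majority-vote aggregation), assuming NumPy and scikit-learn; the dataset, the number of bootstrap samples m, and the decision-tree base learner are illustrative assumptions.
```python
# Bagging sketch: m bootstrap samples, m models, combined by majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n = len(X_train)          # original training-set size
m = 25                    # number of bootstrap samples / models

models = []
for _ in range(m):
    # Draw n indices uniformly *with replacement*: one bootstrap sample.
    idx = rng.integers(0, n, size=n)
    models.append(DecisionTreeClassifier(random_state=0)
                  .fit(X_train[idx], y_train[idx]))

# For the last sample drawn above: roughly 1 - 1/e (about 63.2%) of the
# original points appear in it when n is large.
print("unique fraction:", len(np.unique(idx)) / n)

# Classification: combine the m models by majority vote.
votes = np.stack([clf.predict(X_test) for clf in models])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("bagged accuracy:", (y_pred == y_test).mean())
```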
Boosting
Boosting incrementally builds an ensemble by training each new model instance to emphasize the training instances that previous models misclassified. In some cases, boosting has been shown to yield better accuracy than bagging, but it also tends to be more likely to overfit the training data.
Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data are reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified.
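The reweighting loop described above can be sketched along the lines of discrete AdaBoost (binary labels in {-1, +1}); the stump weak learner, the number of rounds, and the dataset below are illustrative assumptions, not prescribed by the text.
```python
# Boosting sketch in the style of discrete AdaBoost: misclassified examples
# gain weight, correctly classified examples lose weight.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)                       # relabel to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n = len(X_train)
w = np.full(n, 1.0 / n)                           # start with uniform weights
stumps, alphas = [], []

for _ in range(50):                               # 50 boosting rounds
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train,
                                                    sample_weight=w)
    pred = stump.predict(X_train)
    err = w[pred != y_train].sum()                # weighted training error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # weak learner's vote weight

    # Reweight: misclassified examples gain weight, correct ones lose weight,
    # so the next weak learner focuses on the hard cases.
    w *= np.exp(-alpha * y_train * pred)
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final strong classifier: sign of the alpha-weighted sum of weak learners.
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
print("boosted accuracy:", (np.sign(scores) == y_test).mean())
```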