Gradient Boosting Decision Tree
 
Visualization
A great visualization and playground for Decision Trees and Gradient Boosting can be found here
Regression Decision Tree
Waiting to be written.
Boosting Decision Tree
Waiting to be written.
Gradient Boosting Decision Tree
Gradient boosting builds an ensemble of trees one by one; the predictions of the individual trees are then summed:
     D(x) = d_tree1(x) + d_tree2(x) + ...
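A minimal sketch of this summation in Python, assuming a list of already-fitted scikit-learn DecisionTreeRegressor objects (the names trees and ensemble_predict are illustrative, not from the source):

    import numpy as np

    def ensemble_predict(trees, X):
        # D(x): sum the predictions of the individual trees
        return np.sum([tree.predict(X) for tree in trees], axis=0)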
The next decision tree tries to cover the discrepancy between the target function f(x) and the current ensemble prediction by reconstructing the residual.
 For example, if an ensemble has 3 trees the prediction of that ensemble is:
     D(x) = d_tree1(x) + d_tree2(x) + d_tree3(x)
The next tree, tree_4, should complement the existing trees well and minimize the training error of the ensemble. In the ideal case we'd be happy to have:
     D(x) + d_tree4(x) = f(x)
To get a bit closer to the destination, we train a tree to reconstruct the difference between the target function and the current predictions of the ensemble, which is called the residual:
     R(x) = f(x) - D(x)
Did you notice? If the decision tree completely reconstructs R(x), the whole ensemble gives predictions without errors (after adding the newly trained tree to the ensemble)! That said, in practice this never happens, so we instead continue the iterative process of ensemble building.
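A minimal sketch of that iterative process, using scikit-learn's DecisionTreeRegressor as the base learner (the function fit_gradient_boosting and its parameters n_trees and max_depth are illustrative assumptions, not the source's code):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gradient_boosting(X, y, n_trees=100, max_depth=3):
        # Build the ensemble one tree at a time; each new tree is
        # trained to reconstruct the residual R(x) = f(x) - D(x).
        trees = []
        prediction = np.zeros(len(y))      # ensemble prediction D(x), starts at zero
        for _ in range(n_trees):
            residual = y - prediction      # R(x) = f(x) - D(x)
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residual)          # the new tree reconstructs the residual
            prediction += tree.predict(X)  # D(x) <- D(x) + d_tree(x)
            trees.append(tree)
        return trees

Production implementations (e.g. scikit-learn's GradientBoostingRegressor or XGBoost) additionally shrink each tree's contribution by a learning rate; the residual-fitting loop above is just the core idea.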
Source: http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html