# GBDT

Written at 14:22, November 1, 2018

### Boosting Tree

A boosting tree is an additive model of decision trees:

$$f_M(x)=\sum_{m=1}^{M}T(x;\theta_{m})$$

where $$T(x;\theta_{m})$$ denotes a decision tree, $$\theta_{m}$$ its parameters, and $$M$$ the number of trees.

$$f_{m}(x)=f_{m-1}(x)+T(x;\theta_{m})$$

Given the current model $$f_{m-1}(x)$$, the parameters $$\theta_{m}$$ of the next decision tree are determined by minimizing the empirical loss:

$$\hat{\theta_{m}}=\mathop{\arg\min}_{\theta_{m}} \sum_{i=1}^{N}\mathcal{L}(y_{i}, f_{m-1}(x_{i})+T(x_{i};\theta_{m}))$$

For regression, the squared error loss is used:

$$\mathcal{L}(y,f(x))=(y-f(x))^2$$

$$\begin{aligned} \mathcal{L}(y,f_{m-1}(x)+T(x;\theta_{m})) &= [y-f_{m-1}(x)-T(x;\theta_{m})]^2 \\ &= [r-T(x;\theta_{m})]^2\end{aligned}$$

where $$r=y-f_{m-1}(x)$$ is the residual of the current model, so the next tree simply fits the residuals.

For a general loss function, GBDT approximates the residual with the negative gradient of the loss evaluated at the current model:

$$-\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x)=f_{m-1}(x)}$$
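As a sanity check, for the squared error loss the negative gradient is exactly the residual (when the loss carries a factor of 1/2; with the plain $$(y-f)^2$$ above it is twice the residual). A minimal numerical sketch, not tied to any library:

```python
def squared_loss(y, f):
    # 1/2 factor so that -dL/df = y - f exactly (the plain
    # L(y, f) = (y - f)**2 would give twice the residual)
    return 0.5 * (y - f) ** 2

def negative_gradient(y, f, eps=1e-6):
    # central finite difference of L with respect to f
    return -(squared_loss(y, f + eps) - squared_loss(y, f - eps)) / (2 * eps)

y, f_prev = 3.0, 1.2
print(negative_gradient(y, f_prev))  # ≈ y - f_prev = 1.8
```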

The gradient boosting algorithm for regression is then:

1. Initialize: $$f_{0}(x) = \mathop{\arg\min}_c \sum^{N}_{i=1}L(y_i, c)$$
2. For $$m=1, 2, \dots, M$$:
(a) For $$i=1,2,\dots,N$$, compute $$r_{mi}=-[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}]_{f(x)=f_{m-1}(x)}$$
(b) Fit a regression tree to the $$r_{mi}$$, giving the leaf regions $$R_{mj}$$, $$j=1,2,\dots,J$$, of the $$m$$-th tree
(c) For $$j=1,2,\dots,J$$, compute $$c_{mj}=\mathop{\arg\min}_{c} \sum_{x_{i} \in R_{mj}} L(y_i, f_{m-1}(x_i)+c)$$
(d) Update $$f_{m}(x) = f_{m-1}(x)+\sum^{J}_{j=1} c_{mj}I(x \in R_{mj})$$
3. Obtain the regression tree
$$\hat{f}(x) = f_{M}(x) = \sum^{M}_{m=1} \sum^{J}_{j=1} c_{mj} I(x \in R_{mj})$$
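The steps above can be sketched in plain NumPy, using depth-1 trees (stumps) as the base learners and squared error loss, so the negative gradient in step (a) is just the residual. The function names and toy data are illustrative, not from any library:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r: pick the
    split minimizing squared error; the constant prediction on each
    side plays the role of the leaf values c_mj in steps (b)-(c)."""
    best = None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, cl, cr = best
    return lambda z: np.where(z <= s, cl, cr)

def gbdt_regression(x, y, M=50):
    # Step 1: f_0(x) = arg min_c sum L(y_i, c) = mean(y) for squared loss
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for m in range(M):
        r = y - pred              # step (a): negative gradient = residual
        tree = fit_stump(x, r)    # steps (b)-(c): fit a tree to residuals
        trees.append(tree)
        pred = pred + tree(x)     # step (d): update the model
    return lambda z: f0 + sum(t(z) for t in trees)

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])
f = gbdt_regression(x, y, M=30)   # f(x) now closely tracks y
```

Each round fits the *residuals* of the current ensemble, which is exactly the $$[r-T(x;\theta_m)]^2$$ reduction derived earlier.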

### XGBoost

XGBoost is an optimized implementation of GBDT. Its main improvements are:

1. Traditional GBDT uses only the first-order derivative of the loss. XGBoost instead applies a second-order Taylor expansion to the loss, using both first- and second-order derivatives, which amounts to a Newton-style update at each iteration.
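A sketch of the second-order idea: under the Taylor expansion with L2 regularization $$\lambda$$, the optimal weight of a leaf is $$w^* = -G/(H+\lambda)$$, where $$G$$ and $$H$$ are the sums of first and second derivatives of the loss over the samples in that leaf. The toy loss and data below are illustrative:

```python
# Squared loss L(y, f) = (y - f)**2:
#   g_i = dL/df   = 2 * (f_i - y_i)
#   h_i = d2L/df2 = 2
def leaf_weight(y, f, lam=1.0):
    """Optimal second-order leaf weight w* = -G / (H + lam)."""
    g = [2.0 * (fi - yi) for yi, fi in zip(y, f)]
    h = [2.0] * len(y)
    return -sum(g) / (sum(h) + lam)

# With lam = 0 this reduces to the mean residual in the leaf,
# i.e. GBDT's constant c_mj for squared error loss.
y = [3.0, 2.0, 4.0]
f = [2.5, 2.5, 2.5]
print(leaf_weight(y, f, lam=0.0))  # mean residual = 0.5
```

Setting $$\lambda > 0$$ shrinks the leaf weights toward zero, which is one of the regularization mechanisms that distinguishes XGBoost from plain GBDT.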