[srm] Decision Trees

Single decision tree
Splitting $M-1$ times yields $M$ terminal nodes $R_m$. The response variable can be categorical with $K$ levels or continuous.
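As a concrete starting point, here is a minimal sketch of fitting a single classification tree. The use of scikit-learn, the iris data, and the parameter values are illustrative assumptions, not something these notes prescribe.

```python
# Minimal sketch: a single classification tree (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # response has K = 3 levels

# max_leaf_nodes = M caps the terminal nodes R_m,
# i.e. the tree makes at most M - 1 binary splits.
tree = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0).fit(X, y)
print(tree.get_n_leaves())  # M, the number of terminal nodes
```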
Node impurity
Classification error rate
$$\begin{aligned} E_m &= \frac{\text{# misclassified}}{\text{# samples in } R_m} \\ &= 1 - \max_k(\hat{p}_{mk}) \end{aligned}$$
(Since the level $k$ with the largest $\hat{p}_{mk}$ is the predicted class $\hat{Y}_m$, the error rate is one minus that largest proportion.)
Gini index
$$\begin{aligned} G_m &= \sum_{k=1}^K \hat{p}_{mk} (1 - \hat{p}_{mk}) \\ &= 1 - \sum_{k=1}^K \hat{p}_{mk}^2 \end{aligned}$$
(For $K = 2$, this reduces to $G_m = 2\hat{p}_{m1}(1 - \hat{p}_{m1})$.)
Cross-entropy
$$D_m = -\sum_{k=1}^K \hat{p}_{mk} \ln(\hat{p}_{mk})$$
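The three measures are easy to compare numerically. Below is a minimal sketch (NumPy assumed; the helper name and the example counts are hypothetical) that computes all three for one node from its class counts $n_{mk}$.

```python
import numpy as np

def node_impurities(counts):
    """Classification error, Gini index, and cross-entropy of one node R_m,
    given the class counts n_mk of its K levels."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()                 # p_hat_mk
    error = 1.0 - p.max()                     # E_m
    gini = 1.0 - np.sum(p ** 2)               # G_m
    p_pos = p[p > 0]                          # convention: 0 * ln(0) = 0
    entropy = -np.sum(p_pos * np.log(p_pos))  # D_m
    return error, gini, entropy

# A node holding class counts (7, 2, 1), i.e. n_m = 10 samples:
print(node_impurities([7, 2, 1]))  # (0.3, 0.46, 0.802...)
```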
Overall impurity
$$\begin{aligned} I_{\text{overall}} &= \sum_{m=1}^M \frac{n_m}{N} \cdot I(R_m) \\ E_{\text{overall}} &= \frac{\text{# misclassified}}{N} \\ D_{\text{overall}} &= -\frac{1}{N}\sum_{m=1}^M\sum_{k=1}^K n_{mk}\ln (\hat{p}_{mk}) \\ \text{Deviance} &= -2\sum^M_{m=1}\sum_{k=1}^K n_{mk}\ln (\hat{p}_{mk}) = 2N D_{\text{overall}} \\ \text{Residual mean deviance} &= \frac{\text{Deviance}}{N-M} \end{aligned}$$
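A small sketch of this deviance bookkeeping (NumPy assumed; the function name and the two-node example are hypothetical), taking an $M \times K$ array of counts $n_{mk}$:

```python
import numpy as np

def deviance_stats(counts):
    """Deviance and residual mean deviance from an M x K array of n_mk."""
    counts = np.asarray(counts, dtype=float)
    M = counts.shape[0]
    N = counts.sum()
    p = counts / counts.sum(axis=1, keepdims=True)  # p_hat_mk per node
    nz = counts > 0                                 # skip n_mk = 0 terms
    deviance = -2.0 * np.sum(counts[nz] * np.log(p[nz]))
    return deviance, deviance / (N - M)

# Two terminal nodes with class counts (8, 2) and (1, 9): N = 20, M = 2.
print(deviance_stats([[8, 2], [1, 9]]))
```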
Importance
A variable is more important the more the nodes split on it reduce impurity. (That is, for each such node: the node's impurity weighted by its sample size, minus the two children's impurities weighted by their sample sizes.)
$$\operatorname{Importance}(j)=\sum_{m\,\text{split on}\,x_j}\left[n_m\cdot I(R_m)-\left(n_{m,\text{left}}\cdot I(R_{m,\text{left}})+n_{m,\text{right}}\cdot I(R_{m,\text{right}})\right)\right]$$
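The bracketed term is just the weighted impurity decrease of one split; a minimal sketch (the function name and the numbers are hypothetical):

```python
def split_gain(n, i_node, n_left, i_left, n_right, i_right):
    """Weighted impurity decrease of one split:
    n_m * I(R_m) - (n_left * I(R_left) + n_right * I(R_right)).
    Importance(j) sums this gain over every node split on variable j."""
    return n * i_node - (n_left * i_left + n_right * i_right)

# Splitting 100 samples (Gini 0.5) into 60 (Gini 0.2) and 40 (Gini 0.3):
print(split_gain(100, 0.5, 60, 0.2, 40, 0.3))  # 26.0
```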
Tree ensembles
Importance
Defined much as for a single tree, but each variable's importance is averaged over all the trees (sketched below).
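A sketch of that averaging (scikit-learn and its iris data assumed purely for illustration; a random forest stands in for the generic ensemble):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Ensemble importance: per-tree impurity-decrease importance, averaged.
# This should agree with forest.feature_importances_ up to floating point.
avg = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)
print(avg)
```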
Bagging
Draw B bootstrap samples, i.e. sample with replacement B times (each sample of size N), fit a tree to each, and estimate with
$$\hat{f}_{\text{bag}}(x)=\frac{1}{B}\sum^B_{b=1} \hat{f}^{*b}(x)$$
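A minimal hand-rolled sketch of that estimator (NumPy and scikit-learn assumed; the synthetic data and all names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
N, B = len(y), 50

trees = []
for b in range(B):
    idx = rng.integers(0, N, size=N)  # bootstrap: N draws with replacement
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

def f_bag(x):
    """Average the B bootstrap trees' predictions."""
    return np.mean([t.predict(x) for t in trees], axis=0)

print(f_bag(X[:3]))
```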
Random Forest
Much like bagging, but at each split only m of the p predictors are randomly selected as candidate split variables; the suggested choice of m is
$$m\approx\sqrt{p}$$
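In scikit-learn (assumed here for illustration) this per-split subsampling is the `max_features` argument, and `"sqrt"` implements the $m \approx \sqrt{p}$ rule:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # p = 30 predictors
# max_features="sqrt": each split considers only ~sqrt(30), i.e. 5 candidates.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.score(X, y))
```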
Boosting
Start with all fitted values at 0, so the residuals equal y. Given a learning rate λ, repeat B times: fit a small tree with d splits to the current residuals, then update the fitted values and residuals by λ times the tree's predictions. The final estimate is
$$\hat{f}_{\text{boosting}}(x)=\lambda\sum^B_{b=1} \hat{f}^{b}(x)$$
When d = 1 each tree makes only a single cut, so no interactions between predictors can enter the model, and the ensemble is additive in the individual predictors:
$$\hat{f}_{\text{boosting}}(x)=\sum^p_{j=1}f_{j}(x_j)$$
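A minimal sketch of the boosting loop just described (NumPy and scikit-learn assumed; the data and names are illustrative). A tree with d splits has d + 1 leaves, hence `max_leaf_nodes=d + 1`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

B, d, lam = 500, 2, 0.1          # B rounds, d splits per tree, learning rate
f_hat = np.zeros_like(y)         # fitted values start at 0 ...
r = y.copy()                     # ... so the residuals start at y
trees = []
for b in range(B):
    t = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)  # small tree
    trees.append(t)
    f_hat += lam * t.predict(X)  # move the fit a fraction lam toward the tree
    r = y - f_hat                # refresh residuals

def f_boost(x):
    """lam times the sum of the B trees' predictions."""
    return lam * np.sum([t.predict(x) for t in trees], axis=0)
```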