[srm] Linear Regression


Notation

$$\begin{aligned} {S_{xx}} &:=\sum(x_i-\bar{x})^2 \\ {S_{xy}} &:=\sum(x_i-\bar{x})(y_i-\bar{y}) \\ \operatorname{SST} &=\operatorname{TSS}=\sum\limits_{i=1}^n (y_i-\bar{y})^2 \\ \operatorname{SSR} &=\operatorname{RegSS}=\sum\limits_{i=1}^n (\hat{y}_i-\bar{y})^2 \\ \operatorname{SSE} &=\operatorname{RSS}=\sum\limits_{i=1}^n (y_i-\hat{y}_i)^2 \\ p&:=\#\text{ parameters, excluding the intercept} \end{aligned}$$

Simple Linear Regression with Intercept

Basic Definitions

Model Assumptions

$$y_i=\beta_0+\beta_1x_i+\varepsilon_i \qquad \varepsilon_i\overset{iid}{\sim} N(0,\sigma^2)$$

Parameter Estimation

$$\begin{aligned} \hat{\beta}_1 &=\frac{\sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})} {\sum\limits_{i=1}^n (x_i-\bar{x})^2}=\frac{\sum\limits_{i=1}^n x_iy_i-n\bar{x}\bar{y}} {\sum\limits_{i=1}^n x_i^2-n\bar{x}^2}= r_{xy} \frac{s_y}{s_x} \\ \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \end{aligned}$$
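
As a quick sanity check, here is a minimal numpy sketch (my own illustration on simulated data, not part of the original notes) confirming that the three expressions for $\hat{\beta}_1$ agree:

```python
# The three forms of beta1_hat: deviation form, moment form, r * s_y / s_x.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 3.0 - 2.0 * x + rng.normal(size=100)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

b1_a = Sxy / Sxx                                                # deviation form
b1_b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean() ** 2)
b1_c = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)  # r * s_y / s_x
b0 = y.mean() - b1_a * x.mean()

print(np.allclose([b1_a, b1_b], b1_c))  # True
```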

Statistical Properties

$$\begin{aligned} &\operatorname{E}[\hat{\beta}_0] = \beta_0, \quad \operatorname{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \\ &\operatorname{E}[\hat{\beta}_1] = \beta_1, \quad \operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}} \\ &\operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\sigma^2}{S_{xx}} \bar{x} \\ &\operatorname{E}[\hat{y}_i] = \beta_0 + \beta_1 x_i, \quad \operatorname{Var}(\hat{y}_i) = \sigma^2 \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}} \right) \\ &\operatorname{Var}(y^* - \hat{y}^*) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right) \\ &\text{(the extra } \sigma^2 \text{ appears because the new observation } y^* \text{ itself carries the model error)} \\ &h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}} \end{aligned}$$
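
The variance formula $\operatorname{Var}(\hat{\beta}_1)=\sigma^2/S_{xx}$ can be checked by Monte Carlo; the sketch below (simulated data, my own illustration) holds the design $x$ fixed and re-draws only the errors:

```python
# Monte Carlo check of Var(beta1_hat) = sigma^2 / S_xx for a fixed design.
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 30, 1.0
x = rng.normal(size=n)                  # fixed design across replications
Sxx = np.sum((x - x.mean()) ** 2)

b1s = []
for _ in range(20000):
    y = 1.0 + 0.5 * x + rng.normal(scale=sigma, size=n)
    b1s.append(np.sum((x - x.mean()) * (y - y.mean())) / Sxx)

print(np.var(b1s), sigma**2 / Sxx)      # empirical vs analytic; should closely agree
```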

Test Statistics

$$\begin{aligned} &R^2=\frac{SSR}{SST}=1-\frac{SSE}{SST}=r_{xy}^2 \\ &SSR=\hat{\beta}_1^2S_{xx} \\ \\ &F=\frac{SSR/1}{SSE/(n-2)}=\frac{R^2}{1-R^2}\cdot\frac{n-2}{1}=t(\hat{\beta}_1)^2 \end{aligned}$$
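
A short numeric check (again simulated data, my own sketch) that $F = t(\hat{\beta}_1)^2$ and $R^2 = r_{xy}^2$:

```python
# Verify F = t^2 and R^2 = r^2 for simple linear regression.
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)
s2 = sse / (n - 2)                      # MSE with n - 2 degrees of freedom

F = (ssr / 1) / (sse / (n - 2))
t = b1 / np.sqrt(s2 / Sxx)
print(np.isclose(F, t**2))                                   # True
print(np.isclose(ssr / sst, np.corrcoef(x, y)[0, 1] ** 2))   # True
```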

Simple Linear Regression without Intercept

Basic Definitions

Model Assumptions

$$y_i=\beta_1x_i+\varepsilon_i \qquad \varepsilon_i\overset{iid}{\sim} N(0,\sigma^2)$$

Parameter Estimation

$$\hat{\beta}_1=\frac{\sum\limits_{i=1}^n x_iy_i} {\sum\limits_{i=1}^n x_i^2}$$
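
A tiny sketch (my own, simulated data) verifying the through-the-origin estimator against a least-squares fit with no intercept column:

```python
# Through-the-origin fit: beta1_hat = sum(x*y) / sum(x^2).
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=30)
y = 1.7 * x + rng.normal(scale=0.3, size=30)

b1 = np.sum(x * y) / np.sum(x**2)
b1_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]  # no intercept column
print(np.isclose(b1, b1_lstsq))  # True
```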

Multiple Linear Regression

Basic Definitions

Model Assumptions

$$\begin{aligned} &y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}+\varepsilon_i \qquad \varepsilon_i\overset{iid}{\sim} N(0,\sigma^2) \end{aligned}$$

$$\begin{aligned} \mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} \\ \\ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} &= \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \end{aligned}$$

Parameter Estimation

$$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

$$\begin{aligned} \mathbf{X}'\mathbf{X} &= \begin{bmatrix} n & \sum{x_{i1}} & \sum{x_{i2}} & \cdots & \sum{x_{ip}} \\ \sum{x_{i1}} & \sum{x^2_{i1}} & \sum{x_{i1} x_{i2}}& \cdots & \sum{x_{i1} x_{ip}} \\ \sum{x_{i2}} & \sum{x_{i1} x_{i2}} & \sum{x^2_{i2}}& \cdots & \sum{x_{i2} x_{ip}} \\ \vdots & \vdots &\vdots & \ddots & \vdots \\ \sum{x_{ip}} & \sum{x_{i1} x_{ip}} & \sum{x_{i2} x_{ip}} & \cdots & \sum{x^2_{ip}} \end{bmatrix} \\ \mathbf{X}'\mathbf{y} &= \begin{bmatrix} \sum{y_i} \\ \sum{x_{i1}y_i} \\ \sum{x_{i2}y_i} \\ \vdots \\ \sum{x_{ip}y_i} \end{bmatrix} \end{aligned}$$
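
In code it is numerically safer to solve the normal equations than to invert $\mathbf{X}'\mathbf{X}$ explicitly; a sketch on simulated data (my own illustration):

```python
# OLS in matrix form: solve X'X b = X'y and compare with lstsq.
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
y = X @ np.array([1.0, 0.5, -2.0, 0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_hat, beta_lstsq))  # True
```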

Statistical Properties

$$\begin{aligned} \operatorname{E}[\hat{\boldsymbol{\beta}}] &= \boldsymbol{\beta} \\ \operatorname{Var}(\hat{\boldsymbol{\beta}}) &= \sigma^2 (\mathbf{X}'\mathbf{X})^{-1} \quad \text{(covariance matrix; the off-diagonal entries are the covariances)} \\\\ \operatorname{E}[\hat{y}] &= \mathbf{x}' \boldsymbol{\beta} \\ \operatorname{Var}[\hat{y}] &= \sigma^2 \mathbf{x}' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{x} \\\\ \operatorname{Var}[y^* - \hat{y}^*] &= \sigma^2 \left( 1 + \mathbf{x}' (\mathbf{X}'\mathbf{X})^{-1} \mathbf{x} \right) \\ &\quad \text{(the extra 1 is the error variance assumed by the model itself)} \\\\ \mathbf{H} &\equiv \mathbf{X} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}' \\ \hat{\mathbf{y}} &= \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}' \mathbf{y} = \mathbf{H} \mathbf{y} \\ \operatorname{Var}[\hat{\mathbf{y}}] &= \sigma^2 \mathbf{H} \end{aligned}$$
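
The hat matrix properties (symmetric, idempotent, trace $= p+1$, $\mathbf{H}\mathbf{y}=\hat{\mathbf{y}}$) are easy to verify numerically; a sketch on simulated data:

```python
# Hat matrix H = X (X'X)^{-1} X': symmetric, idempotent, trace = p + 1.
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # symmetric, idempotent
print(np.isclose(np.trace(H), p + 1))              # trace(H) = p + 1
print(np.allclose(H @ y, X @ np.linalg.lstsq(X, y, rcond=None)[0]))  # H y = y_hat
```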

Test Statistics

$$\begin{aligned} &R^2=\frac{SSR}{SST}=1-\frac{SSE}{SST} \\ &R_{adj}^2=1-\frac{SSE/(n-p-1)}{SST/(n-1)} \\ \\ &F=\frac{\Delta SSE/\Delta p}{SSE_{full}/(n-p-1)}=\frac{\Delta R^2}{1-R^2}\cdot\frac{n-p-1}{\Delta p} \quad \sim F_{\Delta p,\,n-p-1} \\ &F_j=\frac{\Delta SSR/1}{SSE/(n-p-1)}=t(\hat{\beta}_j)^2 \end{aligned}$$
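
The single-coefficient case can be checked by refitting the reduced model; the sketch below (simulated data, my own illustration) confirms $F_j = t(\hat{\beta}_j)^2$:

```python
# Partial F test for one predictor equals the squared t statistic.
import numpy as np

rng = np.random.default_rng(7)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, 0.0, -1.5]) + rng.normal(size=n)

def sse(Xm):
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    r = y - Xm @ b
    return r @ r

sse_full = sse(X)
sse_red = sse(np.delete(X, 3, axis=1))          # drop the third predictor
F_j = ((sse_red - sse_full) / 1) / (sse_full / (n - p - 1))

b = np.linalg.lstsq(X, y, rcond=None)[0]
s2 = sse_full / (n - p - 1)
t_j = b[3] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[3, 3])
print(np.isclose(F_j, t_j**2))  # True
```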

Model Diagnostics

Residual Analysis

$$\begin{aligned} &\hat{\mathbf{e}} \equiv \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H}) \mathbf{y} \\\\ &\operatorname{Var}(e_i) = \sigma^2(1 - h_{ii}) \\ &e^{\text{st}}_i = \frac{e_i}{\sqrt{s^2(1 - h_{ii})}} \quad \textcolor{green}{\text{Outlier: } |e^{\text{st}}_i| \geq 2 \sim 3} \\ &e^{\text{stud}}_i = \frac{e_i}{\sqrt{s^2_{(i)}(1 - h_{ii})}} \quad \sim t_{n-p-2} \\ &s^2_{(i)} := \text{MSE without the } i\text{-th data point} \\\\ &\textcolor{red}{\operatorname{Cov}(\hat{y}, e) = 0} \\ &\textcolor{red}{\operatorname{Cov}(y, e) \ne 0} \\ &\text{(residuals are uncorrelated with the fitted values, but not with the observed values)} \end{aligned}$$
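
A sketch (my own, simulated data) computing the standardized residuals and checking numerically that the residuals are orthogonal to the fitted values:

```python
# Internally standardized residuals; residuals are orthogonal to y_hat.
import numpy as np

rng = np.random.default_rng(8)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
e = y - y_hat
h = np.diag(H)
s2 = e @ e / (n - p - 1)

e_std = e / np.sqrt(s2 * (1 - h))   # flag |e_std| >= 2~3 as potential outliers
print(np.abs(e_std).max())
print(np.isclose(y_hat @ e, 0.0))   # Cov(y_hat, e) = 0 in sample
```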

Influence Analysis

$$\begin{aligned} &\sum_{i=1}^n h_{ii} = p + 1 \quad \textcolor{green}{\text{High Leverage: } h_{ii} \geq 3\bar{h}} \\ &\frac{1}{n} \le h_{ii} \le 1 \\\\ &\text{Cook's Distance } D_i \equiv \frac{\sum\limits_{j=1}^{n} (\hat{y}_j - \hat{y}_{j(i)})^2}{(p+1)s^2} = \frac{1}{p+1} (e^{\text{st}}_i)^2 \cdot \frac{h_{ii}}{1 - h_{ii}} \quad \textcolor{green}{\text{Influential: } D_i \geq \frac{4}{n}} \\ &\hat{y}_{j(i)} \equiv \hat{y}_j \text{ fitted without the } i\text{-th observation} \end{aligned}$$
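
The shortcut form of Cook's distance can be validated against a literal refit without observation $i$; a sketch on simulated data (my own illustration):

```python
# Cook's distance: shortcut formula vs an explicit leave-one-out refit.
import numpy as np

rng = np.random.default_rng(9)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.0, 1.0, 1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = y - H @ y
s2 = e @ e / (n - p - 1)

e_std2 = e**2 / (s2 * (1 - h))                 # squared standardized residuals
D = e_std2 / (p + 1) * h / (1 - h)             # shortcut form

i = 0                                          # check observation 0 by refitting
b_i = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
D_refit = np.sum((H @ y - X @ b_i) ** 2) / ((p + 1) * s2)
print(np.isclose(D[i], D_refit))  # True
```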

Collinearity Analysis

$$\begin{aligned} r(y,x_j \mid x_k,\,k\ne j) &=\operatorname{Corr}(e,e_{x_j}) \\ &=\frac{\sum\limits_{i=1}^n e_ie_{x_j,i}}{\sqrt{\sum\limits_{i=1}^n e_i^2\sum\limits_{i=1}^n e_{x_j,i}^2}} \\ &=\frac{t(\hat{\beta}_j)}{\sqrt{t(\hat{\beta}_j)^2+(n-p-1)}} \\ &\text{(the correlation between } x_j \text{ and } y \text{, given all the other predictors)} \end{aligned} \qquad \begin{aligned} e &\equiv e_{y\sim x_{k\ne j}} \\ e_{x_j} &\equiv e_{x_j\sim x_{k\ne j}} \end{aligned}$$
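
A numeric check (simulated data, my own sketch) that the residual-versus-residual correlation matches the $t$-statistic formula:

```python
# Partial correlation: corr of residuals equals t_j / sqrt(t_j^2 + (n-p-1)).
import numpy as np

rng = np.random.default_rng(10)
n, p = 60, 2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + x1 + 0.5 * x2 + rng.normal(size=n)

def resid(target, others):
    Xm = np.column_stack([np.ones(n)] + others)
    b = np.linalg.lstsq(Xm, target, rcond=None)[0]
    return target - Xm @ b

# partial out x1 from both y and x2, then correlate the residuals
r_partial = np.corrcoef(resid(y, [x1]), resid(x2, [x1]))[0, 1]

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (n - p - 1)
t_j = b[2] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
print(np.isclose(r_partial, t_j / np.sqrt(t_j**2 + (n - p - 1))))  # True
```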

$$\begin{aligned} \operatorname{VIF}_j&=\frac{1}{1-R^2_{(j)}} \\ R^2_{(j)}&\equiv R^2_{x_j\sim x_{k\ne j}} \\\\ \operatorname{Var}[\hat{\beta}_j]&=\frac{\sigma^2}{S_{x_jx_j}}\operatorname{VIF}_j \end{aligned}$$
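
VIF is straightforward to compute by regressing each predictor on the others; in the sketch below (my own, simulated data) $x_2$ is built to be nearly collinear with $x_1$:

```python
# VIF_j = 1 / (1 - R^2_(j)) from regressing x_j on the other predictors.
import numpy as np

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # deliberately collinear with x1
x3 = rng.normal(size=n)
Xp = np.column_stack([x1, x2, x3])

def vif(j):
    xj = Xp[:, j]
    Xm = np.column_stack([np.ones(n), np.delete(Xp, j, axis=1)])
    b = np.linalg.lstsq(Xm, xj, rcond=None)[0]
    r2 = 1 - np.sum((xj - Xm @ b) ** 2) / np.sum((xj - xj.mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(j), 1) for j in range(3)])  # VIF_1, VIF_2 large; VIF_3 near 1
```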

Homoscedasticity Analysis

Breusch-Pagan test

$$\begin{aligned} H_0 &: \operatorname{Var}(\varepsilon_i) = \sigma^2 \\ H_a &: \operatorname{Var}(\varepsilon_i) = \sigma^2 + \mathbf{z}_i' \boldsymbol{\gamma} \\\\ \chi^2 &= \frac{\operatorname{SSR}_{\tilde{e}_i^2 \sim \mathbf{z}}}{2} = \frac{\operatorname{SSR}_{e_i^2 \sim \mathbf{z}}}{2 (s^2)^2} \sim \chi^2_{\dim(\boldsymbol{\gamma})} \end{aligned} \qquad \tilde{e}_i^2 := \frac{e_i^2}{s^2}$$
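
A hand-rolled sketch of the statistic on simulated heteroscedastic data (my own illustration; note the classical BP statistic uses the MLE $\hat{\sigma}^2 = SSE/n$ as $s^2$):

```python
# Breusch-Pagan: regress standardized squared residuals on z, take SSR/2.
import numpy as np

rng = np.random.default_rng(12)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(1 + 0.5 * x**2), size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
u = e**2 / (e @ e / n)                    # e_i^2 / s^2, with s^2 = SSE/n (MLE)

Z = np.column_stack([np.ones(n), x**2])   # z: the suspected variance driver
g = np.linalg.lstsq(Z, u, rcond=None)[0]
ssr = np.sum((Z @ g - u.mean()) ** 2)
chi2 = ssr / 2                            # compare with chi2, dim(gamma) = 1 df
print(chi2)
```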

Model Assessment

$$\begin{aligned} C_p&=\frac{SSE+2p\,s_{full}^2}{n} \\ \operatorname{AIC}_{LSR}&=\frac{SSE+2p\,s_{full}^2}{n}\equiv C_p \\ \operatorname{BIC}_{LSR}&=\frac{SSE+\ln(n)\,p\,s_{full}^2}{n} \\ &\text{(smaller is better for all three)} \end{aligned}$$
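
A sketch computing $C_p$ and $\operatorname{BIC}_{LSR}$ as defined above for two nested models (my own illustration; the full-model MSE serves as $s^2_{full}$):

```python
# Cp and BIC_LSR for two nested models; x2 is pure noise, so the
# x1-only model should typically score lower (better).
import numpy as np

rng = np.random.default_rng(13)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def fit_sse(Xm):
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    r = y - Xm @ b
    return r @ r

X_full = np.column_stack([np.ones(n), x1, x2])
s2_full = fit_sse(X_full) / (n - 2 - 1)        # MSE of the largest model

for name, Xm, p in [("x1 only", np.column_stack([np.ones(n), x1]), 1),
                    ("x1 + x2", X_full, 2)]:
    sse = fit_sse(Xm)
    cp = (sse + 2 * p * s2_full) / n
    bic = (sse + np.log(n) * p * s2_full) / n
    print(name, round(cp, 4), round(bic, 4))
```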

Model Selection

LOOCV (Leave-One-Out Cross-Validation)

$$\operatorname{CV}_{(n)} :=\frac{1}{n}\sum\limits_{i=1}^n \operatorname{MSE}_i =\frac{1}{n}\sum\limits_{i=1}^n (y_i-\hat{y}_{(i)})^2 =\frac{1}{n}\sum_{i=1}^n\left(\frac{e_i}{1-h_{ii}}\right)^2$$

where $\hat{y}_{(i)}$ is the prediction for the $i$-th observation from the fit that excludes it.
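
Thanks to the leverage shortcut, LOOCV needs only one fit; the sketch below (my own, simulated data) confirms it against $n$ literal refits:

```python
# LOOCV shortcut: one fit with leverages matches n leave-one-out refits.
import numpy as np

rng = np.random.default_rng(14)
n = 25
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.solve(X.T @ X, X.T)
e = y - H @ y
cv_shortcut = np.mean((e / (1 - np.diag(H))) ** 2)

cv_refit = 0.0
for i in range(n):
    b = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
    cv_refit += (y[i] - X[i] @ b) ** 2
cv_refit /= n
print(np.isclose(cv_shortcut, cv_refit))  # True
```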

1-SE rule

Within one standard error of the smallest CV, choose the model with the fewest predictors.
