Linear models: Misspecification
In our discussion of linear model inference in Unit 2, we assumed the normal linear model throughout:
\[ \boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \text{where} \ \boldsymbol{\epsilon} \sim N(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n). \]
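As a reminder of this baseline, here is a minimal R sketch (with hypothetical choices of \(n\), \(p\), \(\boldsymbol{\beta}\), and \(\sigma\)) that simulates data from the normal linear model and fits it by ordinary least squares:

```r
# A minimal sketch (hypothetical n, p, beta, sigma): simulate from the
# normal linear model and fit it by ordinary least squares.
set.seed(1)
n <- 100; p <- 3; sigma <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # n x p design with intercept
beta <- c(1, 2, -1)                                   # true coefficient vector
y <- as.numeric(X %*% beta + rnorm(n, sd = sigma))    # eps ~ N(0, sigma^2 I_n)
fit <- lm(y ~ X - 1)                                  # intercept already included in X
summary(fit)
```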
In this unit, we will discuss what happens when this model is misspecified (each scenario is illustrated in the simulation sketch following the list):
- Non-normality (Section 13.1): \(\boldsymbol{\epsilon} \sim (\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n)\) but not \(N(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n)\).
- Heteroskedastic and/or correlated errors (Section 13.2): \(\boldsymbol{\epsilon} \sim (\boldsymbol{0}, \boldsymbol{\Sigma})\), where \(\boldsymbol{\Sigma} \neq \sigma^2 \boldsymbol{I}_n\). This includes the cases of heteroskedastic errors (\(\boldsymbol{\Sigma}\) is diagonal but not a constant multiple of the identity) and correlated errors (\(\boldsymbol{\Sigma}\) is not diagonal).
- Model bias (Section 13.3): It is not the case that \(\mathbb{E}[\boldsymbol{y}] = \boldsymbol{X} \boldsymbol{\beta}\) for some \(\boldsymbol{\beta} \in \mathbb{R}^p\).
- Outliers (Section 13.4): For one or more \(i\), it is not the case that \(y_i \sim N(\boldsymbol{x}_{i*}^T \boldsymbol{\beta}, \sigma^2)\).
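To make these four scenarios concrete, here is a minimal R sketch (with hypothetical choices of \(n\), \(\boldsymbol{X}\), \(\boldsymbol{\beta}\), and \(\sigma\)) that generates a response vector under each type of misspecification; the well-specified baseline would instead use `rnorm(n, sd = sigma)` errors:

```r
# A minimal sketch (hypothetical n, x, beta, sigma): generate a response under
# each type of misspecification discussed above.
set.seed(1)
n <- 100; sigma <- 2
x <- rnorm(n)
X <- cbind(1, x)                  # design matrix: intercept and one covariate
beta <- c(1, 2)
mu <- as.numeric(X %*% beta)      # correctly specified mean E[y] = X beta

# Non-normality (13.1): mean-zero, variance sigma^2, but heavy-tailed (t_3) errors
y_nonnormal <- mu + sigma * rt(n, df = 3) / sqrt(3)

# Heteroskedastic errors (13.2): Var(eps_i) depends on x_i (Sigma diagonal, not sigma^2 I)
y_hetero <- mu + rnorm(n, sd = sigma * exp(abs(x)))

# Correlated errors (13.2): AR(1) errors with marginal variance sigma^2 (Sigma not diagonal)
y_corr <- mu + sigma * sqrt(1 - 0.8^2) * as.numeric(arima.sim(list(ar = 0.8), n = n))

# Model bias (13.3): E[y] is nonlinear in x, so no beta satisfies E[y] = X beta
y_biased <- 1 + 2 * x + x^2 + rnorm(n, sd = sigma)

# Outliers (13.4): a few observations deviate grossly from N(x_i^T beta, sigma^2)
y_outlier <- mu + rnorm(n, sd = sigma)
y_outlier[1:3] <- y_outlier[1:3] + 20 * sigma
```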
For each type of misspecification, we will discuss its origins, consequences, detection, and fixes (Sections 13.1-13.4). We then discuss methodological approaches to address model misspecification, including asymptotic robust inference methods (Chapter 14), the bootstrap (Chapter 15), the permutation test (Chapter 16), and robust estimation (Chapter 17). We conclude with an R demo (Chapter 18).