Linear models: Misspecification
In our discussion of linear model inference in Unit 2, we assumed the normal linear model throughout:
\[ \boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \text{where} \ \boldsymbol{\epsilon} \sim N(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n). \]
In this unit, we will discuss what happens when this model is misspecified:
- Non-normality (see the Non-normality section): \(\boldsymbol{\epsilon} \sim (\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n)\) but not \(N(\boldsymbol{0}, \sigma^2 \boldsymbol{I}_n)\). That is, the errors have mean zero and covariance \(\sigma^2 \boldsymbol{I}_n\), but are not normally distributed.
- Heteroskedastic and/or correlated errors (see the Heteroskedastic and correlated errors section): \(\boldsymbol{\epsilon} \sim (\boldsymbol{0}, \boldsymbol{\Sigma})\), where \(\boldsymbol{\Sigma} \neq \sigma^2 \boldsymbol{I}_n\). This covers heteroskedastic errors (\(\boldsymbol{\Sigma}\) is diagonal but not a constant multiple of the identity) and correlated errors (\(\boldsymbol{\Sigma}\) is not diagonal).
- Model bias (see the Model bias section): It is not the case that \(\mathbb{E}[\boldsymbol{y}] = \boldsymbol{X} \boldsymbol{\beta}\) for some \(\boldsymbol{\beta} \in \mathbb{R}^p\).
- Outliers (see the Outliers section): For one or more \(i\), it is not the case that \(y_i \sim N(\boldsymbol{x}_{i*}^T \boldsymbol{\beta}, \sigma^2)\).
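To make the four cases concrete, the following is a minimal simulation sketch, not part of the original notes. It generates data from a simple linear model with one predictor and lets you switch on one kind of misspecification at a time. The function name `simulate`, the coefficients, the AR(1) correlation of 0.8, the 5% contamination rate, and the quadratic bias term are all illustrative assumptions.

```python
import random


def simulate(kind, n=200, sigma=1.0, seed=0):
    """Simulate (x, y) from y = 1 + 2x + error under one scenario.

    kind is one of: "normal", "non_normal", "heteroskedastic",
    "correlated", "model_bias", "outliers". All numeric settings
    below are illustrative assumptions. Requires n >= 2.
    """
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # equally spaced on [0, 1]
    mean = [1.0 + 2.0 * xi for xi in x]  # the linear mean X beta

    if kind in ("normal", "model_bias"):
        # Errors follow the normal linear model: N(0, sigma^2), independent.
        eps = [rng.gauss(0.0, sigma) for _ in range(n)]
        if kind == "model_bias":
            # E[y] is no longer linear in x: add a quadratic term.
            mean = [m + 3.0 * xi ** 2 for m, xi in zip(mean, x)]
    elif kind == "non_normal":
        # Centered exponential: mean 0, variance sigma^2, but skewed.
        eps = [sigma * (rng.expovariate(1.0) - 1.0) for _ in range(n)]
    elif kind == "heteroskedastic":
        # Independent errors whose standard deviation grows with x,
        # so Sigma is diagonal but not a multiple of the identity.
        eps = [rng.gauss(0.0, sigma * (0.5 + 1.5 * xi)) for xi in x]
    elif kind == "correlated":
        # Stationary AR(1) errors: Sigma is not diagonal.
        rho = 0.8
        innov_sd = sigma * (1.0 - rho ** 2) ** 0.5
        eps = [rng.gauss(0.0, sigma)]
        for _ in range(n - 1):
            eps.append(rho * eps[-1] + rng.gauss(0.0, innov_sd))
    elif kind == "outliers":
        # Roughly 5% of observations get a tenfold standard deviation.
        eps = [
            rng.gauss(0.0, 10.0 * sigma if rng.random() < 0.05 else sigma)
            for _ in range(n)
        ]
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    y = [m + e for m, e in zip(mean, eps)]
    return x, y
```

For example, fitting a straight line to `simulate("model_bias")` and plotting the residuals against `x` should reveal a curved pattern, whereas the residuals from `simulate("heteroskedastic")` should fan out as `x` increases.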
For each type of misspecification, we will discuss its origins, consequences, detection, and fixes (sections Non-normality through Outliers). We then discuss methodological approaches to addressing model misspecification, including asymptotic robust inference methods (14 Asymptotic methods), the bootstrap (15 The bootstrap), the permutation test (16 The permutation test), and robust estimation (17 Robust estimation and inference). We conclude with an R demo (18 R demo).