EC 320, Set 07
Spring 2024
PS04:
Koans:
Reading: (up to this point)
ItE: R, 1, 2, 3
MM: 1, 2
First, a quick recap of what we’ve done thus far.
We can estimate the effect of \(X\) on \(Y\) by estimating a regression model:
\[Y_i = \beta_0 + \beta_1 X_i + u_i\]
\(Y_i\) is the outcome variable.
\(X_i\) is the treatment variable (continuous).
\(\beta_0\) is the intercept parameter. \(\mathop{\mathbb{E}}\left[ {Y_i | X_i=0} \right] = \beta_0\)
\(\beta_1\) is the slope parameter: under the correct causal setting, it is the marginal effect of \(X_i\) on \(Y_i\). \(\frac{\partial Y_i}{\partial X_i} = \beta_1\)
\(u_i\) is an error term including all other (omitted) factors affecting \(Y_i\).
\(u_i\) is quite special
Consider the data generating process of variable \(Y_i\),
Some error will exist in every model; our aim is to minimize that error subject to a set of constraints. This error is the price we accept for a simplified model.
Five items contribute to the existence of the disturbance term:
1. Omission of explanatory variables
2. Aggregation of Variables
3. Model misspecification
4. Functional misspecification
5. Measurement error
Using an estimator with data on \(X_i\) and \(Y_i\), we can estimate a fitted regression line:
\[ \hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
This procedure produces misses, known as residuals: \(\hat{u}_i = Y_i - \hat{Y_i}\).
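As a concrete illustration, here is a minimal R sketch of this procedure using simulated data; the variable names and numbers are made up for illustration and are not the course data.

```r
# A minimal sketch with simulated data (illustrative numbers only)
set.seed(320)
n <- 100
x <- rnorm(n, mean = 20, sd = 3)        # treatment variable X_i
y <- 700 - 3 * x + rnorm(n, sd = 10)    # outcome Y_i with an additive error

fit <- lm(y ~ x)       # OLS estimates of beta_0 and beta_1
coef(fit)              # hat(beta)_0 and hat(beta)_1
head(fitted(fit))      # fitted values hat(Y)_i
head(resid(fit))       # residuals Y_i - hat(Y)_i
```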
OLS is the Best Linear Unbiased Estimator (BLUE) when the following assumptions hold:
A1. Linearity: The population relationship is linear in parameters with an additive error term.
A2. Sample Variation: There is variation in \(X\).
A3. Exogeneity: The \(X\) variable is exogenous, i.e., \(\mathop{\mathbb{E}}\left[ u \mid X \right] = 0\)
A4. Homoskedasticity: The error term has the same variance for each value of the independent variable
A5. Non-autocorrelation: The values of error terms have independent distributions
Consider the following example.
What improvement do smaller class sizes have on student test scores, if any?
Estimate effect of class size on test scores with the following:
\[ \text{Scores}_i = \beta_0 + \beta_1 \text{Class Size}_i + u_i \]
Data: Test performance and class size across school districts in Massachusetts (MA)
Always plot your data first
Raw data
Fitting OLS
Q. How might smaller class sizes influence test scores?
A. More personalized teaching, fewer classroom disruptions, etc.
Q. What sign would we expect on \(\beta_1\)?
A.
\[ \beta_1 < 0 \]
Smaller class sizes (X) increase test scores (Y).
Q. Do we think \(\beta_1\) will be a good guess of the underlying population parameter?
A. Probably not: \(u_i\) contains several variables that are correlated with both class size and test scores.
Such as… school funding, which might affect:
Smaller class sizes (X) increase test scores (Y), and so does greater school funding (U). In turn, school funding (U) is correlated with class size (X).
Any unobserved variable that connects a backdoor path between class size (X) and test scores (Y) will bias our point estimate of \(\beta_1\). Why?
A1. Linearity
A2. Sample Variation
A3. Exogeneity: The \(X\) variable is exogenous
A4. Homoskedasticity
A5. Non-autocorrelation
A6. Normality
A. Because it violates the exogeneity assumption:
\[ \mathop{\mathbb{E}}\left( u|\text{Class Size} \right) \neq 0 \]
The correlation between class size and school funding (which sits in \(u_i\)) is not zero.
Graphically…
Valid exogeneity, i.e., \(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)
Note: This is simulated data
Invalid exogeneity, i.e., \(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)
Note: This is simulated data
What the actual data look like:
What the actual data look like, as a scatter plot:
This violation has a name: omitted variable bias.
Bias that occurs in statistical models when a relevant variable is not included in the model.
Consequence: Leads to the incorrect estimation of the relationships between variables, which may affect the reliability of the model’s predictions and inferences.
Solution: “Control” for the omitted variable(s).
School funding (U) confounds our estimate of the effect of smaller class sizes (X) on test scores (Y). Including data on school funding (U) in a multiple linear regression allows us to close this backdoor path.
With all backdoor paths closed, the point estimate of \(\beta_1\) is no longer biased and recovers the population parameter of interest.
In a little more detail, we can derive the bias mathematically.
Imagine we have a population model of the form:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \]
where \(Z_i\) is a relevant variable that is omitted from the model.
and suppose we estimate the following model:
\[ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + v_i \]
where \(v_i\) is the new error term that absorbs the effect of \(Z_i\)
To derive the bias of \(\hat{\beta}_1\), we need to understand the relationship between \(Z_i\) and \(X_i\). Assume that:
\[ Z_i = \gamma_0 + \gamma_1 X_i + \varepsilon_i \]
where \(\varepsilon_i\) is the part of \(Z_i\) that is uncorrelated with \(X_i\)
If we substitute \(Z_i\) into the population model, we get:
\[ \begin{align*} Y_i &= \beta_0 + \beta_1 X_i + \beta_2 \left( \gamma_0 + \gamma_1 X_i + \varepsilon_i \right) + u_i \\ &= \beta_0 + \beta_2 \gamma_0 + \left( \beta_1 + \beta_2 \gamma_1 \right) X_i + \beta_2 \varepsilon_i + u_i \end{align*} \]
We can rewrite this expression as:
\[ Y_i = \widehat{\beta}_0 + \widehat{\beta}_1 X_i + v_i \]
where \(\widehat{\beta}_0 = \beta_0 + \beta_2 \gamma_0\), \(\widehat{\beta}_1 = \beta_1 + \beta_2 \gamma_1\), and \(v_i = \beta_2 \varepsilon_i + u_i\).
Thus, we can see how \(Z_i\) will bias our estimate of \(\beta_1\)
Recall that we define the bias of an estimator as:
\[ \mathop{\text{Bias}}_\theta \left( W \right) = \mathop{\boldsymbol{E}}\left[ W \right] - \theta \]
The bias of the estimator \(\hat{\beta}_1\) is given by:
\[ \begin{align*} \mathop{\text{Bias}}_{\beta_1} \left( \hat{\beta}_1 \right) &= \mathop{\boldsymbol{E}}\left[ \hat{\beta}_1 \right] - \beta_1 \\ &= \mathop{\boldsymbol{E}}\left[ \beta_1 + \beta_2 \gamma_1 \right] - \beta_1 \\ &= \beta_2 \gamma_1 \end{align*} \]
Finally, we can write the bias of \(\hat{\beta}_1\) in terms of the covariance between \(X_i\) and \(Z_i\), since the auxiliary regression implies:
\[ \gamma_1 = \frac{\text{Cov}\left( X_i, Z_i \right)}{\text{Var}\left( X_i \right)} \]
Therefore, we can write the bias of \(\hat{\beta}_1\) as:
\[ \mathop{\text{Bias}}_{\beta_1} \left( \hat{\beta}_1 \right) = \beta_2 \frac{\text{Cov}\left( X_i, Z_i \right)}{\text{Var}\left( X_i \right)} \]
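A quick simulation can make this formula concrete. The sketch below uses made-up numbers (\(\beta_1 = 2\), \(\beta_2 = 3\), \(\gamma_1 = 0.5\)) chosen only for illustration: the short regression of \(Y\) on \(X\) alone recovers roughly \(\beta_1 + \beta_2 \gamma_1\), while including \(Z\) removes the bias.

```r
# Sketch: simulating omitted variable bias (illustrative numbers only)
set.seed(320)
n <- 1e5
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)             # Z_i related to X_i (gamma_1 = 0.5)
y <- 1 + 2 * x + 3 * z + rnorm(n)   # true model: beta_1 = 2, beta_2 = 3

coef(lm(y ~ x))["x"]       # short regression: roughly 2 + 3 * 0.5 = 3.5
coef(lm(y ~ x + z))["x"]   # long regression: roughly 2 (backdoor closed)

3 * cov(x, z) / var(x)     # bias formula beta_2 * Cov(X, Z) / Var(X): roughly 1.5
```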
Sometimes we’re stuck with omitted variable bias.
\[ \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( X_i,\, Z_i \right)}{\mathop{\text{Var}} \left( X_i \right)} \]
When this happens, we can often at least know the direction of the bias.
Begin with
\[ \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( X_i,\, Z_i \right)}{\mathop{\text{Var}} \left( X_i \right)} \]
We know \(\color{#8FBCBB}{\mathop{\text{Var}} \left( X_i \right) > 0}\). Suppose \(\color{#81A1C1}{\beta_2 > 0}\) and \(\color{#EBCB8B}{\mathop{\text{Cov}} \left( X_i,\,Z_i \right) > 0}\). Then
\[ \begin{align} \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \color{#81A1C1}{(+)} \dfrac{\color{#EBCB8B}{(+)}}{\color{#8FBCBB}{(+)}} \implies \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] > \beta_1 \end{align} \] ∴ In this case, OLS is biased upward (estimates are too large).
\[ \begin{matrix} \enspace & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)> 0} & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)< 0} \\ \color{#81A1C1}{\beta_2 > 0} & \text{Upward} & \\ \color{#81A1C1}{\beta_2 < 0} & & \end{matrix} \]
Begin with
\[ \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( X_i,\, Z_i \right)}{\mathop{\text{Var}} \left( X_i \right)} \]
We know \(\color{#8FBCBB}{\mathop{\text{Var}} \left( X_i \right) > 0}\). Suppose \(\color{#81A1C1}{\beta_2 < 0}\) and \(\color{#EBCB8B}{\mathop{\text{Cov}} \left( X_i,\,Z_i \right) > 0}\). Then
\[ \begin{align} \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \color{#81A1C1}{(-)} \dfrac{\color{#EBCB8B}{(+)}}{\color{#8FBCBB}{(+)}} \implies \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] < \beta_1 \end{align} \] ∴ In this case, OLS is biased downward (estimates are too small).
\[ \begin{matrix} \enspace & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)> 0} & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)< 0} \\ \color{#81A1C1}{\beta_2 > 0} & \text{Upward} & \\ \color{#81A1C1}{\beta_2 < 0} & \text{Downward} & \end{matrix} \]
Begin with
\[ \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( X_i,\, Z_i \right)}{\mathop{\text{Var}} \left( X_i \right)} \]
We know \(\color{#8FBCBB}{\mathop{\text{Var}} \left( X_i \right) > 0}\). Suppose \(\color{#81A1C1}{\beta_2 > 0}\) and \(\color{#EBCB8B}{\mathop{\text{Cov}} \left( X_i,\,Z_i \right) < 0}\). Then
\[ \begin{align} \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \color{#81A1C1}{(+)} \dfrac{\color{#EBCB8B}{(-)}}{\color{#8FBCBB}{(+)}} \implies \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] < \beta_1 \end{align} \] ∴ In this case, OLS is biased downward (estimates are too small).
\[ \begin{matrix} \enspace & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)> 0} & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)< 0} \\ \color{#81A1C1}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#81A1C1}{\beta_2 < 0} & \text{Downward} & \end{matrix} \]
Begin with
\[ \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( X_i,\, Z_i \right)}{\mathop{\text{Var}} \left( X_i \right)} \]
We know \(\color{#8FBCBB}{\mathop{\text{Var}} \left( X_i \right) > 0}\). Suppose \(\color{#81A1C1}{\beta_2 < 0}\) and \(\color{#EBCB8B}{\mathop{\text{Cov}} \left( X_i,\,Z_i \right) < 0}\). Then
\[ \begin{align} \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] = \beta_1 + \color{#81A1C1}{(-)} \dfrac{\color{#EBCB8B}{(-)}}{\color{#8FBCBB}{(+)}} \implies \mathop{\boldsymbol{E}} \left[ \hat{\beta}_1 \right] > \beta_1 \end{align} \] ∴ In this case, OLS is biased upward (estimates are too large).
\[ \begin{matrix} \enspace & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)> 0} & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)< 0} \\ \color{#81A1C1}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#81A1C1}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix} \]
Thus, in cases where we have a sense of
the sign of \(\mathop{\text{Cov}} \left( X_i,\,Z_i \right)\)
the sign of \(\beta_2\)
we know in which direction bias pushes our estimates.
Direction of Bias
\[ \begin{matrix} \enspace & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)> 0} & \color{#EBCB8B}{\text{Cov}(X_i,\,Z_i)< 0} \\ \color{#81A1C1}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \color{#81A1C1}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix} \]
Simple linear regression features one dependent variable and one independent variable:
\[ \color{#434C5E}{Y_i} = \beta_0 + \beta_1 \color{#81A1C1}{X_i} + u_i \]
Multiple linear regression features one dependent variable and multiple independent variables:
\[ \color{#434C5E}{Y_i} = \beta_0 + \beta_1 \color{#81A1C1}{X_{1i}} + \beta_2 \color{#81A1C1}{X_{2i}} + \cdots + \beta_{k} \color{#81A1C1}{X_{ki}} + u_i \]
This serves more than one purpose: multiple independent variables improve predictions, help avoid OVB, and better explain variation in \(Y\).
Controlling for school funding
\[ \text{Scores}_i = \beta_0 + \beta_1 \text{Class Size}_i + \beta_2 \text{Expenditure}_i + u_i \]
| Independent variable | 1 | 2 |
|---|---|---|
| Intercept | 781.196 | 674.93 |
| | (16.46) | (16.46) |
| Class size | -3.768 | -0.96 |
| | (0.61) | (0.64) |
| Expenditure | | 0.013 |
| | | (0.002) |
How does it work? We can think of it almost like demeaning.
What happens to variation in \(Y\) after we account for school funding?
Residuals are now defined as:
\[ \hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_{k} X_{ki} \]
As with SLR, OLS minimizes the sum of squared residuals (RSS).
\[ \begin{align*} {\color{#D08770} RSS} &= \sum_{i = 1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_{k} X_{ki})^2 \\ &= \color{#D08770}{\sum_{i=1}^n \hat{u}_i^2} \end{align*} \]
which is a familiar expression.
To obtain point estimates:
\[ \min_{\hat{\beta}_0,\, \hat{\beta}_1,\, \dots \, \hat{\beta}_k} \quad \color{#D08770}{\sum_{i=1}^n \hat{u}_i^2} \]
The algebra is cumbersome. We let R do the heavy lifting.
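To see that OLS really is the solution to this minimization problem, here is a minimal sketch that minimizes the RSS numerically with optim() and compares the result with lm(); the data are simulated and the names are illustrative.

```r
# Sketch: minimize RSS directly and compare with lm() (simulated data)
set.seed(320)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

# RSS as a function of candidate coefficients b = (b0, b1, b2)
rss <- function(b) sum((y - b[1] - b[2] * x1 - b[3] * x2)^2)

optim(par = c(0, 0, 0), fn = rss)$par   # numerical minimizer of RSS
coef(lm(y ~ x1 + x2))                   # OLS estimates (should nearly match)
```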
Model
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{k} X_{ki} + u_i \]
Interpretation: \(\beta_j\) is the change in \(Y_i\) associated with a one-unit increase in \(X_{ji}\), holding the other independent variables constant.
The OLS first-order conditions yield the same properties as before.
1. Residuals sum to zero: \(\sum_{i=1}^n \hat{u_i} = 0\).
2. The sample covariance between each independent variable and the residuals \(\hat{u}_i\) is zero.
3. The point \((\bar{X_1}, \bar{X_2}, \dots, \bar{X_k}, \bar{Y})\) is on the fitted regression “line.”
Fitted values are defined similarly:
\[ \hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_{k} X_{ki} \]
The formula for \(R^2\) is the same as before:
\[ R^2 =\frac{\sum(\hat{Y_i}-\bar{Y})^2}{\sum(Y_i-\bar{Y})^2} \]
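As a sanity check, the sketch below (simulated data, illustrative names) computes \(R^2\) directly from this formula and compares it with the value reported by summary().

```r
# Sketch: R^2 from the formula vs. summary() (simulated data)
set.seed(320)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

fit   <- lm(y ~ x1 + x2)
y_hat <- fitted(fit)

sum((y_hat - mean(y))^2) / sum((y - mean(y))^2)  # ESS / TSS
summary(fit)$r.squared                           # same number
```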
We can describe the variation explained in \(Y\) with Venn diagrams.
Suppose we have two models:
Model 1: \(Y_i = \beta_0 + \beta_1 X_{1i} + u_i\).
Model 2: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + v_i\)
T/F? Model 2 will yield a lower \(R^2\) than Model 1.
(Hint: Think of \(R^2\) as \(R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}\).)
Problem: As we add variables to our model, \(R^2\) mechanically increases.
Let me show you this problem with a simulation
Simulate a dataset of 10,000 observations on \(y\) and 1,000 random \(x_k\) variables, where
\[ y \perp x_k \quad \forall x_k \; \text{s.t.} \; k = 1, 2, \dots, 1000 \]
We have 1,000 independent variables, each statistically independent of the dependent variable: each \(x_k\) has no relationship to \(y\) whatsoever.
Problem: As we add variables to our model, \(\color{#314f4f}{R^2}\) mechanically increases.
Pseudo-code:
Generate 10,000 obs. on \(y\)
Generate 10,000 obs. on variables \(x_1\) through \(x_{1000}\)
Regressions: iteratively regress \(y\) on \(x_1\); then on \(x_1\) and \(x_2\); and so on, adding one regressor per step and recording \(R^2\) and adjusted \(R^2\) each time.
R code for the simulation:
# Load required packages (magrittr for %>% and %$%, parallel, dplyr)
library(magrittr)
library(parallel)
library(dplyr)
# Generate data --------------------------------------------------------------
# Set random seed
set.seed(1234)
# Generate the outcome and 1,000 unrelated regressors
y <- rnorm(1e4)
x <- matrix(data = rnorm(1e7), nrow = 1e4)
# Prepend a constant column so that column 1 plays the role of the intercept
x <- cbind(matrix(data = 1, nrow = 1e4, ncol = 1), x)
# Simulation -----------------------------------------------------------------
# Loop across regressions, adding one regressor every iteration
r_df <- mclapply(X = 1:(1e3 - 1), mc.cores = detectCores() - 1, FUN = function(i) {
  # Regress y on the constant column and the first i regressors
  tmp_reg <- lm(y ~ x[, 1:(i + 1)]) %>% summary()
  # Export R2 and adjusted R2
  data.frame(
    k = i + 1,
    r2 = tmp_reg %$% r.squared,
    r2_adj = tmp_reg %$% adj.r.squared
  )
}) %>% bind_rows()
Problem: As we add variables to our model, \(R^2\) mechanically increases.
One solution: Penalize for the number of variables, e.g., adjusted \(R^2\):
\[ \bar{R}^2 = 1 - \dfrac{\sum_i \left( Y_i - \hat{Y}_i \right)^2/(n-k-1)}{\sum_i \left( Y_i - \bar{Y} \right)^2/(n-1)} \]
Note: Adjusted \(R^2\) need not be between 0 and 1.
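A short sketch of the formula at work, using simulated data with one irrelevant regressor (names and numbers are illustrative).

```r
# Sketch: adjusted R^2 from the formula vs. summary() (simulated data)
set.seed(320)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)      # x2 is irrelevant

fit <- lm(y ~ x1 + x2)
k   <- 2                          # number of independent variables

1 - (sum(resid(fit)^2) / (n - k - 1)) /
    (sum((y - mean(y))^2) / (n - 1))   # adjusted R^2 by hand
summary(fit)$adj.r.squared             # same number
```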
So how do you find \(R^2\) and \(\bar{R}^2\) in R? Use broom::glance().
# Run model
model <- lm(score4 ~ stratio + expreg, data = schools_dt)
# Print coefficient estimates as a tibble
broom::tidy(model)
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 675. 16.5 41.0 2.65e-103
2 stratio -0.960 0.644 -1.49 1.37e- 1
3 expreg 0.0126 0.00160 7.83 2.19e- 13
# Print model-fit statistics (R-squared and adjusted R-squared) as a tibble
broom::glance(model)
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.341 0.335 17.2 55.5 3.92e-20 2 -924. 1856. 1870.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
There are tradeoffs to remember as we add/remove variables:
Fewer variables: less risk of multicollinearity and of including irrelevant variables, but a greater risk of omitted variable bias.
More variables: less risk of omitted variable bias and more explained variation in \(Y\), but potentially higher variance from multicollinearity and irrelevant variables.
Multiple regression model:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{k} X_{ki} + u_i \]
It can be shown that the variance of the estimator \(\hat{\beta}_j\) on independent variable \(X_j\) is:
\[ \mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_j \right)\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2}, \]
where \(R^2_j\) is the \(R^2\) from a regression of \(X_j\) on the other independent variables and the intercept
\[ \mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{{\color{#81A1C1}\sigma^2}}{\left( 1 - {\color{#81A1C1}R_j^2} \right){\color{#BF616A}\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2}}, \]
Moving parts:
1. Error variance: As \({\color{#81A1C1}\sigma^2}\) increases, \(\mathrm{Var}(\hat{\beta}_j)\) increases
2. Total variation in \(X_j\): As \({\color{#BF616A}\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2}\) increases, \(\mathrm{Var}(\hat{\beta}_j)\) decreases
3. Relationship across \(X_i\): As \({\color{#81A1C1}R_j^2}\) increases, \(\mathrm{Var}(\hat{\beta}_j)\) increases
Item 3 is better known as multicollinearity:
Case in which two or more independent variables in a regression model are highly correlated.
One independent variable can predict most of the variation in another independent variable.
Multicollinearity leads to imprecise estimates: it becomes difficult to distinguish the individual effects of the independent variables.
The classical OLS assumptions change slightly for multiple regression:
A1. Linearity: The population relationship is linear in parameters with an additive error term.
A2. Sample Variation: No \(X\) variable is a perfect linear combination of the others
A3. Exogeneity: The \(X\) variables are exogenous
A4. Homoskedasticity: The error term has the same variance for each value of the independent variable
A5. Non-autocorrelation: The values of error terms have independent distributions
Perfect collinearity: the case in which two or more independent variables in a regression model are perfectly correlated.
Ex. 2016 Election
OLS cannot simultaneously estimate parameters for white and nonwhite.
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -40.7 1.95 -20.9 6.82e- 91
2 white 0.910 0.0238 38.2 1.51e-262
3 nonwhite NA NA NA NA
R drops perfectly collinear variables for you.
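A minimal sketch that reproduces this behavior with made-up data (not the election data above): two regressors that sum to 100 by construction, so one of them is dropped.

```r
# Sketch: a perfectly collinear pair of regressors (illustrative data)
set.seed(320)
n        <- 100
white    <- runif(n, 50, 100)      # percent white
nonwhite <- 100 - white            # perfectly collinear: white + nonwhite = 100
vote     <- -40 + 0.9 * white + rnorm(n, sd = 5)

coef(lm(vote ~ white + nonwhite))  # nonwhite is dropped (coefficient is NA)
```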
Suppose that we want to understand the relationship between crime rates and poverty rates in US cities. We could estimate the model
\[ \text{Crime}_i = \beta_0 + \beta_1 \text{Poverty}_i + \beta_2 \text{Income}_i + u_i \]
Before obtaining standard errors, we need:
\[ \mathop{\text{Var}} \left( \hat{\beta}_1 \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Poverty}_{i} - \overline{\text{Poverty}} \right)^2} \]
\(R^2_1\) is the \(R^2\) from a regression of poverty on median income:
\[ \text{Poverty}_i = \gamma_0 + \gamma_1 \text{Income}_i + v_i \]
Scenario 1: If \(\text{Income}_i\) explains most of the variation in \(\text{Poverty}_i\), then \(R^2_1 \rightarrow 1\)
Scenario 2: If \(\text{Income}_i\) explains no variation in \(\text{Poverty}_i\), then \(R^2_1 = 0\)
Q. In which scenario is the variance of the poverty coefficient smaller?
\[ \mathop{\text{Var}} \left( \hat{\beta}_1 \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Poverty}_{i} - \overline{\text{Poverty}} \right)^2} \]
A. Scenario 2.
As the relationships among the independent variables strengthen, \(R^2_j\) increases.
For high \(R^2_j\), \(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\) is large:
\[ \mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_j \right)\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2} \]
Suppose that the true relationship between birth weight and in utero exposure to toxic air pollution is
\[ (\text{Birth Weight})_i = \beta_0 + \beta_1 \text{Pollution}_i + u_i \]
Suppose that an “analyst” estimates
\[ (\text{Birth Weight})_i = \tilde{\beta_0} + \tilde{\beta_1} \text{Pollution}_i + \tilde{\beta_2}\text{NBA}_i + u_i \]
One can show that \(\mathop{\mathbb{E}} \left( \hat{\tilde{\beta_1}} \right) = \beta_1\) (i.e., \(\hat{\tilde{\beta_1}}\) is unbiased).
However, the variances of \(\hat{\tilde{\beta_1}}\) and \(\hat{\beta_1}\) differ.
The variance of \(\hat{\beta}_1\) from estimating the “true model” is
\[ \mathop{\text{Var}} \left( \hat{\beta_1} \right) = \dfrac{\sigma^2}{\sum_{i=1}^n \left( \text{Pollution}_{i} - \overline{\text{Pollution}} \right)^2} \]
The variance of \(\hat{\tilde\beta}_1\) from estimating the model with the irrelevant variable is
\[ \mathop{\text{Var}} \left( \hat{\tilde{\beta_1}} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Pollution}_{i} - \overline{\text{Pollution}} \right)^2} \]
Notice that \(\mathop{\text{Var}} \left( \hat{\beta_1} \right) \leq \mathop{\text{Var}} \left( \hat{\tilde{\beta_1}} \right)\), since
\[ \sum_{i=1}^n \left( \text{Poll.}_{i} - \overline{\text{Poll.}} \right)^2 \geq \left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Poll.}_{i} - \overline{\text{Poll.}} \right)^2 \]
A tradeoff exists when including more control variables. Make sure you have a good reason for your controls, because including irrelevant control variables increases variances.
We cannot observe \(\sigma^2\), so we must estimate it using the residuals from an estimated regression:
\[ s_u^2 = \dfrac{\sum_{i=1}^n \hat{u}_i^2}{n - k - 1} \]
The formula for the standard error is the square root of \(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\):
\[ \mathop{\text{SE}}(\hat{\beta_j}) = \sqrt{ \frac{s^2_u}{( 1 - R^2_j ) \sum_{i=1}^n ( X_{ji} - \bar{X}_j )^2} } \]
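The sketch below (simulated data, illustrative names) computes \(\mathop{\text{SE}}(\hat{\beta}_1)\) from this formula, using the auxiliary-regression \(R^2_1\), and compares it with the standard error reported by lm().

```r
# Sketch: SE of beta_1 from the formula vs. lm() (simulated data)
set.seed(320)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)              # x2 correlated with x1
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)
k    <- 2
s2_u <- sum(resid(fit)^2) / (n - k - 1)          # estimate of sigma^2
r2_1 <- summary(lm(x1 ~ x2))$r.squared           # R^2_1 from the auxiliary regression

sqrt(s2_u / ((1 - r2_1) * sum((x1 - mean(x1))^2)))  # SE(beta_1) by hand
summary(fit)$coefficients["x1", "Std. Error"]       # same number
```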
Suppose I run the following model:
\[ \text{Scores}_i = \beta_0 + \beta_1 \text{Class Size}_i + \beta_2 \text{Lunch}_i + u_i \]
with the following results:
| Explanatory variable | 1 | 2 |
|---|---|---|
| Intercept | 782.72 | 743.03 |
| | (8.53) | (9.3) |
| Class size | -3.14 | -2.8 |
| | (0.49) | (0.44) |
| Lunch | -0.85 | -0.48 |
| | (0.08) | (0.09) |
| Income | | 1.51 |
| | | (0.2) |
t tests allow us to test simple hypotheses involving a single parameter.
F tests allow us to test hypotheses that involve multiple parameters.
Ex. Is money “fungible”?
Economists often say that “money is fungible.”
We might want to test whether money received as income actually has the same effect on consumption as money received from tax credits.
\[ \text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_2 \text{Credit}_i + u_i \]
Ex. Is money “fungible”?
We can write our null hypothesis as
\[ H_0:\: \beta_1 = \beta_2 \iff H_0 :\: \beta_1 - \beta_2 = 0 \]
Imposing the null hypothesis gives us a restricted model
\[ \text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_1 \text{Credit}_i + u_i \]
\[ \text{Consumption}_i = \beta_0 + \beta_1 \left( \text{Income}_{i} + \text{Credit}_i \right) + u_i \]
Ex. Is money “fungible”?
To test the null hypothesis \(H_0 :\: \beta_1 = \beta_2\) against \(H_a :\: \beta_1 \neq \beta_2\),
we use the \(F\) statistic:
\[ \begin{align} F_{q,\,n-k-1} = \dfrac{\left(\text{RSS}_r - \text{RSS}_u\right)/q}{\text{RSS}_u/(n-k-1)} \end{align} \]
which (as its name suggests) follows the \(F\) distribution with \(q\) numerator degrees of freedom and \(n-k-1\) denominator degrees of freedom.
Here, \(q\) is the number of restrictions we impose via \(H_0\).
Ex. Is money “fungible”?
The term \(\text{RSS}_r\) is the sum of squared residuals (RSS) from our restricted model
\[ \text{Consumption}_i = \beta_0 + \beta_1 \left( \text{Income}_{i} + \text{Credit}_i \right) + u_i \]
and \(\text{RSS}_u\) is the sum of squared residuals (RSS) from our unrestricted model
\[ \text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_2 \text{Credit}_i + u_i \]
Finally, we compare our \(F\)-statistic to a critical value of \(F\) to test the null hypothesis.
If \(F\) > \(F_\text{crit}\), then reject the null hypothesis at the \(\alpha \times 100\) percent level.
Aside: Why are \(F\)-statistics always positive? Because imposing restrictions can never improve the fit: \(\text{RSS}_r \geq \text{RSS}_u\), so the numerator is non-negative (and the denominator is positive).
RSS is usually a large cumbersome number.
Alternative: Calculate the \(F\)-statistic using \(R^2\).
\[ \begin{align} F = \dfrac{\left(R^2_u - R^2_r\right)/q}{ (1 - R^2_u)/(n-k-1)} \end{align} \]
Where does this come from?
\(\text{TSS} = \text{RSS} + \text{ESS}\)
\(R^2 = \text{ESS}/\text{TSS}\)
\(\text{RSS}_r = \text{TSS}(1-R^2_r)\)
\(\text{RSS}_u = \text{TSS}(1-R^2_u)\)
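Putting the pieces together, here is a sketch of the \(F\) test computed by hand on simulated "consumption" data (all names and numbers are illustrative); the built-in anova() comparison of the restricted and unrestricted models gives the same test.

```r
# Sketch: the F test by hand (simulated data, illustrative names)
set.seed(320)
n       <- 1000
income  <- rnorm(n, 50, 10)
credit  <- rnorm(n, 5, 2)
consump <- 10 + 0.8 * income + 0.5 * credit + rnorm(n, sd = 5)

unrestricted <- lm(consump ~ income + credit)
restricted   <- lm(consump ~ I(income + credit))  # imposes beta_1 = beta_2

rss_u <- sum(resid(unrestricted)^2)
rss_r <- sum(resid(restricted)^2)
q <- 1            # one restriction
k <- 2            # regressors in the unrestricted model

f_stat <- ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
f_stat
f_stat > qf(0.95, df1 = q, df2 = n - k - 1)   # reject H0 at the 5 percent level?

# Same test via anova()
anova(restricted, unrestricted)
```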
EC320, Set 07 | Multiple linear regression