Deriving OLS

EC 320 Handout

Author

Andrew Dickinson

Getting started

Our simple linear regression model is

\[Y_i = \beta_0 + \beta_1 X_i + u_i.\]

We can decompose \(Y_i\) into estimated components:

\[Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i.\]

Thus, for any regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) we obtain estimates of \(u_i\) called residuals (\(\hat{u}_i\)). Residuals are the “errors” of the estimated regression line:

\[\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i.\]

By squaring \(\hat{u}_i\) and summing across \(i\), we obtain the sum of squared residuals, also known as the residual sum of squares (RSS):

\[\sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1X_i)^2 = \text{RSS}.\]
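To make these objects concrete, here is a minimal numerical sketch in Python (the data and the candidate coefficients are made up purely for illustration): any choice of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) produces a vector of residuals and, in turn, an RSS.

```python
# Minimal sketch: residuals and RSS for an arbitrary candidate line (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b0_hat, b1_hat = 0.5, 1.8            # arbitrary candidate coefficients
residuals = y - b0_hat - b1_hat * x  # u_i hat = Y_i - b0_hat - b1_hat * X_i
rss = np.sum(residuals**2)           # residual sum of squares
print(rss)
```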

Objective function

Our objective is to pick the \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the residual sum of squares:

\[\underset{\hat{\beta}_0,\, \hat{\beta}_1}{\text{min}} \quad \text{RSS} = \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1X_i)^2.\]
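Before doing the minimization by hand, a quick numerical sketch (same made-up data, using scipy's general-purpose optimizer) previews the answer: minimizing RSS numerically lands on the same coefficients the closed-form derivation below will produce.

```python
# Sketch: minimize RSS numerically over (b0, b1) with a generic optimizer.
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def rss(b):
    b0, b1 = b
    return np.sum((y - b0 - b1 * x)**2)

result = minimize(rss, x0=[0.0, 0.0])  # start from an arbitrary guess
print(result.x)                        # approximately (b0_hat, b1_hat)
```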

We will start by expanding the objective function and collecting like terms:

\[\begin{align} \text{RSS} &= \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1X_i)^2 \\ &= \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1X_i)(Y_i - \hat{\beta}_0 - \hat{\beta}_1X_i) \\ & = \sum_{i=1}^n (Y_i^2 - Y_i\hat{\beta}_0 - Y_i\hat{\beta}_1X_i - \hat{\beta}_0Y_i + \hat{\beta}_0^2 + \hat{\beta}_0\hat{\beta}_1X_i - \hat{\beta}_1X_iY_i + \hat{\beta}_1X_i\hat{\beta}_0 + \hat{\beta}_1^2X_i^2) \\ &= \sum_{i=1}^n (Y_i^2 - 2Y_i\hat{\beta}_0 - 2Y_i\hat{\beta}_1X_i + \hat{\beta}_0^2 + 2\hat{\beta}_0\hat{\beta}_1X_i + \hat{\beta}_1^2X_i^2). \end{align}\]
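If you want to double-check the algebra, a short symbolic sketch (using sympy; the symbol names are just ours) confirms that the expansion of one summand is correct:

```python
# Symbolic check that the expanded summand matches the squared residual.
import sympy as sp

Y, X, b0, b1 = sp.symbols('Y X b0_hat b1_hat')

lhs = (Y - b0 - b1 * X)**2
rhs = Y**2 - 2*Y*b0 - 2*Y*b1*X + b0**2 + 2*b0*b1*X + b1**2 * X**2

print(sp.expand(lhs - rhs))  # prints 0, so the two expressions agree
```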

First-order conditions

Now we will find the \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the residual sum of squares. First, we take partial derivatives with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) and set them equal to zero. This gives our first-order conditions:

\[\begin{align} \frac{\partial \text{RSS}}{\partial \hat{\beta}_0} &= \sum_{i=1}^n (-2Y_i + 2\hat{\beta}_0 + 2\hat{\beta}_1X_i) = 0 \label{foc1} \\ \frac{\partial \text{RSS}}{\partial \hat{\beta}_1} &= \sum_{i=1}^n (-2Y_i X_i + 2\hat{\beta}_0X_i + 2\hat{\beta}_1X_i^2) = 0 \label{foc2} \end{align}\]
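Because differentiation passes through the sum, it is enough to differentiate a single summand. A small symbolic sketch (sympy again) reproduces the integrands of the two first-order conditions:

```python
# Symbolic sketch: partial derivatives of one RSS summand.
import sympy as sp

Y, X, b0, b1 = sp.symbols('Y X b0_hat b1_hat')
summand = (Y - b0 - b1 * X)**2

print(sp.expand(sp.diff(summand, b0)))  # -2*Y + 2*b0_hat + 2*X*b1_hat (term order may differ)
print(sp.expand(sp.diff(summand, b1)))  # -2*X*Y + 2*X*b0_hat + 2*X**2*b1_hat (term order may differ)
```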

The first-order conditions give us everything we need to find expressions for \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

Solve for the intercept formula

We will start by solving for \(\hat{\beta}_0\). From the first of the two first-order conditions, we can isolate the term involving \(\hat{\beta}_0\) on the left-hand side by subtracting the other two terms from both sides:

\[\begin{align} 2 \sum_{i=1}^n \hat{\beta}_0 = 2 \sum_{i=1}^n Y_i - 2 \hat{\beta}_1 \sum_{i=1}^n X_i. \end{align}\]

Divide by 2 to obtain

\[\begin{align} \sum_{i=1}^n \hat{\beta}_0 = \sum_{i=1}^n Y_i - \hat{\beta}_1 \sum_{i=1}^n X_i. \label{midb1} \end{align}\]

Notice that \(\sum_{i=1}^n Y_i = n \frac{1}{n} \sum_{i=1}^n Y_i = n \bar{Y}\) and, likewise, \(\sum_{i=1}^n X_i = n \bar{X}\). Because \(\hat{\beta}_0\) does not vary with \(i\), we also have \(\sum_{i=1}^n \hat{\beta}_0 = n \hat{\beta}_0\). The previous equation then becomes

\[\begin{align} n \hat{\beta}_0 = n \bar{Y} - \hat{\beta}_1 n \bar{X}. \end{align}\]

Dividing by \(n\), we obtain a simple formula for the intercept:

\[\begin{align} \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}. \end{align}\]
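For example, with made-up sample means \(\bar{X} = 2\) and \(\bar{Y} = 5\) and a slope estimate of \(\hat{\beta}_1 = 1.5\), the formula gives

\[\begin{align} \hat{\beta}_0 = 5 - 1.5 \times 2 = 2. \end{align}\]

Rearranged, the formula says \(\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}\): the fitted regression line always passes through the point of sample means \((\bar{X}, \bar{Y})\).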

Solve for the slope formula

Next, we will solve for \(\hat{\beta}_1\). Dividing the second first-order condition by 2, we have

\[\begin{align} - \sum_{i=1}^n Y_i X_i + \hat{\beta}_0 \sum_{i=1}^n X_i + \hat{\beta}_1 \sum_{i=1}^n X_i^2 = 0. \end{align}\]

Now plug in the expression for \(\hat{\beta}_0\):

\[\begin{align} - \sum_{i=1}^n Y_i X_i + (\bar{Y} - \hat{\beta}_1 \bar{X}) \sum_{i=1}^n X_i + \hat{\beta}_1 \sum_{i=1}^n X_i^2 &= 0 \\ - \sum_{i=1}^n Y_i X_i + \bar{Y} \sum_{i=1}^n X_i - \hat{\beta}_1 \bar{X} \sum_{i=1}^n X_i + \hat{\beta}_1 \sum_{i=1}^n X_i^2 &= 0. \end{align}\]

Isolate the \(\hat{\beta}_1\) terms on the left-hand side:

\[\begin{align} \hat{\beta}_1 (\sum_{i=1}^n X_i^2 - \bar{X} \sum_{i=1}^n X_i) &= \sum_{i=1}^n Y_i X_i - \bar{Y} \sum_{i=1}^n X_i. \end{align}\]

Dividing both sides by \((\sum_{i=1}^n X_i^2 - \bar{X} \sum_{i=1}^n X_i)\), we obtain a formula for the slope coefficient:

\[\begin{align} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n Y_i X_i - \bar{Y} \sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2 - \bar{X} \sum_{i=1}^n X_i}. \end{align}\]

Rearranging the slope formula

We now have an expression for \(\hat{\beta}_1\) in terms of data on \(X\) and \(Y\), but we can rearrange terms to get a more familiar expression. To do this, we are going to subtract zero from both the numerator and the denominator of \(\hat{\beta}_1\). Notice that

\[\begin{align} \sum_{i=1}^n (X_i - \bar{X}) &= \sum_{i=1}^n X_i - \sum_{i=1}^n \bar{X} \\ &= \sum_{i=1}^n X_i - n\bar{X} \\ &= \sum_{i=1}^n X_i - n \frac{1}{n} \sum_{i=1}^n X_i \\ &= \sum_{i=1}^n X_i - \sum_{i=1}^n X_i \\ &= 0. \end{align}\]
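A two-line numerical check of this fact (again with made-up numbers):

```python
# Deviations from the sample mean sum to zero (up to floating-point rounding).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(np.sum(x - x.mean()))  # ~0
```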

Using this trick, we will now subtract zero from the numerator and the denominator: specifically, \(\bar{X} \sum_{i=1}^n (Y_i - \bar{Y})\) from the numerator and \(\bar{X} \sum_{i=1}^n (X_i - \bar{X})\) from the denominator. Both terms equal zero, because deviations from the sample mean sum to zero (the argument above works the same way for \(Y\)). Our choice of zero is strategic. If you have trouble following this step, try working backwards from the end.

\[\begin{align} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n Y_i X_i - \bar{Y} \sum_{i=1}^n X_i - \bar{X} \sum_{i=1}^n (Y_i - \bar{Y})} {\sum_{i=1}^n X_i^2 - \bar{X} \sum_{i=1}^n X_i - \bar{X} \sum_{i=1}^n (X_i - \bar{X}) }. \end{align}\]

Distribute terms and pull constants into the sums to obtain

\[\begin{align} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n Y_i X_i - \sum_{i=1}^n \bar{Y} X_i - \sum_{i=1}^n \bar{X} Y_i + \sum_{i=1}^n \bar{Y} \bar{X}} {\sum_{i=1}^n X_i^2 - \sum_{i=1}^n \bar{X} X_i - \sum_{i=1}^n \bar{X} X_i + \sum_{i=1}^n \bar{X}^2 }. \end{align}\]

By factoring we obtain a beautiful expression for \(\hat{\beta}_1\):

\[\begin{align} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})} {\sum_{i=1}^n (X_i - \bar{X}) (X_i - \bar{X})}. \end{align}\]
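As a sanity check, a small numerical sketch (same made-up data as before) confirms that the raw-sums formula and the deviations-from-the-mean formula return the same number:

```python
# The two algebraically equivalent slope formulas agree numerically.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Raw-sums form from the previous section
b1_raw = (np.sum(y * x) - y.mean() * np.sum(x)) / (np.sum(x**2) - x.mean() * np.sum(x))

# Deviations-from-the-mean form derived here
b1_dev = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean())**2)

print(b1_raw, b1_dev)  # identical up to floating-point rounding
```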

OLS formulas

We have derived OLS!

\[\begin{align} \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \end{align}\]

\[\begin{align} \hat{\beta}_1 &= \dfrac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})} {\sum_{i=1}^n (X_i - \bar{X}) (X_i - \bar{X})} \end{align}\]
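As a last check of the formulas themselves, they agree with a canned least-squares routine (here numpy's polyfit, with the same made-up data):

```python
# The closed-form OLS formulas match numpy's built-in least-squares line fit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean())**2)
b0_hat = y.mean() - b1_hat * x.mean()

slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 least-squares fit
print(b0_hat, b1_hat)
print(intercept, slope)                     # matches the line above
```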

Second-order conditions

We still need to check the second-order conditions to make sure we minimized the objective function. We would feel silly if we maximized it. To confirm that we minimized \(\text{RSS}\), we take the second derivatives and check that they are positive.

\[\begin{align} \frac{\partial^2 \text{RSS}}{\partial \hat{\beta}_0^2} &= \sum_{i=1}^n 2 > 0 \label{soc1} \\ \frac{\partial^2 \text{RSS}}{\partial \hat{\beta}_1^2} &= \sum_{i=1}^n 2X_i^2 > 0 \label{soc2} \end{align}\]

Crisis averted. Now we can say that we’ve minimized the residual sum of squares.
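If you prefer to let software do the calculus, a final symbolic sketch (sympy, one summand) confirms the second derivatives:

```python
# Second derivatives of one RSS summand are (weakly) positive.
import sympy as sp

Y, X, b0, b1 = sp.symbols('Y X b0_hat b1_hat')
summand = (Y - b0 - b1 * X)**2

print(sp.diff(summand, b0, 2))  # 2
print(sp.diff(summand, b1, 2))  # 2*X**2
```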