Chapter 1 of 7

Adding a Second Variable

Multiple Linear Regression

In simple regression, we used one variable, salary, to predict market value. That model reached R² = 70.8%. Can we do better?

[Figure: scatter plot of Market Value (M€) against Annual Salary (M€), with the OLS regression line.]
Estimated regression equation
$$\hat{y} = -0.9 + 3.90 \times \text{salary}$$

But these players play in different positions. Accounting for position may help us improve the model.

[Figure: Market Value (M€) against Annual Salary (M€), with players marked by position (Goalkeeper, Outfield) and the single OLS line.]

The single line underpredicts outfield players and overpredicts goalkeepers. It seems that position matters.


Add position as a second variable. Each group gets its own intercept — but the same slope.

[Figure: Market Value (M€) against Annual Salary (M€), with one fitted line per group (Goalkeeper, Outfield).]
Estimated regression equation
$$\hat{y} = -1.4 + 3.04 \times \text{salary} + 10.8 \times \text{outfield}$$
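A fit of this form can be sketched with ordinary least squares on a design matrix that has three columns: a constant, salary, and the 0/1 outfield indicator. The player data behind the chart is not reproduced here, so the arrays below are hypothetical values generated without noise from the estimated equation itself, purely to show the mechanics. The variable names are also assumptions.

```python
import numpy as np

# Hypothetical, noise-free data generated from the estimated equation above.
# These are NOT the real player values used in the chart.
salary = np.array([4.0, 6.0, 8.0, 10.0, 5.0, 7.0, 9.0, 12.0])  # annual salary, M€
outfield = np.array([0, 0, 0, 0, 1, 1, 1, 1])                   # 0 = goalkeeper, 1 = outfield
market_value = -1.4 + 3.04 * salary + 10.8 * outfield           # market value, M€

# Design matrix: intercept column, salary, and the binary position variable.
X = np.column_stack([np.ones_like(salary), salary, outfield])

# Ordinary least squares: minimise ||X @ coef - market_value||^2.
coef, *_ = np.linalg.lstsq(X, market_value, rcond=None)
b0, b1, b2 = coef

print(f"intercept={b0:.2f}, salary slope={b1:.2f}, outfield shift={b2:.2f}")
```

Because the illustrative data lies exactly on the two parallel lines, the fit recovers the three coefficients of the equation. With real, noisy data the estimates would only approximate the underlying relationship.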

The binary variable is a switch. It shifts the line up for outfield players by a fixed amount.
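The switch can be made concrete with a small sketch that turns position labels into a 0/1 indicator and then into the shift each player receives. The list of positions is made up for illustration, and the labels simply follow the chart legend.

```python
# Hypothetical list of player positions (illustration only).
positions = ["Goalkeeper", "Outfield", "Outfield", "Goalkeeper"]

# The binary variable: 1 switches the shift on, 0 switches it off.
outfield = [1 if p == "Outfield" else 0 for p in positions]

# Contribution of the position term, 10.8 * outfield, for each player (M€).
shift = [10.8 * d for d in outfield]

for p, d, s in zip(positions, outfield, shift):
    print(f"{p:<10} outfield={d} shift={s}")
```

Goalkeepers get no shift and outfield players get the full 10.8, with nothing in between.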

[Figure: Market Value (M€) against Annual Salary (M€), with the Goalkeeper and Outfield lines and the vertical gap between them marked b₂ = 10.8.]
Full estimated equation
$$\hat{y} = -1.4 + 3.04 \times \text{salary} + 10.8 \times \text{outfield}$$
Goalkeeper (outfield = 0)
$$\hat{y} = -1.4 + 3.04 \times \text{salary} + 10.8 \times 0$$
$$\hat{y} = -1.4 + 3.04 \times \text{salary}$$
Outfield (outfield = 1)
$$\hat{y} = -1.4 + 3.04 \times \text{salary} + 10.8 \times 1$$
$$\hat{y} = -1.4 + 3.04 \times \text{salary} + 10.8$$
$$\hat{y} = 9.4 + 3.04 \times \text{salary}$$

Both lines have the same slope (3.04). The only difference is the starting point — the vertical shift of 10.8.
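As a quick check of the parallel-lines claim, the sketch below evaluates the estimated equation for a goalkeeper and an outfield player at a few salaries and compares the two predictions. The function name and the chosen salaries are assumptions made for illustration.

```python
def predict_market_value(salary_m_eur: float, outfield: int) -> float:
    """Predicted market value (M€) from the estimated equation.

    outfield is the binary variable: 0 for a goalkeeper, 1 for an outfield player.
    """
    return -1.4 + 3.04 * salary_m_eur + 10.8 * outfield


gaps = []
for salary in (4.0, 8.0, 12.0):  # illustrative salaries in M€
    goalkeeper = predict_market_value(salary, 0)
    outfielder = predict_market_value(salary, 1)
    gaps.append(outfielder - goalkeeper)
    print(f"salary={salary:>4}: goalkeeper={goalkeeper:.2f}, "
          f"outfield={outfielder:.2f}, gap={outfielder - goalkeeper:.2f}")
```

At every salary the gap between the two predictions is the same 10.8, which is what an equal slope with different intercepts implies.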
