Chapter 2 of 4

What's Going On Behind the Scenes

Fixed Effects Regression

Fixed effects controls for the individual. But what does that actually mean for the data?

It means we get rid of any variation between individuals. All of it. What's left is variation within each individual: how each unit changes over time relative to its own average.

Within variation

How a variable changes within the same individual across different time periods.

Between variation

How a variable differs between individuals, comparing their averages.

Let's see this with a (hypothetical) football example.

Suppose we're studying whether training load affects goals scored. We track two players across two seasons.


Player	Season	Training (hrs/wk)	Goals
Salah	22/23	11	19
Salah	23/24	13	23
Haaland	22/23	9	27
Haaland	23/24	7	22

Haaland trains less than Salah (8 hrs/wk vs 12) but scores more (24.5 vs 21 on average). The between relationship is negative — but that's just natural ability confounding things.
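
As a quick check on those averages, here is a minimal sketch in plain Python using the four rows from the table. The names `rows`, `player_means` and `between_slope` are illustrative, not from any library.

```python
# Data copied from the table above: (player, season, training hrs/wk, goals).
rows = [
    ("Salah", "22/23", 11, 19),
    ("Salah", "23/24", 13, 23),
    ("Haaland", "22/23", 9, 27),
    ("Haaland", "23/24", 7, 22),
]


def player_means(rows):
    """Return {player: (mean training, mean goals)}."""
    totals = {}
    for player, _season, training, goals in rows:
        t, g, n = totals.get(player, (0.0, 0.0, 0))
        totals[player] = (t + training, g + goals, n + 1)
    return {p: (t / n, g / n) for p, (t, g, n) in totals.items()}


means = player_means(rows)

# Between comparison: the line through the two player averages.
(salah_x, salah_y) = means["Salah"]
(haaland_x, haaland_y) = means["Haaland"]
between_slope = (salah_y - haaland_y) / (salah_x - haaland_x)

print(means)          # Salah: (12.0, 21.0), Haaland: (8.0, 24.5)
print(between_slope)  # -0.875
```

The slope through the two averages is negative, which is the between relationship described above.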

Now let's see this graphically. Step through to watch fixed effects in action.

[Figure: goals against training (hrs/wk) for Salah and Haaland, with the naive OLS line.]

Four data points: two players, two seasons each. The dashed line is the naive OLS — it slopes down, suggesting more training means fewer goals.

The between story and the within story can be completely different.

Between: Haaland trains less but scores more. If we just compared players, we might wrongly conclude training hurts goal-scoring. But that's confounded by natural ability, which is presumably an unobserved, time-invariant trait.

Within: When either player trains more than their own average, they score more than their own average. The within variation removes the confounding.
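
The contrast can be reproduced on the four data points. The sketch below, in plain Python, fits a one-variable OLS slope twice: once on the raw values and once after subtracting each player's own mean. The helper names (`ols_slope`, `demean_by_group`) are made up for this illustration.

```python
# The four observations from the two-player table.
players = ["Salah", "Salah", "Haaland", "Haaland"]
training = [11, 13, 9, 7]
goals = [19, 23, 27, 22]


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's own mean from its values."""
    group_means = {
        g: mean([v for h, v in zip(groups, values) if h == g])
        for g in set(groups)
    }
    return [v - group_means[g] for g, v in zip(groups, values)]


naive_slope = ols_slope(training, goals)
within_slope = ols_slope(
    demean_by_group(players, training),
    demean_by_group(players, goals),
)

print(naive_slope)   # -0.25
print(within_slope)  # 2.25
```

The pooled slope is negative, while the slope on the demeaned data is positive: the same four points, read two different ways.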

One important note: while we've removed the between variation, we haven't controlled for variables that change over time. For example, a team's performance may influence individual performance, and it is likely to vary from season to season. This model only controls for team performance if it does not vary over time.

Now let's see fixed effects in action with more data. Four clubs, six seasons each. Training sessions/week vs points per game.

[Figure: points per game against training sessions per week for Arsenal, Leicester, Brighton and Newcastle. Raw correlation: -0.203]

Now let's find each club's average training and points. The crosses mark each club's centre.


Now subtract each club's mean — slide every club's centre to the origin. This is the within variation.


With the between variation removed, the true positive relationship between training and performance emerges.

Correlation	Value
Raw	-0.203
Between	-0.719
Within	0.984

The raw correlation was misleading: it mixes the negative between-club differences with the positive within-club relationship. The between correlation is negative because smaller clubs have more time to train, since they play fewer fixtures.

But the within correlation is strongly positive. When the same club trains more than its own average in a given season, it earns more points. That's the relationship we were after.
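
The club-level data behind the chart is not reproduced in the text, so the sketch below applies the same three-way split to the two-player table from earlier instead. Treat it as an illustration of the calculation only: the numbers will not match the club figures, and with just two players the between correlation is -1 by construction. The function names are hypothetical.

```python
from math import sqrt

# Two-player table from earlier in the chapter (not the club data).
players = ["Salah", "Salah", "Haaland", "Haaland"]
training = [11, 13, 9, 7]
goals = [19, 23, 27, 22]


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_means(groups, values):
    """Map each group label to the mean of its values."""
    return {
        g: mean([v for h, v in zip(groups, values) if h == g])
        for g in set(groups)
    }


x_means = group_means(players, training)
y_means = group_means(players, goals)
labels = sorted(x_means)

# Raw: all observations pooled together.
raw = correlation(training, goals)
# Between: one point per player, at that player's averages.
between = correlation(
    [x_means[g] for g in labels], [y_means[g] for g in labels]
)
# Within: every observation measured relative to its player's averages.
within = correlation(
    [x - x_means[g] for g, x in zip(players, training)],
    [y - y_means[g] for g, y in zip(players, goals)],
)

print(raw, between, within)
```

The pattern matches the club example in sign: raw slightly negative, between negative, within strongly positive.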

Mathematically, fixed effects is just subtracting individual means — a process called demeaning.

The original model with individual fixed effects:

y_{it} = \beta_0 + \beta_1 x_{it} + \alpha_i + \varepsilon_{it}

After demeaning (subtracting individual means):

(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)

The individual effect $\alpha_i$ drops out completely: it's constant within each individual, so it cancels when you subtract the mean. The intercept $\beta_0$ cancels for the same reason.
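
The intermediate step can be written out explicitly. This is a sketch using the same symbols as the model above, with one added piece of notation: $T_i$ for the number of time periods observed for individual $i$. First average the model over time for each individual, then subtract that average from the original equation.

```latex
% Average the model over t for a fixed individual i
% (T_i = number of periods observed for i; notation assumed here).
\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \alpha_i + \bar{\varepsilon}_i,
\qquad
\bar{y}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} y_{it}

% Subtract the averaged equation from the original one.
y_{it} - \bar{y}_i
  = (\beta_0 - \beta_0)
  + \beta_1 (x_{it} - \bar{x}_i)
  + (\alpha_i - \alpha_i)
  + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Both $\beta_0$ and $\alpha_i$ appear unchanged in the averaged equation, which is why they vanish in the difference.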

What's left is purely within variation: how does each club's performance deviate from its own average when its training deviates from its own average?

The key takeaway:

Fixed effects works by comparing each unit to itself over time. By removing between-unit differences, it controls for everything that doesn't change — observed or not.

In practice, you don't need to demean by hand. Adding dummy variables for each individual (or using software's built-in fixed effects estimator) achieves the same thing.
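
To see the equivalence on a small case, the sketch below regresses goals on training plus one dummy per player, using the two-player table from earlier, and solves the normal equations directly. It is plain Python written for illustration: `solve_linear_system` is a hand-rolled helper, not a library function, and real work would use a statistics package instead.

```python
# Two-player table from earlier in the chapter.
training = [11, 13, 9, 7]
goals = [19, 23, 27, 22]
is_salah = [1, 1, 0, 0]
is_haaland = [0, 0, 1, 1]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Design matrix: training plus one dummy per player. There is no separate
# intercept, because the two dummies already add up to a constant column.
X = [[t, s, h] for t, s, h in zip(training, is_salah, is_haaland)]
k = 3

# Normal equations: (X'X) b = X'y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, goals)) for i in range(k)]

slope, salah_effect, haaland_effect = solve_linear_system(xtx, xty)
print(slope, salah_effect, haaland_effect)  # approximately 2.25, -6.0, 6.5
```

The training coefficient comes out at about 2.25, the same slope you get from regressing demeaned goals on demeaned training for these four points. The dummy coefficients play the role of each player's fixed effect.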

But understanding the demeaning is important — it tells you exactly what variation you're using (within) and what you've discarded (between). If the between variation is what you care about, fixed effects won't help.