Chapter 2 of 6

The Regression

Difference-in-Differences

The classic approach: Two-Way Fixed Effects (TWFE).

The idea is straightforward. We want to control for group differences and time differences. So we just... do exactly that. Add fixed effects for both.

$$Y = \alpha_i + \alpha_t + \beta_1 \cdot \text{Treated} + \varepsilon$$

$\alpha_i$ — a set of unit (group) fixed effects. In our case, one for each league (or eventually, each club). These absorb any time-invariant differences between groups.

$\alpha_t$ — a set of time fixed effects. One for each season. These absorb any common shocks that hit both leagues in the same season.

$\text{Treated}$ — a binary variable that equals 1 when a unit is actually being treated right now — i.e., it's in the treated group and in the post-treatment period.

The coefficient $\beta_1$ is your difference-in-differences estimate.

This is called "two-way" fixed effects because it has two sets of fixed effects: one for unit (group) and one for time period. In a setup like ours, with a balanced panel and every treated unit starting treatment at the same time, it gives exactly the same result as manually computing (treated after − treated before) − (control after − control before).
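As a rough check of that equivalence, here is a minimal sketch in plain Python. It estimates $\beta_1$ by removing unit and period means from both the outcome and the treatment indicator (the "within" transformation, which reproduces the dummy-variable regression in a balanced panel) and compares the result with the manual difference of differences. The numbers are synthetic and made up for illustration, not the league data from this chapter, and the function names are hypothetical.

```python
# Synthetic, made-up outcomes (NOT the league data used in this chapter):
# 2 units x 4 periods, with the "treated" unit treated in periods 3 and 4.
outcomes = {
    "treated": [10.0, 12.0, 17.0, 19.0],
    "control": [20.0, 22.0, 23.0, 25.0],
}
treated_flag = {
    "treated": [0, 0, 1, 1],
    "control": [0, 0, 0, 0],
}


def average(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_demean(panel):
    """Subtract unit and period means, add back the grand mean (balanced panel only)."""
    units = list(panel)
    periods = range(len(panel[units[0]]))
    unit_mean = {u: average(panel[u]) for u in units}
    period_mean = [average(panel[u][t] for u in units) for t in periods]
    grand_mean = average(v for u in units for v in panel[u])
    return {
        u: [panel[u][t] - unit_mean[u] - period_mean[t] + grand_mean for t in periods]
        for u in units
    }


def twfe_estimate(y, d):
    """Slope from regressing the demeaned outcome on the demeaned treatment flag."""
    y_tilde = two_way_demean(y)
    d_tilde = two_way_demean(d)
    numerator = sum(dv * yv for u in y for dv, yv in zip(d_tilde[u], y_tilde[u]))
    denominator = sum(dv ** 2 for u in d for dv in d_tilde[u])
    return numerator / denominator


def manual_did(y, d, treated_unit="treated", control_unit="control"):
    """(treated after - treated before) - (control after - control before)."""
    post = [t for t, flag in enumerate(d[treated_unit]) if flag == 1]
    pre = [t for t, flag in enumerate(d[treated_unit]) if flag == 0]

    def cell_mean(unit, periods):
        return average(y[unit][t] for t in periods)

    treated_change = cell_mean(treated_unit, post) - cell_mean(treated_unit, pre)
    control_change = cell_mean(control_unit, post) - cell_mean(control_unit, pre)
    return treated_change - control_change


print("TWFE estimate:", twfe_estimate(outcomes, treated_flag))
print("Manual DiD:   ", manual_did(outcomes, treated_flag))
```

For these made-up numbers both approaches give 4.0. The demeaning shortcut relies on the panel being balanced; with missing unit-period cells you would fit the dummy-variable regression directly instead.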

💡 Need a refresher on fixed effects? Chapter 4 of the Fixed Effects module covers two-way fixed effects in more detail.

With a simple pre/post design, there's an equivalent way to write this.

This form assumes the simplest DiD setup: 2 periods (before and after) and 2 groups (treated and control), where every unit in the treatment group is treated at the same time. No staggered adoption.

$$Y = \beta_0 + \beta_1 \cdot \text{TreatedGroup} + \beta_2 \cdot \text{Post} + \beta_3 \cdot (\text{TreatedGroup} \times \text{Post}) + \varepsilon$$

$\text{TreatedGroup}$ — equals 1 if you're in the treated group (Premier League), regardless of time period.

$\text{Post}$ — equals 1 if you're in the post-treatment period (23/24 or 24/25), regardless of which group you're in.

$\text{TreatedGroup} \times \text{Post}$ — the interaction term. Equals 1 only when you're in the treated group AND in the post period. This is the same as $\text{Treated}$ in the TWFE equation.

By the standard interpretation of an interaction term, $\beta_3$ tells us how much larger the gap between the treated and control groups is in the post-period than in the pre-period. That's difference-in-differences.
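To see why, it can help to write out what the interaction model predicts for each of the four group-by-period cells. The short derivation below assumes the error term averages to zero within each cell; the conditional-expectation notation is introduced here for illustration.

$$
\begin{aligned}
E[Y \mid \text{TreatedGroup}=0,\ \text{Post}=0] &= \beta_0 \\
E[Y \mid \text{TreatedGroup}=1,\ \text{Post}=0] &= \beta_0 + \beta_1 \\
E[Y \mid \text{TreatedGroup}=0,\ \text{Post}=1] &= \beta_0 + \beta_2 \\
E[Y \mid \text{TreatedGroup}=1,\ \text{Post}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
$$

Taking the difference of the two before/after changes:

$$
\underbrace{\left[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\right]}_{\text{treated after} \,-\, \text{treated before}} - \underbrace{\left[(\beta_0 + \beta_2) - \beta_0\right]}_{\text{control after} \,-\, \text{control before}} = (\beta_2 + \beta_3) - \beta_2 = \beta_3
$$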

Let's see the regression output for our timekeeping rule example.

Using the interaction-term model on our Premier League vs Bundesliga data:

DiD Estimate: Effect of Timekeeping Rule on Goals per Club per Game

| Variable | Coefficient | Interpretation |
| --- | --- | --- |
| $\beta_0$ (Intercept) | 1.57 | Control group, before |
| $\beta_1$ (TreatedGroup) | -0.15 | EPL vs Bundesliga baseline gap |
| $\beta_2$ (Post) | +0.02 | Time trend (control change) |
| $\beta_3$ (TreatedGroup × Post) | +0.12 | DiD estimate |

N = 8 (2 leagues × 4 seasons, league-level aggregates). Group FE: League. Time FE: Season.

This matches our manual calculation from Chapter 1 exactly. The regression is just a formal way to do the same arithmetic.
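To make the "same arithmetic" point concrete, the sketch below rebuilds the four cell means implied by the coefficients in the table and recomputes the difference of differences from them. Only the four coefficient values come from the table; the variable names are hypothetical, and the cell means are what the coefficients imply, not figures copied from Chapter 1.

```python
# Coefficients copied from the regression table above.
b0, b1, b2, b3 = 1.57, -0.15, 0.02, 0.12

# Cell means implied by the interaction model.
cell_means = {
    ("control", "before"): b0,
    ("treated", "before"): b0 + b1,
    ("control", "after"): b0 + b2,
    ("treated", "after"): b0 + b1 + b2 + b3,
}

treated_change = cell_means[("treated", "after")] - cell_means[("treated", "before")]
control_change = cell_means[("control", "after")] - cell_means[("control", "before")]
did = treated_change - control_change

for (group, period), value in cell_means.items():
    print(f"{group:8s} {period:7s} {value:.2f}")
print(f"treated change: {treated_change:.2f}")
print(f"control change: {control_change:.2f}")
print(f"DiD:            {did:.2f}")
```

Rounded to two decimals, the recomputed DiD equals the interaction coefficient, which is the algebraic identity the regression encodes.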

What does the DiD coefficient actually mean?

$\hat{\beta}_3 = +0.12$ means that the timekeeping rule was associated with 0.12 more goals per club per game in the Premier League, beyond what we would have expected from the general time trend.

More precisely, $\beta_3$ tells us how much more the treated group changed relative to the control group. It's the gap between:

What we observed in the treated group after treatment
What we expected (based on the control group's change)
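As a worked example of that gap, using only the coefficients from the regression table (the labels "expected" and "observed" are shorthand introduced here):

$$
\begin{aligned}
\text{expected} &= \underbrace{(1.57 - 0.15)}_{\text{treated, before}} + \underbrace{0.02}_{\text{control change}} = 1.44 \\
\text{observed} &= 1.57 - 0.15 + 0.02 + 0.12 = 1.56 \\
\text{observed} - \text{expected} &= 1.56 - 1.44 = 0.12 = \hat{\beta}_3
\end{aligned}
$$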

This is the Average Treatment Effect on the Treated (ATT)

DiD estimates the effect of treatment for the group that actually got treated. If the Bundesliga would have responded differently to timekeeping rules, we have no way of knowing from DiD alone.

Key points when interpreting:

The estimate depends on the parallel trends assumption holding. If it doesn't, our "expected" counterfactual is wrong.
With multiple time periods, the full TWFE model (with season fixed effects) is more flexible than the simple post-dummy interaction model. It can also be extended to estimate dynamic treatment effects. However, with staggered treatment timing (units starting treatment at different times), other methods are needed.
Standard errors should typically be clustered at the group level to account for serial correlation within groups.
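The dynamic-effects extension mentioned in the key points is commonly written as an event-study version of the TWFE equation. The form below is the usual textbook sketch, not necessarily the exact specification used elsewhere in this course; the index $k$ (seasons relative to the start of treatment) and the indicator notation are assumptions introduced here.

$$
Y = \alpha_i + \alpha_t + \sum_{k \neq -1} \beta_k \cdot \left(\text{TreatedGroup} \times \mathbb{1}[\text{season is } k \text{ periods from treatment start}]\right) + \varepsilon
$$

The season just before treatment ($k = -1$) is left out as the reference period. Each $\beta_k$ with $k \geq 0$ is then the estimated effect $k$ seasons into treatment, and the $\beta_k$ with $k < -1$ are often inspected as an informal check on parallel trends: estimates close to zero before treatment are consistent with the assumption, though they do not prove it.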