We are gonna talk about what has been, what is and what always will be the most important statistical technique ever: Regression.

Prof. Matt Masten (Duke University)

The aims of empirical analysis

  • Description
  • Prediction
  • Causal Inference

In our seminar, we deal with description and causal inference. Prediction is related to Machine Learning. There are excellent courses at Viadrina, on the Web and on Datacamp that cover Machine Learning.

Causal inference is a core task of science.

Hernán, M. A. (2018). The C-word: scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health, 108(5), 616-619.

Correlation does not imply causation.

We may start our journey thinking about relationships and associations. In statistical terms we calculate a correlation coefficient.
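As a minimal sketch of what "calculating a correlation coefficient" means, the block below computes Pearson's r as the covariance of two variables divided by the product of their standard deviations. The function name `pearson_r` and the five data points are made up for illustration and do not come from any study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y, scaled to lie in [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]   # deviations from the mean of x
    dy = [yi - mean_y for yi in y]   # deviations from the mean of y
    co_variation = sum(a * b for a, b in zip(dx, dy))
    return co_variation / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

A value near +1 or -1 indicates a strong linear association, a value near 0 a weak one. The number itself says nothing about why the two variables move together.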

Do storks deliver babies?

The correlation between the stork population and the birth rate is r = 0.62, p = 0.008 (17 European countries, data from 1990).

Matthews, R. (2000). Storks deliver babies (p= 0.008). Teaching Statistics, 22(2), 36-38.

Do Nicolas Cage movies cause drownings?

"Films Nicolas Cage appeared in" and the "Number of people who drowned by falling into a pool" are correlated with r = 0.66.
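One way such correlations can arise without any causal link is a common cause that drives both variables. The simulation below is a sketch under stated assumptions, not an analysis of the stork or Nicolas Cage data: the variable `z` is a hypothetical common cause, `x` and `y` are invented, and `x` never enters the equation for `y`.

```python
import random
from math import sqrt

def corr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical common cause z.
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus independent noise; x has no effect on y.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_raw = corr(x, y)
# Subtract the common cause from both variables and correlate what is left.
r_adjusted = corr([xi - zi for xi, zi in zip(x, z)],
                  [yi - zi for yi, zi in zip(y, z)])

print(f"raw correlation:  {r_raw:.2f}")
print(f"after removing z: {r_adjusted:.2f}")
```

In this setup the raw correlation is clearly positive, while the correlation essentially vanishes once the common cause is taken out. With very short real-world series, chance alone can also produce a high correlation.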

What makes a relationship causal? What is the relation between correlation, regression and causal inference?

Fundamental problem of causal inference

The fundamental problem of causal inference is that we can observe only one outcome per individual at a time: the outcome under the treatment that person actually received, e.g. participating in a training programme or taking a medicine. For one specific person, we cannot know what would have happened if that person had not participated in the programme (the counterfactual outcome).

Sometimes we can come close to a causal answer due to the data we have or a certain methodology we apply. We may run an experiment and compare two groups, one with and one without training.
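The logic of comparing two groups can be sketched with a small simulation. Everything here is an assumption made for illustration: the outcome distribution, the constant effect `TRUE_EFFECT`, and the coin-flip assignment. In the simulation both potential outcomes exist for every person, but only one of them is kept as "observed", which mirrors the fundamental problem described above.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000
TRUE_EFFECT = 2.0  # assumed constant treatment effect, chosen for illustration

# Both potential outcomes exist in the simulation, but never in real data.
y0 = [rng.gauss(10, 2) for _ in range(n)]  # outcome without training
y1 = [v + TRUE_EFFECT for v in y0]         # outcome with training

# Random assignment: a coin flip decides who receives the training.
treated = [rng.random() < 0.5 for _ in range(n)]

# What the researcher observes: exactly one outcome per person.
observed = [a if t else b for a, b, t in zip(y1, y0, treated)]

treated_outcomes = [y for y, t in zip(observed, treated) if t]
control_outcomes = [y for y, t in zip(observed, treated) if not t]
estimate = (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))

print(f"difference in group means: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment is random, the two groups are comparable on average, so the difference in group means comes close to the true average effect even though no individual effect is ever observed.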

Regression Techniques

From a mathematical point of view (simple) regression looks like this: y_i=\beta_0 +\beta_1 x_i +\varepsilon_i,\quad i=1,\dots,n.
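For this simple model, the ordinary least squares estimates have a closed form: \hat\beta_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2 and \hat\beta_0 = \bar y - \hat\beta_1 \bar x. The block below is a minimal sketch of that computation. The function name `fit_simple_ols` and the data are made up for illustration; in practice a statistics package would be used instead.

```python
def fit_simple_ols(x, y):
    """Return (beta_0, beta_1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta_1 = sxy / sxx                 # slope
    beta_0 = mean_y - beta_1 * mean_x  # intercept
    return beta_0, beta_1

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta_0, beta_1 = fit_simple_ols(x, y)

# Residuals are the estimated counterparts of the error term epsilon_i.
residuals = [yi - (beta_0 + beta_1 * xi) for xi, yi in zip(x, y)]
print(round(beta_0, 2), round(beta_1, 2))  # 2.2 0.6
```

The slope shares its numerator with the correlation coefficient, which is why the two measures always have the same sign.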

Among other topics, we cover:

  • Covariance
  • Correlation
  • Simple Linear Regression
    • Dummy variable
    • Continuous variable
    • Categorical variable
  • Multiple Linear Regression
    • Parallel Slopes (Dummy + Continuous)
    • Interaction Effects
      • Continuous * Categorical
      • Continuous * Continuous
    • Marginal Effects
  • Logistic Regression
    • Probability, Odds, Log Odds
    • Chi-Square Test
  • Fixed Effects Regression
    • First difference
    • Time demeaning
    • Least Squares Dummy Variable (LSDV)