Statistical Thinking and Data Fluency.
"How you gather, manage, and use information will determine whether you win or lose." (Bill Gates)
In this course, you will learn and practice critical statistical thinking using complex rectangular panel data. You will become data fluent and acquire key skills that you can use in your academic and business career.
Specifically, you will be trained in four core competencies:
- Data Manipulation
- Data Visualization
- Data Modeling
- Data Reporting
The skills are introduced step by step. Your learning process is supported by DataCamp, an intuitive learning platform for data science and analytics. Your skills will be tested in weekly assignments.
In our seminar, we deal with description, prediction, and causal inference. We start our journey by thinking about relationships and associations, which in statistical terms we quantify with a correlation coefficient.
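To make the correlation coefficient concrete, here is a minimal sketch of Pearson's r computed from its definition: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the study-hours/score numbers are hypothetical toy values chosen for illustration, not course data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square roots of the sums of squared deviations
    ss_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    ss_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (ss_x * ss_y)


if __name__ == "__main__":
    hours = [1, 2, 3, 4, 5]        # hypothetical toy data
    score = [52, 55, 61, 64, 70]
    print(round(pearson_r(hours, score), 3))  # prints 0.993
```

The value always lies between -1 and 1; it measures the strength of a linear association and, on its own, says nothing about cause and effect.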
"Films Nicolas Cage appeared in" and the "Number of people who drowned by falling into a pool" are correlated with r = 0.66 (source: http://www.tylervigen.com/spurious-correlations).
Do Nicolas Cage movies cause people to drown in pools? What makes a relationship causal? What is the relationship between correlation, regression, and causal inference?
The fundamental problem of causal inference is that, for any individual at a given time, we can observe only one outcome: the actual outcome under the condition that person experienced, e.g. after participating in a training programme or taking a medicine (the so-called treatment). For one specific person, we cannot know what would have happened if that person had not participated in the programme.
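One common way to write this problem down is the potential-outcomes notation sketched below. The symbols are our own choice for illustration and may differ from the notation used later in the course: D_i is 1 if individual i receives the treatment (e.g. attends the training) and 0 otherwise, Y_i(1) and Y_i(0) are the outcomes that person would have with and without treatment, and Y_i is what we actually observe.

```latex
Y_i = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0), \qquad D_i \in \{0, 1\}

\tau_i = Y_i(1) - Y_i(0)
```

For a treated person (D_i = 1) the observed outcome is Y_i(1), and Y_i(0) remains a counterfactual; for an untreated person it is the other way round. The individual effect tau_i therefore always involves one unobserved term and cannot be computed for any single person.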
We can get closer to a causal answer with special data or particular methods. For example, we may run an experiment that randomly assigns people to two groups, one with and one without training, and compare their outcomes.
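The logic of such an experiment can be illustrated with a small simulation, a sketch under purely hypothetical assumptions: every simulated person has two potential outcomes, the training effect is fixed at 2.0, and a coin flip decides which outcome we get to see. Because assignment is random, the unobserved trait `ability` is balanced across groups, so the simple difference in group means should land close to the true effect.

```python
import random
import statistics


def simulate_experiment(n=10_000, true_effect=2.0, seed=1):
    """Simulate a randomized training experiment with hypothetical data.

    Each person has two potential outcomes, y0 (no training) and y1
    (training), but only the one matching the random assignment is
    observed. Returns the difference in mean observed outcomes.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)            # unobserved individual trait
        y0 = 10 + 3 * ability + rng.gauss(0, 1)
        y1 = y0 + true_effect                # same person, with training
        if rng.random() < 0.5:               # random assignment (coin flip)
            treated.append(y1)               # we only see y1 ...
        else:
            control.append(y0)               # ... or y0, never both
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print(round(simulate_experiment(), 2))   # close to the true effect 2.0
```

If assignment depended on `ability` instead of a coin flip, the same difference in means would mix the training effect with pre-existing differences between the groups.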
From Correlation to Regression.
"We are gonna talk about what has been, what is and what always will be the most important statistical technique ever: Regression." (Prof. Matt Masten, Duke University)
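The step from correlation to regression is smaller than it may seem: in simple linear regression, the least-squares slope equals Pearson's r multiplied by the ratio of the standard deviations of y and x. The sketch below checks this on hypothetical toy data; the helper names `ols_simple` and `slope_from_correlation` are our own, not functions from a particular library.

```python
import math
import statistics


def ols_simple(x, y):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def slope_from_correlation(x, y):
    """Slope rebuilt from Pearson's r and the two sample standard deviations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * statistics.stdev(y) / statistics.stdev(x)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]          # hypothetical toy data
    y = [52, 55, 61, 64, 70]
    a, b = ols_simple(x, y)
    print(round(b, 4), round(slope_from_correlation(x, y), 4))  # both 4.5
```

Unlike r, the slope keeps the units of the data: it tells us by how much y changes, on average, when x increases by one unit.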
The methodological roadmap is as follows:
- Group Comparison (t-test and chi-squared test)
- Simple Linear Regression
  - Dummy explanatory variable
  - Continuous explanatory variable
  - Categorical explanatory variable
- Multiple Linear Regression
  - Parallel Slopes (Dummy + Continuous)
  - Interaction Effects
    - Continuous * Categorical
    - Continuous * Continuous
  - Marginal Effects
- Logistic Regression
- Fixed Effects Regression
  - First difference
  - Time demeaning
  - Least Squares Dummy Variable (LSDV)
These methods are our tools to answer intriguing questions.
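The first two steps of the roadmap are closely linked: regressing an outcome on a dummy explanatory variable reproduces a group comparison. The intercept is the mean of the group with dummy 0, and the slope is the difference between the two group means, which is exactly the difference a t-test examines. The sketch below verifies this on hypothetical toy data; `ols_simple` is our own helper, not a library function.

```python
import statistics


def ols_simple(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def group_means(d, y):
    """Mean outcome for the d == 0 group and for the d == 1 group."""
    mean_0 = statistics.mean([yi for di, yi in zip(d, y) if di == 0])
    mean_1 = statistics.mean([yi for di, yi in zip(d, y) if di == 1])
    return mean_0, mean_1


if __name__ == "__main__":
    d = [0, 0, 0, 1, 1, 1]        # hypothetical: 1 = attended training
    y = [10, 12, 11, 14, 15, 16]  # hypothetical outcomes
    intercept, slope = ols_simple(d, y)
    mean_0, mean_1 = group_means(d, y)
    print(intercept, mean_0)          # intercept equals the control mean
    print(slope, mean_1 - mean_0)     # slope equals the difference in means
```

With the usual homoskedastic standard errors, the t-statistic on this slope also coincides with the classic two-sample t-test that assumes equal variances.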