### Data Fluency.

How you gather, manage, and use information will determine whether you win or lose.

Bill Gates

The **first half** of the course introduces all the basics from scratch. You go from zero to hero in data analysis and data science, become **data fluent**, and learn major skills that you can use in your **academic and business career**. Specifically, you will be trained in the following core competencies:

- Data Manipulation
- Data Visualization
- Data Modeling
- Data Reporting

The skills are introduced step by step. Your learning process is supported by DataCamp, an intuitive learning platform for data science and analytics. Your skills will be tested in **weekly assignments**.

### Statistical Thinking.

We start our journey by thinking about **relationships and associations**. In statistical terms, we measure the strength and direction of a linear association with a correlation coefficient.
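To illustrate what calculating a correlation coefficient involves, here is a minimal Python sketch of the Pearson correlation coefficient, computed directly from its definition: the sample covariance divided by the product of the sample standard deviations. The function name `pearson_r` and the toy numbers are hypothetical and chosen only for illustration. In practice, a statistics library would do this calculation for you.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance: sum of cross-products of deviations, divided by n - 1
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations (same n - 1 denominator, so it cancels in r)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov_xy / (sd_x * sd_y)


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

The value of r always lies between -1 and 1, and it captures only linear association. A high value, as in the example that follows, says nothing about causation.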

"Films Nicolas Cage appeared in" and the "Number of people who drowned by falling into a pool" are correlated with r = 0.66.

http://www.tylervigen.com/spurious-correlations

**Do Nicolas Cage movies cause people to drown?** What do you think?

The **fundamental problem** of causal inference is that we can observe only one outcome for an individual at a time: the outcome under the treatment status that person actually had (e.g., participating in a training programme or taking a medicine, the so-called treatment). For one specific person, we cannot know what would have happened if that person had not participated in the programme.
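A small simulation can make the fundamental problem concrete. The Python sketch below gives every person both potential outcomes, which is possible only because the data are invented, and then keeps just the outcome for the treatment actually received. The `Person` class, the six individuals, and their outcome values are all hypothetical. In this made-up example, the people who join the programme would have done better anyway, so the naive comparison of treated and untreated means overstates the true effect.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    y0: float      # potential outcome without treatment
    y1: float      # potential outcome with treatment
    treated: bool  # treatment actually received

    @property
    def observed(self) -> float:
        # Real data only ever contain the outcome for the treatment received
        return self.y1 if self.treated else self.y0


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


# Hypothetical population: both potential outcomes are known only because
# we made them up; every individual effect (y1 - y0) is 2
people = [
    Person("A", 10, 12, True),
    Person("B", 9, 11, True),
    Person("C", 8, 10, True),
    Person("D", 5, 7, False),
    Person("E", 4, 6, False),
    Person("F", 3, 5, False),
]

true_ate = mean(p.y1 - p.y0 for p in people)
naive_diff = (mean(p.observed for p in people if p.treated)
              - mean(p.observed for p in people if not p.treated))
# Difference in untreated outcomes between the two groups
selection_bias = (mean(p.y0 for p in people if p.treated)
                  - mean(p.y0 for p in people if not p.treated))

print(true_ate, naive_diff, selection_bias)  # 2.0 7.0 5.0
```

Here the naive difference (7) equals the effect on the treated (2, identical to the average effect because every individual effect is 2) plus selection bias (5). With real data, only the `observed` values exist, so neither the individual effects nor the selection bias can be read off directly.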

### From Correlation to Regression.

We are gonna talk about what has been, what is and what always will be the most important statistical technique ever: Regression.

Prof. Matt Masten (Duke University)

The methodological roadmap is as follows:

- Group Comparison (t-test and chi-squared test)
- Covariance
- Correlation
- Simple Linear Regression
  - Dummy explanatory variable
  - Continuous explanatory variable
  - Categorical explanatory variable
- Multiple Linear Regression
  - Parallel Slopes (Dummy + Continuous)
  - Interaction Effects
    - Continuous * Categorical
    - Continuous * Continuous
  - Marginal Effects
- Logistic Regression
- Fixed Effects Regression
  - First difference
  - Time demeaning
  - Least Squares Dummy Variable (LSDV)
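The following minimal Python sketch shows how the first steps of the roadmap connect. It fits a simple linear regression by ordinary least squares, where the slope is the covariance of x and y divided by the variance of x. The function name `ols_simple` and the data are hypothetical. With a 0/1 dummy explanatory variable, the estimated intercept equals the mean of the reference group and the slope equals the difference in group means, which is the same quantity a t-test compares.

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept and slope of y = a + b * x fitted by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx  # equals cov(x, y) / var(x): the n - 1 terms cancel
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return intercept, slope


# Hypothetical data: d = 1 marks the treated group, y is the outcome
d = [0, 0, 0, 1, 1, 1]
y = [3, 4, 5, 7, 8, 9]

intercept, slope = ols_simple(d, y)
mean_control = sum(yi for di, yi in zip(d, y) if di == 0) / d.count(0)
mean_treated = sum(yi for di, yi in zip(d, y) if di == 1) / d.count(1)

print(intercept, slope)                           # 4.0 4.0
print(mean_control, mean_treated - mean_control)  # 4.0 4.0
```

The same `ols_simple` function works unchanged for a continuous explanatory variable. Only the interpretation of the slope changes, from a group difference to the expected change in y per unit of x.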

These methods are our **tools to answer intriguing questions**.
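The following minimal Python sketch previews why fixed effects regression helps. The tiny two-period panel and the helper names are hypothetical. Each unit has its own constant level (the fixed effect), units with higher levels also have higher x, and the true effect of x on y is 2 with no noise. Pooled OLS mixes the unit levels into the slope, while time demeaning and first differencing both remove them.

```python
def slope_through_origin(x, y):
    """OLS slope of y = b * x without an intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def pooled_slope(x, y):
    """OLS slope of y = a + b * x, ignoring the panel structure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))


# Hypothetical panel: unit -> [(x_t1, y_t1), (x_t2, y_t2)], generated as
# y = alpha_i + 2 * x with alpha = 10, 20, 30 for units A, B, C
panel = {
    "A": [(1, 12), (2, 14)],
    "B": [(3, 26), (5, 30)],
    "C": [(6, 42), (7, 44)],
}

# Pooled OLS over all unit-periods (biased: alpha_i is correlated with x)
all_x = [x for obs in panel.values() for x, _ in obs]
all_y = [y for obs in panel.values() for _, y in obs]
beta_pooled = pooled_slope(all_x, all_y)

# Time demeaning (within transformation): subtract each unit's own means
dm_x, dm_y = [], []
for obs in panel.values():
    mean_x = sum(x for x, _ in obs) / len(obs)
    mean_y = sum(y for _, y in obs) / len(obs)
    dm_x += [x - mean_x for x, _ in obs]
    dm_y += [y - mean_y for _, y in obs]
beta_within = slope_through_origin(dm_x, dm_y)

# First difference: change from period 1 to period 2 within each unit
# (no intercept, because this toy setup assumes no common time effect)
fd_x = [obs[1][0] - obs[0][0] for obs in panel.values()]
fd_y = [obs[1][1] - obs[0][1] for obs in panel.values()]
beta_fd = slope_through_origin(fd_x, fd_y)

print(round(beta_pooled, 2), beta_within, beta_fd)  # 5.57 2.0 2.0
```

With two periods, the first-difference and within estimators coincide. A regression with one dummy per unit (LSDV) would also return 2 here, because it is numerically equivalent to the within estimator.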