Causal Inference

For Science and Decisions

Sean J. Taylor / @seanjtaylor

http://seanjtaylor.github.io/CausalInference/

Introduction

NYU Stern

5th year PhD candidate at NYU Stern

interests

  • Social influence
  • Causal inference and web/mobile experimentation
  • Bayesian modeling
  • Prediction

toolbox

Python · R · D3 · Stan

Outline

  • Associations and Causal Graphs
  • Why do causal inference?
  • Linear regression
  • Potential outcomes model
  • Randomized experiments
  • Observational data analysis

Obligatory xkcd

Associations

\( (X_i, Y_i) \)

  • (smoking, cancer)
  • (running the ball, winning games)
  • (facebook fan, makes purchase)
  • (shown ad, makes purchase)
  • (toothbrushing, heart disease)
  • (SAT scores, success in college)

Possible Relationships

  • \( \Pr( Y_i \mid X_i ) = \Pr( Y_i ) \) (independence)
  • \( \Pr( Y_i \mid X_i ) \neq \Pr( Y_i ) \) (dependence)

Dependence is useful. It's how we build predictive models.

Possible Causal Relationships

  1. \( X_i \leftarrow Y_i \)
  2. \( X_i \rightarrow Y_i \)
  3. No causal relationship (but could still be correlated!)

Notice that a dependency alone doesn't tell us which direction the causal relationship runs.

Correlation without Causation

Introducing \( Z_i \)

  • \( X_i \rightarrow Z_i \rightarrow Y_i \) (chain)
  • \( X_i \leftarrow Z_i \leftarrow Y_i \) (chain)
  • \( X_i \leftarrow Z_i \rightarrow Y_i \) (fork)
  • \( X_i \rightarrow Z_i \leftarrow Y_i \) (collider)

Chains

\( X_i \) doesn't cause \( Y_i \) directly; it causes the cause.

Example: a McDonald's opening doesn't cause obesity directly; it causes overeating, which causes obesity.

Forks

\( Z_i \) causes both \( X_i \) and \(Y_i \).

Example: High natural ability causes good SAT scores and success in college.
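A quick illustrative simulation of the fork (variable names and coefficients are mine, not the slides'): latent ability drives both SAT scores and college success, so the two are correlated even though neither causes the other, and "controlling" for the fork removes the association.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

ability = rng.normal(size=n)                  # the fork: latent natural ability
sat = 2.0 * ability + rng.normal(size=n)      # caused by ability, not by success
success = 3.0 * ability + rng.normal(size=n)  # caused by ability, not by SAT

print(np.corrcoef(sat, success)[0, 1])        # strongly positive, yet no SAT -> success edge

# Conditioning on the fork (a narrow slice of ability) removes the association.
mask = np.abs(ability) < 0.05
print(np.corrcoef(sat[mask], success[mask])[0, 1])   # approximately zero
```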

Colliders

\( X_i \) and \(Y_i \) are independent, but not if we condition on \( Z_i \).

Example: Your car won't start either because you ran out of gas or the battery is dead. These are independent events, but if you condition on the car not starting, they are anti-correlated.
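The car example in code (the failure probabilities are invented for illustration): the two causes are independent overall, but anti-correlated once we condition on the collider.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

out_of_gas = rng.random(n) < 0.05        # two independent causes...
battery_dead = rng.random(n) < 0.05
wont_start = out_of_gas | battery_dead   # ...of a common effect (the collider)

print(np.corrcoef(out_of_gas, battery_dead)[0, 1])   # ~0: independent overall

# Condition on the collider: among cars that won't start, a full gas tank makes
# a dead battery the likely culprit, so the causes become anti-correlated.
sub = wont_start
print(np.corrcoef(out_of_gas[sub], battery_dead[sub])[0, 1])   # negative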

Confounding

  • (smoking, cancer) \( \rightarrow \) genetics
  • (running the ball, winning games) \( \rightarrow \) having a lead
  • (facebook fan, makes purchase) \( \rightarrow \) brand loyalty
  • (shown ad, makes purchase) \( \rightarrow \) intent to buy
  • (toothbrushing, heart disease) \( \rightarrow \) health consciousness
  • (SAT scores, success in college) \( \rightarrow \) genetic intelligence

Bigger Example

Why Causal Inference?

  1. Science: Why did something happen?
  2. Decisions: What will happen if I change something?

Science

  1. Associations are always more interesting when they're causal.
  2. Understanding a phenomenon is different than predicting it.

http://idlewords.com/2010/03/scott_and_scurvy.htm

Decisions

  • quit smoking?
  • run the ball more?
  • try to recruit Facebook fans for my application?
  • purchase costly advertisements?
  • brush your teeth? :)
  • take an SAT prep class?

Decisions are different than predictions

Prediction

Estimate: \( \Pr(Y_i \mid X_i) \)

Predict: \( \Pr(Y_i \mid X_i = 1) \)

Decision

Estimate: \( \Pr(Y_i \mid do(X_i = 1)) \neq \Pr(Y_i \mid X_i = 1) \)

Set: \( X_i = 1 \) for some units \( i \).
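An illustrative simulation of the ad example (all probabilities are made up here): intent to buy raises both the chance of being shown the ad and the chance of purchasing, so conditioning on having seen the ad is not the same as intervening to show it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

intent = rng.random(n) < 0.2                             # confounder: intent to buy
shown_ad = rng.random(n) < np.where(intent, 0.8, 0.1)    # targeting follows intent
p_buy = 0.05 + 0.40 * intent + 0.05 * shown_ad           # true ad effect: +0.05
buys = rng.random(n) < p_buy

# Conditioning (prediction): Pr(buy | shown) - Pr(buy | not shown) -- confounded.
print(buys[shown_ad].mean() - buys[~shown_ad].mean())    # ~0.30

# Intervening (decision): force the ad on or off for everyone, leave intent alone.
buys_do1 = rng.random(n) < 0.05 + 0.40 * intent + 0.05
buys_do0 = rng.random(n) < 0.05 + 0.40 * intent
print(buys_do1.mean() - buys_do0.mean())                 # ~0.05, the true effect
```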

A Social Science Problem

Causality is easy in the natural sciences.

Molecules, cells, animals, plants are exchangeable!

There is always a great deal we don't observe about people.

Linear Regression

\( Y_i = \beta X_i + \epsilon_i \)

Often assumes:

\( X_i \rightarrow Y_i \leftarrow \epsilon_i \)

Linear Regression Bias

\( \beta \) is usually a biased estimate of the causal effect.

This is because \( \epsilon_i \) is not exogenous!

Often easy to think of ways that \( \epsilon_i \rightarrow X_i \).
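A sketch of that bias (the coefficients are mine, not the slides'): when part of \( \epsilon_i \) also moves \( X_i \), OLS attributes that variation to \( X_i \) and the slope estimate drifts away from the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_beta = 1.0

u = rng.normal(size=n)                 # unobserved part of the error term
x = 0.8 * u + rng.normal(size=n)       # epsilon -> X: the regressor is endogenous
y = true_beta * x + u + rng.normal(size=n)

beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # OLS slope for mean-zero data
print(beta_hat)                                      # well above 1.0: biased upward
```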

Add all the controls you want

You can always argue that some component of \( \epsilon_i \) affects \( X_i \) while also driving the outcome.

Potential Outcomes Framework

An alternative/complement to causal graphs

\( Y_i(0) \) is the outcome of \( i \) under no treatment.

\( Y_i(1) \) is the outcome of \( i \) under treatment.

Example: god-mode

Example: science-mode

Problem: we can only ever observe one of these values per row.

Best Solution: Random Assignment
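A "god-mode" sketch in code (all numbers invented): generate both potential outcomes for every unit, then observe only one per row while units self-select into treatment. The naive comparison of treated and untreated outcomes overstates the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# God-mode: both potential outcomes exist for every unit.
y0 = rng.normal(loc=10.0, size=n)
y1 = y0 + 2.0                                  # the true effect is exactly 2.0

# Science-mode: units with higher Y_i(0) are more likely to take the treatment,
# and we observe only one potential outcome per row.
treated = rng.random(n) < np.where(y0 > 10.0, 0.8, 0.2)
y_obs = np.where(treated, y1, y0)

print((y1 - y0).mean())                                   # true ATE: 2.0 (god-mode only)
print(y_obs[treated].mean() - y_obs[~treated].mean())     # naive estimate: too large
```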

Random Assignment

Key idea: enforce that \( X_i \) is exogenous (has no parents) by assigning it randomly

Makes confounding impossible by construction.

The gold standard in clinical trials and policy experiments.
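Rerunning the same invented setup with a coin-flip assignment shows why: once the treatment ignores the potential outcomes, the simple difference in means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

y0 = rng.normal(loc=10.0, size=n)
y1 = y0 + 2.0                          # true effect is 2.0, as before

treated = rng.random(n) < 0.5          # assignment ignores the potential outcomes
y_obs = np.where(treated, y1, y0)

# Under random assignment, the difference in means is unbiased for the ATE.
print(y_obs[treated].mean() - y_obs[~treated].mean())   # ~2.0
```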

Another Solution: Matching

Key idea: make comparisons between observations that are as similar as possible on a boatload of observable dimensions.

Randomized Experiment Example

Muchnik, Aral, and Taylor (2013)

“Social Influence Bias: A Randomized Experiment”

Propensity Score Matching Example

Aral, Muchnik, and Sundararajan (2009)

“Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks”

\( X_i \) : number of adopter friends

\( Y_i \) : whether the user adopts

The problem:

Matching

  • For every user with an adopter friend, find another user in the population.
  • Make sure they are as similar as possible on every characteristic you can measure.
  • Discard users for whom you cannot find a suitable match.
  • KEY: Find characteristics that are good proxies for the latent confounding variables (see the sketch below).
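A minimal propensity-score matching sketch on synthetic data (not the paper's data or code; the covariates stand in for the observable proxies): fit a model of treatment on observables, then pair each treated unit with the nearest control on the estimated propensity score. This version matches with replacement and omits the caliper/discard step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Observable proxies for the latent confounders (demographics, past activity, ...).
covs = rng.normal(size=(n, 3))
treated = rng.random(n) < 1.0 / (1.0 + np.exp(-covs @ np.array([1.0, 0.5, -0.5])))
y = covs @ np.array([2.0, 1.0, -1.0]) + 1.0 * treated + rng.normal(size=n)

# 1. Estimate propensity scores: Pr(treated | observables).
ps = LogisticRegression().fit(covs, treated).predict_proba(covs)[:, 1]

# 2. Match each treated unit to the control with the nearest propensity score.
t_idx = np.flatnonzero(treated)
c_idx = np.flatnonzero(~treated)
order = np.argsort(ps[c_idx])
c_ps = ps[c_idx][order]
pos = np.searchsorted(c_ps, ps[t_idx])
lo = np.clip(pos - 1, 0, len(c_ps) - 1)
hi = np.clip(pos, 0, len(c_ps) - 1)
pick = np.where(np.abs(c_ps[hi] - ps[t_idx]) < np.abs(c_ps[lo] - ps[t_idx]), hi, lo)
matches = c_idx[order][pick]

# 3. Average treated-minus-matched-control difference (true effect here is 1.0).
print((y[t_idx] - y[matches]).mean())
```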

All treated adopters (filled circles) and the number of treated adopters that can be explained by homophily (open circles) per day and cumulatively over time.

Shalizi's Notes