Content in this notebook

generate your synthetic data → obs
do some basic analysis of that data using a random forest → model
create your hypothesis (causal graph) about your data using the networkx package → causal_model
compute causal estimand, causal effect and potential refutations using the dowhy package

References

References for the dowhy package:

Documentation: here
GitHub: here

Material which helped my introductory reading so far by :

degeneratestate.org: Pt I - potential outcomes (contains definitions of ATE, ATT and ATC, among other things)
degeneratestate.org: Pt II - causal graphs & the backdoor criterion
this youtube video on backdoor paths
Rubin et al. 2005 on causal inference and potential outcomes
wiki article on the average treatment effect (ATE)
this wiki article on propensity score matching
this wiki article on regression discontinuity design

Open questions

How is matching or grouping done in categorical vs continuous treatment variable cases?
How to visualize ATT, ATC and ATE and the reasoning for their potential different applications?
Could the code easily be adjusted to extend the regression using arbitrary scikit-learn methods?
Why is logistic regression used to compute the propensity score and not other methods?

Generating data → `obs`

Data generating model: $ Y = a \cdot X_0 ^ b + c + X_1 $

n = 1000
a,b,c = 1.5, 1., 0
target = 'Y'
obs = pd.DataFrame(columns=['X0', 'X1', target])
obs['X0'] = stats.uniform.rvs(loc=-1, scale=2, size=n)
# obs['X1'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs['X1'] = stats.uniform.rvs(loc=-.5, scale=1, size=n)
obs[target] = a * obs.X0 ** b + c + obs.X1

obs.head()

Data generating model: $Y = a \cdot (X_0 \cdot X_1)^b + X_2^c + d$

n = 1000
a,b,c,d = 1.5, 1., 1., 0.
target = 'Y'
obs = pd.DataFrame(columns=['X0', 'X1', 'X2', target])
obs['X0'] = stats.uniform.rvs(loc=-1, scale=2, size=n)
obs['X1'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs['X2'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs[target] = a * (obs.X0 * obs.X1) ** b + obs.X1 ** c + d

obs.head()

Data generating model: $ Y = V \cdot \beta + W \cdot c_2 + X \cdot c_e \cdot V $

with $$t \equiv \varepsilon + W \cdot c_1 + Z \cdot c_z$$

$V =$ treatment
$Y =$ outcome
$W =$ common cause (links treatment and outcome)
$X =$ effect modifiers (only affect the outcome)
$Z =$ instrumental variables (only affect the outcome through the treatments) → examples

n = 1000
n_common_causes = 2
n_instruments = 3
n_eff_mods = 2
n_treatments = 1
beta = 1  # max random value
target = 'Y'

%%time
cc = CommonCauses.get_obs(n, n_common_causes)
cc.obs.head()

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 7.69 ms

%%time
ins = Instruments.get_obs(n, n_instruments)
ins.obs.head()

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.63 ms

%%time
em = EffectModifiers.get_obs(n, n_eff_mods)
em.obs.head()

CPU times: user 15.6 ms, sys: 0 ns, total: 15.6 ms
Wall time: 3.52 ms

stochastically_convert_to_binary(.5)

array([1])

%%time
treat = Treatments.get_obs(n, n_treatments, cc, ins, beta, treatment_is_binary=True)
treat.obs.head()

CPU times: user 62.5 ms, sys: 0 ns, total: 62.5 ms
Wall time: 63.5 ms

outcome_is_binary = True
out = Outcomes.get_obs(treat, cc, em, outcome_is_binary=outcome_is_binary)
out.obs.head()

obs = pd.concat((treat.obs, cc.obs, em.obs, ins.obs, out.obs), axis=1)
obs.head()

Visualizing `obs`

obs.describe()

Covariation of the target variable with all other variables individually as observed (still being all confounded and what not)

plot_target_vs_rest(obs)

Distribution of each individual variables observed values

plot_var_hists(obs)

Looking at correlations to get an idea of what interaction might be important

show_correlations(obs)

Random Forest model: feature importance & partial dependency

X, y, not_target = get_Xy(obs, target=target)
model = ensemble.RandomForestRegressor(n_estimators=100, max_features='sqrt')
model.fit(X,y)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='sqrt', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)

Getting a feeling for the quality of the model

get_model_feel(model, obs, target=target)

Looking at which features are particularly important for the prediction

fi_scores = get_feature_importance(model, obs, target=target,
                                   metric=metrics.mean_squared_error)
fi_scores.head()

Looking at partial dependencies

%%time
part_deps = get_partial_dependencies(model, obs, target=target,
                                     max_num_obs=100, 
                                     max_num_ys=10)

CPU times: user 1min 21s, sys: 1.31 s, total: 1min 22s
Wall time: 1min 23s

%%time
plot_partial_dependencies(part_deps, target=target)

CPU times: user 10.5 s, sys: 3.67 s, total: 14.2 s
Wall time: 8.49 s

Generating a graphical model → `g`

Causal graph: only $X_i \rightarrow Y$

gg = GraphGenerator(obs)
print(f'target var: {gg.target}, not target vars: {", ".join(gg.not_targets)}')

target var: Y, not target vars: V0, W0, W1, X0, X1, Z0, Z1, Z2

g = gg.get_only_Xi_to_Y()

g = gg.get_Xi_to_Y_with_ccs_and_such()

layout = nx.random_layout(g)

g.edges

OutEdgeView([('W0', 'V0'), ('W0', 'Y'), ('V0', 'Y'), ('W1', 'V0'), ('W1', 'Y'), ('Z0', 'V0'), ('Z1', 'V0'), ('Z2', 'V0'), ('X0', 'Y'), ('X1', 'Y')])

gg.vis_g(g, kind='spiral')

Using the graphical model `g` for causal estimates

First let's cast the graph into a form consumable by dowhy.CausalModel

gml = gg.get_gml(g); gml

'graph [  directed 1  node [    id 0    label "W0"  ]  node [    id 1    label "V0"  ]  node [    id 2    label "W1"  ]  node [    id 3    label "Y"  ]  node [    id 4    label "Z0"  ]  node [    id 5    label "Z1"  ]  node [    id 6    label "Z2"  ]  node [    id 7    label "X0"  ]  node [    id 8    label "X1"  ]  edge [    source 0    target 1  ]  edge [    source 0    target 3  ]  edge [    source 1    target 3  ]  edge [    source 2    target 1  ]  edge [    source 2    target 3  ]  edge [    source 4    target 1  ]  edge [    source 5    target 1  ]  edge [    source 6    target 1  ]  edge [    source 7    target 3  ]  edge [    source 8    target 3  ]]'

Instantiating the causal_model

treatment = ['V0', ] # 'X1'
causal_model = dw.CausalModel(data=obs,  treatment=treatment, 
                       outcome=target, graph=gml)

INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['V0'] on outcome ['Y']

Sanity checking that the model looks actually like what we had in mind with g, noticing that a node, U, was added between treatment and outcome, which is an unobserved confounder

causal_model.view_model()

WARNING:dowhy.causal_graph:Warning: Pygraphviz cannot be loaded. Check that graphviz and pygraphviz are installed.
INFO:dowhy.causal_graph:Using Matplotlib for plotting

Generating the expression for the estimand

identified_estimand = causal_model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'W0', 'W1']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z0', 'Z1', 'Z2']

Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                        
─────(Expectation(Y|W0,W1))
d[V₀]                      
Estimand assumption 1, Unconfoundedness: If U→{V0} and U→Y then P(Y|V0,W0,W1,U) = P(Y|V0,W0,W1)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(Y, [Z0, Z1, Z2])*Derivative([V0], [Z0, Z1, Z2])**(-1))
Estimand assumption 1, As-if-random: If U→→Y then ¬(U →→{Z0,Z1,Z2})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1,Z2}→{V0}, then ¬({Z0,Z1,Z2}→Y)

Estimating the effect between treatment and control. For this we need to define what the treatment_value and control_value are, respectively. This is what for each group the treatment, e.g. X0, will be set to. If it is binary then this can be translated into "received a drug" or "received a placebo" by, for example, using 0 and 1 respectively.

How dowhy.CausalModel handles the treatment input is specified with the second part in the method_name argument, the part after the dot. For example, when the treatement is of a categorical nature you can use 'propensity_score_stratification', or when it is of a continuous nature you can use 'linear_regression'.

Still unclear:

Difference between 'regression_discontinuity' and 'linear_regression'?
Difference between 'instrumental_variable' and 'backdoor'?

# method_name = 'backdoor.propensity_score_stratification'
# method_name = 'backdoor.propensity_score_matching'
# method_name = 'backdoor.propensity_score_weighting'
# method_name = 'backdoor.linear_regression'
# method_name = 'iv.instrumental_variable'
# method_name = 'iv.regression_discontinuity'
method_name = 'iv.linear_regression'

effect_kwargs = dict(
    method_name=method_name,
    control_value = 0,
    treatment_value = 1,
    target_units = 'ate',
    test_significance = True
)
causal_estimate = causal_model.estimate_effect(identified_estimand,
                                        **effect_kwargs)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~V0+W0+W1+V0*X0+V0*X1

print(causal_estimate)

*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                        
─────(Expectation(Y|W0,W1))
d[V₀]                      
Estimand assumption 1, Unconfoundedness: If U→{V0} and U→Y then P(Y|V0,W0,W1,U) = P(Y|V0,W0,W1)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(Y, [Z0, Z1, Z2])*Derivative([V0], [Z0, Z1, Z2])**(-1))
Estimand assumption 1, As-if-random: If U→→Y then ¬(U →→{Z0,Z1,Z2})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1,Z2}→{V0}, then ¬({Z0,Z1,Z2}→Y)

## Realized estimand
b: Y~V0+W0+W1+V0*X0+V0*X1
## Estimate
Value: 0.9999999999999973

## Statistical Significance
p-value: <0.001

For the simple linear case using method_name = backdoor.linear_regression' with control_value = 0 and treatment_value = 1 we get the slope of the (linear) relationship between treatment and outcome, which is ~1.5, just as the value used to generate the data.

If however the exponent $b$ of $X_0$ in the data generating model is changed from 1 → 2 we see that suddenly the estimated value is bonkers.

Trying to refute the observed causal effect. I see this as sanity checking the causal graph created to capture the observed data by trying to poke holes into it in various ways, like adding potentially spurious correlations with a new variable or replacing the treatment with noise.

method_name:

"random_common_cause": Adding a randomly-generated confounder
"add_unobserved_common_cause": Adding a confounder that is associated with both treatment and outcome
"placebo_treatment_refuter": Replacing the treatment with a placebo (random) variable)
"data_subset_refuter": Removing a random subset of the data

# method_name = 'random_common_cause'
# method_name = 'add_unobserved_common_cause'
method_name = 'placebo_treatment_refuter'
# method_name = 'data_subset_refuter'

refute_kwargs = dict(
    method_name=method_name,
    placebo_type = "permute",  # relevant for placebo refutation
    subset_fraction = .9,  # relevant for subset refutation
)

refute_res = causal_model.refute_estimate(identified_estimand, 
                                   causal_estimate, 
                                   **refute_kwargs)

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~placebo+W0+W1+placebo*X0+placebo*X1

print(refute_res)

Refute: Use a Placebo Treatment
Estimated effect:(0.9999999999999973,)
New effect:(0.21013674170611385,)

	X0	X1	Y
0	0.765089	0.029086	1.176720
1	0.415144	0.465493	1.088209
2	0.916878	-0.080276	1.295041
3	-0.821110	0.430448	-0.801217
4	0.573608	0.237232	1.097644

	X0	X1	X2	Y
0	-0.150942	0.013142	-0.008870	0.010167
1	0.702410	0.036707	0.016707	0.075383
2	-0.646335	0.104335	0.003625	0.003182
3	-0.275542	0.154818	-0.022776	0.090830
4	0.061228	0.119853	-0.019778	0.130860

	W0	W1
0	0.736602	0.281994
1	-0.492251	1.485185
2	0.375947	1.623624
3	0.483833	0.541259
4	-1.465136	-0.438113

	Z0	Z1	Z2
0	1.0	0.032661	1.0
1	1.0	0.436289	1.0
2	1.0	0.218769	1.0
3	0.0	0.149460	1.0
4	1.0	0.921593	1.0

	X0	X1
0	-0.028788	0.438408
1	1.899796	-0.335811
2	1.119562	-1.250247
3	-0.332850	1.308470
4	1.062409	0.357076

	V0	W0	W1	X0	X1	Z0	Z1	Z2	Y
0	2.373724	0.736602	0.281994	-0.028788	0.438408	1.0	0.032661	1.0	1
1	4.615382	-0.492251	1.485185	1.899796	-0.335811	1.0	0.436289	1.0	1
2	2.821714	0.375947	1.623624	1.119562	-1.250247	1.0	0.218769	1.0	1
3	-0.215520	0.483833	0.541259	-0.332850	1.308470	0.0	0.149460	1.0	0
4	2.743184	-1.465136	-0.438113	1.062409	0.357076	1.0	0.921593	1.0	1

	V0	W0	W1	X0	X1	Z0	Z1	Z2	Y
count	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000
mean	1.465296	0.100475	-0.130542	0.008895	-0.585974	0.805000	0.507214	0.238000	1.179813
std	1.204057	0.992407	1.037773	0.988501	1.020015	0.396399	0.282922	0.426072	1.598521
min	-2.339415	-3.267660	-3.571542	-2.874415	-4.047959	0.000000	0.002026	0.000000	-3.806755
25%	0.608202	-0.600980	-0.815770	-0.658849	-1.275486	1.000000	0.260859	0.000000	0.096035
50%	1.472026	0.114826	-0.095155	0.023061	-0.609953	1.000000	0.512605	0.000000	0.879824
75%	2.263248	0.781707	0.540700	0.711603	0.082860	1.000000	0.744506	0.000000	1.984080
max	5.860710	2.949236	2.812134	2.933076	2.726658	1.000000	0.995432	1.000000	8.964875

	variable	feature_importance
0	V0	1.714640
3	X0	0.932858
4	X1	0.792039
1	W0	0.298732
2	W1	0.182284

Linear models & causaling

Generating data → obs

Data generating model: $ Y = a \cdot X_0 ^ b + c + X_1 $

Data generating model: $Y = a \cdot (X_0 \cdot X_1)^b + X_2^c + d$

Data generating model: $ Y = V \cdot \beta + W \cdot c_2 + X \cdot c_e \cdot V $

class GenVars[source]

class CommonCauses[source]

class Instruments[source]

class EffectModifiers[source]

class Treatments[source]

initialize[source]

generate[source]

get_obs[source]

initialize[source]

get_obs[source]

generate[source]

initialize[source]

get_obs[source]

generate[source]

stochastically_convert_to_binary[source]

initialize[source]

generate[source]

get_obs[source]

class Outcomes[source]

Visualizing obs

plot_target_vs_rest[source]

plot_var_hists[source]

show_correlations[source]

Random Forest model: feature importance & partial dependency

get_Xy[source]

get_model_feel[source]

get_feature_importance[source]

get_partial_dependencies[source]

plot_partial_dependencies[source]

Generating a graphical model → g

class GraphGenerator[source]

Causal graph: only $X_i \rightarrow Y$

get_only_Xi_to_Y[source]

get_Xi_to_Y_with_ccs_and_such[source]

vis_g[source]

Using the graphical model g for causal estimates

get_gml[source]

Generating data → `obs`

`class` `GenVars`[source]

`class` `CommonCauses`[source]

`class` `Instruments`[source]

`class` `EffectModifiers`[source]

`class` `Treatments`[source]

`initialize`[source]

`generate`[source]

`get_obs`[source]

`initialize`[source]

`get_obs`[source]

`generate`[source]

`initialize`[source]

`get_obs`[source]

`generate`[source]

`stochastically_convert_to_binary`[source]

`initialize`[source]

`generate`[source]

`get_obs`[source]

`class` `Outcomes`[source]

Visualizing `obs`

`plot_target_vs_rest`[source]

`plot_var_hists`[source]

`show_correlations`[source]

`get_Xy`[source]

`get_model_feel`[source]

`get_feature_importance`[source]

`get_partial_dependencies`[source]

`plot_partial_dependencies`[source]

Generating a graphical model → `g`

`class` `GraphGenerator`[source]

`get_only_Xi_to_Y`[source]

`get_Xi_to_Y_with_ccs_and_such`[source]

`vis_g`[source]

Using the graphical model `g` for causal estimates

`get_gml`[source]