An unfinished playground for causal modelling on various kinds of synthetically generated data and causal graphs.

Content in this notebook

  • generate your synthetic data → obs
  • do some basic analysis of that data using a random forest → model
  • create your hypothesis (causal graph) about your data using the networkx package → causal_model
  • compute the causal estimand, the causal effect, and potential refutations using the dowhy package

References

References for the dowhy package:

Material which helped my introductory reading so far:

Open questions

  • How is matching or grouping done in categorical vs continuous treatment variable cases?
  • How to visualize ATT, ATC and ATE and the reasoning for their potential different applications?
  • Could the code easily be adjusted to extend the regression using arbitrary scikit-learn methods?
  • Why is logistic regression used to compute the propensity score and not other methods?
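On the propensity-score question: the propensity score is simply $P(\text{treatment} \mid \text{covariates})$, so in principle any probabilistic classifier can estimate it; logistic regression is just a well-understood default that tends to produce reasonably calibrated probabilities. A minimal sketch (all data and model choices here are illustrative, not dowhy's internals):

```python
# The propensity score is P(treatment | covariates); any probabilistic
# classifier can estimate it, not only logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
W = rng.normal(size=(500, 2))                                             # common causes
v = (W @ np.array([1.0, -1.0]) + rng.normal(size=500) > 0).astype(int)    # binary treatment

for clf in (LogisticRegression(), GradientBoostingClassifier()):
    clf.fit(W, v)
    e = clf.predict_proba(W)[:, 1]  # estimated propensity scores, one per unit
    print(type(clf).__name__, round(float(e.mean()), 2))
```

Swapping in a tree-based classifier works mechanically, though calibration of the predicted probabilities then deserves extra care.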

Generating data → obs

Data generating model: $ Y = a \cdot X_0 ^ b + c + X_1 $

import pandas as pd
from scipy import stats

n = 1000
a, b, c = 1.5, 1., 0.
target = 'Y'
obs = pd.DataFrame(columns=['X0', 'X1', target])
obs['X0'] = stats.uniform.rvs(loc=-1, scale=2, size=n)
# obs['X1'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs['X1'] = stats.uniform.rvs(loc=-.5, scale=1, size=n)
obs[target] = a * obs.X0 ** b + c + obs.X1

obs.head()
X0 X1 Y
0 0.765089 0.029086 1.176720
1 0.415144 0.465493 1.088209
2 0.916878 -0.080276 1.295041
3 -0.821110 0.430448 -0.801217
4 0.573608 0.237232 1.097644

Data generating model: $Y = a \cdot (X_0 \cdot X_1)^b + X_2^c + d$

n = 1000
a, b, c, d = 1.5, 1., 1., 0.
target = 'Y'
obs = pd.DataFrame(columns=['X0', 'X1', 'X2', target])
obs['X0'] = stats.uniform.rvs(loc=-1, scale=2, size=n)
obs['X1'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs['X2'] = stats.norm.rvs(loc=0, scale=.1, size=n)
obs[target] = a * (obs.X0 * obs.X1) ** b + obs.X2 ** c + d  # X2 here, matching the model above

obs.head()
X0 X1 X2 Y
0 -0.150942 0.013142 -0.008870 0.010167
1 0.702410 0.036707 0.016707 0.075383
2 -0.646335 0.104335 0.003625 0.003182
3 -0.275542 0.154818 -0.022776 0.090830
4 0.061228 0.119853 -0.019778 0.130860

Data generating model: $Y = V \cdot \beta + W \cdot c_2 + X \cdot c_e \cdot V$

with $$V \equiv \varepsilon + W \cdot c_1 + Z \cdot c_z$$

  • $V =$ treatment
  • $Y =$ outcome
  • $W =$ common cause (links treatment and outcome)
  • $X =$ effect modifiers (only affect the outcome)
  • $Z =$ instrumental variables (only affect the outcome through the treatments) → examples
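The two equations above can be written out directly; a minimal sketch with illustrative coefficient values (the classes below presumably wrap exactly this kind of generation):

```python
# Direct translation of the data-generating equations above; all coefficient
# values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
beta, c1, c2, cz, ce = 1.0, 0.5, 0.5, 0.8, 0.3

W = rng.normal(size=n)                 # common cause
Z = rng.uniform(size=n)                # instrument
X = rng.normal(size=n)                 # effect modifier
eps = rng.normal(size=n)               # treatment noise

V = eps + W * c1 + Z * cz              # treatment, per the second equation
Y = V * beta + W * c2 + X * ce * V     # outcome, per the first equation
```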

class GenVars[source]

GenVars(name:str, n_vars:int, **kwargs)

class CommonCauses[source]

CommonCauses(name:str='W', n_common_causes:int=1, rv_mean:rv_continuous=scipy.stats.norm, rv_mean_kwargs:dict=None) :: GenVars

class Instruments[source]

Instruments(name:str='Z', n_instruments:int=1, rv_p:rv_continuous=scipy.stats.uniform, rv_p_kwargs:dict=None) :: GenVars

class EffectModifiers[source]

EffectModifiers(name:str='X', n_eff_mods:int=1, rv_mean:rv_continuous=scipy.stats.uniform, rv_mean_kwargs:dict=None) :: GenVars

class Treatments[source]

Treatments(name:str='V', n_treatments:int=1, rv:rv_continuous=scipy.stats.norm, rv_kwargs:dict=None, cc:CommonCauses=None, ins:Instruments=None, beta:Union[float, List[int], List[float], ndarray]=10) :: GenVars

n = 1000
n_common_causes = 2
n_instruments = 3
n_eff_mods = 2
n_treatments = 1
beta = 1  # max random value
target = 'Y'

initialize[source]

initialize(rv:rv_continuous, rv_kwargs:dict, cc:CommonCauses=None, ins:Instruments=None, beta:Union[float, List[int], List[float], ndarray]=10)

generate[source]

generate(n:int, treatment_is_binary:bool=False)

get_obs[source]

get_obs(n:int, n_treatments:int, cc:CommonCauses, ins:Instruments, beta:Union[float, List[int], List[float], ndarray], treatment_is_binary:bool=False)

%%time
cc = CommonCauses.get_obs(n, n_common_causes)
cc.obs.head()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 7.69 ms
W0 W1
0 0.736602 0.281994
1 -0.492251 1.485185
2 0.375947 1.623624
3 0.483833 0.541259
4 -1.465136 -0.438113

initialize[source]

initialize(rv:rv_continuous, rv_kwargs:dict, cc:CommonCauses=None, ins:Instruments=None, beta:Union[float, List[int], List[float], ndarray]=10)

get_obs[source]

get_obs(n:int, n_treatments:int, cc:CommonCauses, ins:Instruments, beta:Union[float, List[int], List[float], ndarray], treatment_is_binary:bool=False)

generate[source]

generate(n:int, treatment_is_binary:bool=False)

%%time
ins = Instruments.get_obs(n, n_instruments)
ins.obs.head()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.63 ms
Z0 Z1 Z2
0 1.0 0.032661 1.0
1 1.0 0.436289 1.0
2 1.0 0.218769 1.0
3 0.0 0.149460 1.0
4 1.0 0.921593 1.0

initialize[source]

initialize(rv:rv_continuous, rv_kwargs:dict, cc:CommonCauses=None, ins:Instruments=None, beta:Union[float, List[int], List[float], ndarray]=10)

get_obs[source]

get_obs(n:int, n_treatments:int, cc:CommonCauses, ins:Instruments, beta:Union[float, List[int], List[float], ndarray], treatment_is_binary:bool=False)

generate[source]

generate(n:int, treatment_is_binary:bool=False)

%%time
em = EffectModifiers.get_obs(n, n_eff_mods)
em.obs.head()
CPU times: user 15.6 ms, sys: 0 ns, total: 15.6 ms
Wall time: 3.52 ms
X0 X1
0 -0.028788 0.438408
1 1.899796 -0.335811
2 1.119562 -1.250247
3 -0.332850 1.308470
4 1.062409 0.357076

stochastically_convert_to_binary[source]

stochastically_convert_to_binary(x:float)

stochastically_convert_to_binary(.5)
array([1])
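A guess at what stochastically_convert_to_binary does (an assumption, not the notebook's verified implementation): squash the input through a sigmoid and draw a Bernoulli sample with that probability. The helper name stochastic_binary below is hypothetical:

```python
# Hypothetical reimplementation: sigmoid the input, then one Bernoulli draw.
import numpy as np

def stochastic_binary(x, rng=None):
    rng = rng or np.random.default_rng(0)
    p = 1.0 / (1.0 + np.exp(-x))        # sigmoid maps any real x into (0, 1)
    return rng.binomial(1, p, size=1)   # a single 0/1 draw, returned as an array

out = stochastic_binary(0.5)
print(out)
```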

initialize[source]

initialize(rv:rv_continuous, rv_kwargs:dict, cc:CommonCauses=None, ins:Instruments=None, beta:Union[float, List[int], List[float], ndarray]=10)

generate[source]

generate(n:int, treatment_is_binary:bool=False)

get_obs[source]

get_obs(n:int, n_treatments:int, cc:CommonCauses, ins:Instruments, beta:Union[float, List[int], List[float], ndarray], treatment_is_binary:bool=False)

%%time
treat = Treatments.get_obs(n, n_treatments, cc, ins, beta, treatment_is_binary=True)
treat.obs.head()
CPU times: user 62.5 ms, sys: 0 ns, total: 62.5 ms
Wall time: 63.5 ms
V0
0 1
1 1
2 1
3 1
4 1

class Outcomes[source]

Outcomes(name:str='Y')

outcome_is_binary = True
out = Outcomes.get_obs(treat, cc, em, outcome_is_binary=outcome_is_binary)
out.obs.head()
Y
0 0
1 0
2 0
3 0
4 0
obs = pd.concat((treat.obs, cc.obs, em.obs, ins.obs, out.obs), axis=1)
obs.head()
V0 W0 W1 X0 X1 Z0 Z1 Z2 Y
0 2.373724 0.736602 0.281994 -0.028788 0.438408 1.0 0.032661 1.0 1
1 4.615382 -0.492251 1.485185 1.899796 -0.335811 1.0 0.436289 1.0 1
2 2.821714 0.375947 1.623624 1.119562 -1.250247 1.0 0.218769 1.0 1
3 -0.215520 0.483833 0.541259 -0.332850 1.308470 0.0 0.149460 1.0 0
4 2.743184 -1.465136 -0.438113 1.062409 0.357076 1.0 0.921593 1.0 1

Visualizing obs

obs.describe()
V0 W0 W1 X0 X1 Z0 Z1 Z2 Y
count 1000.000000 1000.000000 1000.000000 1000.000000 1000.000000 1000.000000 1000.000000 1000.000000 1000.000000
mean 1.465296 0.100475 -0.130542 0.008895 -0.585974 0.805000 0.507214 0.238000 1.179813
std 1.204057 0.992407 1.037773 0.988501 1.020015 0.396399 0.282922 0.426072 1.598521
min -2.339415 -3.267660 -3.571542 -2.874415 -4.047959 0.000000 0.002026 0.000000 -3.806755
25% 0.608202 -0.600980 -0.815770 -0.658849 -1.275486 1.000000 0.260859 0.000000 0.096035
50% 1.472026 0.114826 -0.095155 0.023061 -0.609953 1.000000 0.512605 0.000000 0.879824
75% 2.263248 0.781707 0.540700 0.711603 0.082860 1.000000 0.744506 0.000000 1.984080
max 5.860710 2.949236 2.812134 2.933076 2.726658 1.000000 0.995432 1.000000 8.964875

Covariation of the target variable with each of the other variables individually, as observed (i.e. still confounded).

plot_target_vs_rest[source]

plot_target_vs_rest(obs:DataFrame, target:str='Y')

plot_target_vs_rest(obs)

Distribution of each individual variable's observed values

plot_var_hists[source]

plot_var_hists(obs:DataFrame, bins:int=50)

plot_var_hists(obs)

Looking at correlations to get an idea of which interactions might be important

show_correlations[source]

show_correlations(obs:DataFrame, method:str='spearman')

show_correlations(obs)

Random Forest model: feature importance & partial dependency

get_Xy[source]

get_Xy(obs:DataFrame, target:str='Y')

from sklearn import ensemble, metrics

X, y, not_target = get_Xy(obs, target=target)
model = ensemble.RandomForestRegressor(n_estimators=100, max_features='sqrt')
model.fit(X, y)
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='sqrt', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)

Getting a feeling for the quality of the model

get_model_feel[source]

get_model_feel(model, obs:DataFrame, target:str='Y', bins:int=50)

get_model_feel(model, obs, target=target)

Looking at which features are particularly important for the prediction

get_feature_importance[source]

get_feature_importance(m, obs:DataFrame, target:str='Y', metric:callable='mean_squared_error')

fi_scores = get_feature_importance(model, obs, target=target,
                                   metric=metrics.mean_squared_error)
fi_scores.head()
variable feature_importance
0 V0 1.714640
3 X0 0.932858
4 X1 0.792039
1 W0 0.298732
2 W1 0.182284
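Since get_feature_importance takes a metric, it plausibly computes permutation importance: shuffle one column at a time and measure how much the error grows. A self-contained sketch under that assumption (data and column names are illustrative):

```python
# Permutation importance sketch: break one column's link to y by shuffling it,
# then measure the increase in training error.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=['V0', 'W0', 'X0'])
y = 2 * X['V0'] + 0.1 * rng.normal(size=300)   # only V0 truly matters

m = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
base = mean_squared_error(y, m.predict(X))

scores = {}
for col in X.columns:
    Xp = X.copy()
    Xp[col] = rng.permutation(Xp[col].values)  # shuffle exactly one column
    scores[col] = mean_squared_error(y, m.predict(Xp)) - base

print(sorted(scores, key=scores.get, reverse=True))  # V0 should rank first
```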

Looking at partial dependencies

get_partial_dependencies[source]

get_partial_dependencies(model, obs:DataFrame, target:str='Y', max_num_obs:int=100, max_num_ys:int=10)

%%time
part_deps = get_partial_dependencies(model, obs, target=target,
                                     max_num_obs=100, 
                                     max_num_ys=10)
CPU times: user 1min 21s, sys: 1.31 s, total: 1min 22s
Wall time: 1min 23s
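The long runtime fits what a partial dependency plausibly costs: for each grid value of a feature, every observed row is forced to that value and the model is re-evaluated. A one-feature sketch (an assumption about what get_partial_dependencies computes; data here is illustrative):

```python
# One-feature partial dependence: sweep V0 over a grid while holding the
# other observed columns fixed, and average the model's predictions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 2)), columns=['V0', 'W0'])
y = 3 * X['V0'] + rng.normal(scale=0.1, size=200)
m = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

grid = np.linspace(X['V0'].min(), X['V0'].max(), 20)
pd_vals = []
for v in grid:
    Xg = X.copy()
    Xg['V0'] = v                        # force every row to the grid value
    pd_vals.append(m.predict(Xg).mean())

# With y ≈ 3 * V0 the partial dependence curve should rise overall.
print(round(pd_vals[-1] - pd_vals[0], 1))
```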

plot_partial_dependencies[source]

plot_partial_dependencies(part_deps:DataFrame, target:str='Y')

%%time
plot_partial_dependencies(part_deps, target=target)
CPU times: user 10.5 s, sys: 3.67 s, total: 14.2 s
Wall time: 8.49 s

Generating a graphical model → g

class GraphGenerator[source]

GraphGenerator(obs:DataFrame, target:str='Y')

Generates some specific directed graphs for obs

Causal graph: only $X_i \rightarrow Y$

get_only_Xi_to_Y[source]

get_only_Xi_to_Y()

gg = GraphGenerator(obs)
print(f'target var: {gg.target}, not target vars: {", ".join(gg.not_targets)}')
target var: Y, not target vars: V0, W0, W1, X0, X1, Z0, Z1, Z2
g = gg.get_only_Xi_to_Y()

get_Xi_to_Y_with_ccs_and_such[source]

get_Xi_to_Y_with_ccs_and_such(common_cause='W', effect_modifier='X', treatment='V', instrument='Z')

  • common cause → treatment
  • common cause → outcome/target
  • treatment → outcome/target
  • instrument → treatment
  • effect modifier → outcome/target

g = gg.get_Xi_to_Y_with_ccs_and_such()
layout = nx.random_layout(g)

vis_g[source]

vis_g(g:DiGraph, kind:str='spectral')

g.edges
OutEdgeView([('W0', 'V0'), ('W0', 'Y'), ('V0', 'Y'), ('W1', 'V0'), ('W1', 'Y'), ('Z0', 'V0'), ('Z1', 'V0'), ('Z2', 'V0'), ('X0', 'Y'), ('X1', 'Y')])
gg.vis_g(g, kind='spiral')
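The same ten edges can also be built by hand with plain networkx, which makes the role of each variable explicit (this mirrors the edge list above, not necessarily GraphGenerator's actual code):

```python
# Hand-rolled version of the causal graph: W = common causes, Z = instruments,
# X = effect modifiers, V0 = treatment, Y = outcome.
import networkx as nx

g_manual = nx.DiGraph()
for w in ['W0', 'W1']:
    g_manual.add_edges_from([(w, 'V0'), (w, 'Y')])   # common cause → treatment and outcome
for z in ['Z0', 'Z1', 'Z2']:
    g_manual.add_edge(z, 'V0')                       # instrument → treatment only
for x in ['X0', 'X1']:
    g_manual.add_edge(x, 'Y')                        # effect modifier → outcome only
g_manual.add_edge('V0', 'Y')                         # treatment → outcome

print(len(g_manual.edges))  # the same 10 edges as above
```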

Using the graphical model g for causal estimates

First let's cast the graph into a form consumable by dowhy.CausalModel

get_gml[source]

get_gml(g:DiGraph)

gml = gg.get_gml(g); gml
'graph [  directed 1  node [    id 0    label "W0"  ]  node [    id 1    label "V0"  ]  node [    id 2    label "W1"  ]  node [    id 3    label "Y"  ]  node [    id 4    label "Z0"  ]  node [    id 5    label "Z1"  ]  node [    id 6    label "Z2"  ]  node [    id 7    label "X0"  ]  node [    id 8    label "X1"  ]  edge [    source 0    target 1  ]  edge [    source 0    target 3  ]  edge [    source 1    target 3  ]  edge [    source 2    target 1  ]  edge [    source 2    target 3  ]  edge [    source 4    target 1  ]  edge [    source 5    target 1  ]  edge [    source 6    target 1  ]  edge [    source 7    target 3  ]  edge [    source 8    target 3  ]]'
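get_gml presumably does little more than serialize the DiGraph to GML; networkx provides this directly (a sketch, not necessarily the notebook's exact implementation):

```python
# Serialize a DiGraph to a single GML string, the format dowhy.CausalModel accepts.
import networkx as nx

g_demo = nx.DiGraph([('W0', 'V0'), ('W0', 'Y'), ('V0', 'Y')])
gml = ' '.join(nx.generate_gml(g_demo))  # generate_gml yields GML line by line
print(gml[:30])
```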

Instantiating the causal_model

treatment = ['V0', ] # 'X1'
causal_model = dw.CausalModel(data=obs,  treatment=treatment, 
                       outcome=target, graph=gml)
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['V0'] on outcome ['Y']

Sanity checking that the model actually looks like what we had in mind with g. Note that a node U was added between treatment and outcome: an unobserved confounder.

causal_model.view_model()
WARNING:dowhy.causal_graph:Warning: Pygraphviz cannot be loaded. Check that graphviz and pygraphviz are installed.
INFO:dowhy.causal_graph:Using Matplotlib for plotting

Generating the expression for the estimand

identified_estimand = causal_model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'W0', 'W1']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z0', 'Z1', 'Z2']
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                        
─────(Expectation(Y|W0,W1))
d[V₀]                      
Estimand assumption 1, Unconfoundedness: If U→{V0} and U→Y then P(Y|V0,W0,W1,U) = P(Y|V0,W0,W1)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(Y, [Z0, Z1, Z2])*Derivative([V0], [Z0, Z1, Z2])**(-1))
Estimand assumption 1, As-if-random: If U→→Y then ¬(U →→{Z0,Z1,Z2})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1,Z2}→{V0}, then ¬({Z0,Z1,Z2}→Y)

Estimating the effect between treatment and control. For this we need to define the treatment_value and the control_value, i.e. the value the treatment variable (here V0) is set to for each group. If the treatment is binary, this can be translated into "received a drug" vs. "received a placebo" by, for example, using 1 and 0 respectively.

How dowhy.CausalModel handles the treatment input is specified by the second part of the method_name argument, the part after the dot. For example, when the treatment is categorical you can use 'propensity_score_stratification', and when it is continuous you can use 'linear_regression'.

Still unclear:

  • Difference between 'regression_discontinuity' and 'linear_regression'?

  • Difference between 'instrumental_variable' and 'backdoor'?

# method_name = 'backdoor.propensity_score_stratification'
# method_name = 'backdoor.propensity_score_matching'
# method_name = 'backdoor.propensity_score_weighting'
# method_name = 'backdoor.linear_regression'
# method_name = 'iv.instrumental_variable'
# method_name = 'iv.regression_discontinuity'
method_name = 'iv.linear_regression'

effect_kwargs = dict(
    method_name=method_name,
    control_value = 0,
    treatment_value = 1,
    target_units = 'ate',
    test_significance = True
)
causal_estimate = causal_model.estimate_effect(identified_estimand,
                                        **effect_kwargs)
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~V0+W0+W1+V0*X0+V0*X1
print(causal_estimate)
*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d                        
─────(Expectation(Y|W0,W1))
d[V₀]                      
Estimand assumption 1, Unconfoundedness: If U→{V0} and U→Y then P(Y|V0,W0,W1,U) = P(Y|V0,W0,W1)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(Y, [Z0, Z1, Z2])*Derivative([V0], [Z0, Z1, Z2])**(-1))
Estimand assumption 1, As-if-random: If U→→Y then ¬(U →→{Z0,Z1,Z2})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1,Z2}→{V0}, then ¬({Z0,Z1,Z2}→Y)

## Realized estimand
b: Y~V0+W0+W1+V0*X0+V0*X1
## Estimate
Value: 0.9999999999999973

## Statistical Significance
p-value: <0.001

For the simple linear case using method_name = 'backdoor.linear_regression' with control_value = 0 and treatment_value = 1 we get the slope of the (linear) relationship between treatment and outcome, which matches the beta used to generate the data (here beta = 1, estimate ≈ 1.0).

If however the exponent $b$ of $X_0$ in the data generating model is changed from 1 → 2, we see that the estimated value is suddenly way off.
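Why the linear estimate recovers the generating coefficient: when the outcome is linear in the treatment, regressing Y on the treatment while adjusting for the common cause returns beta as the treatment coefficient, and this breaks as soon as the outcome is nonlinear in the treatment. A self-contained sketch with illustrative coefficients:

```python
# Backdoor adjustment via OLS: with Y = beta*V + c2*W + noise and V confounded
# by W, regressing Y on (V, W) recovers ~beta as the V coefficient.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
beta, c1, c2 = 1.0, 0.8, 0.5
W = rng.normal(size=1000)
V = c1 * W + rng.normal(size=1000)             # treatment confounded by W
Y = beta * V + c2 * W + 0.1 * rng.normal(size=1000)

m = LinearRegression().fit(np.column_stack([V, W]), Y)
print(round(m.coef_[0], 2))                    # ≈ beta = 1.0
```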

Trying to refute the observed causal effect. I see this as sanity checking the causal graph: trying to poke holes into it in various ways, e.g. adding potentially spurious correlations via a new variable or replacing the treatment with noise.

method_name:

  • "random_common_cause": Adding a randomly-generated confounder
  • "add_unobserved_common_cause": Adding a confounder that is associated with both treatment and outcome
  • "placebo_treatment_refuter": Replacing the treatment with a placebo (random) variable
  • "data_subset_refuter": Removing a random subset of the data

# method_name = 'random_common_cause'
# method_name = 'add_unobserved_common_cause'
method_name = 'placebo_treatment_refuter'
# method_name = 'data_subset_refuter'

refute_kwargs = dict(
    method_name=method_name,
    placebo_type = "permute",  # relevant for placebo refutation
    subset_fraction = .9,  # relevant for subset refutation
)

refute_res = causal_model.refute_estimate(identified_estimand, 
                                   causal_estimate, 
                                   **refute_kwargs)
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: Y~placebo+W0+W1+placebo*X0+placebo*X1
print(refute_res)
Refute: Use a Placebo Treatment
Estimated effect:(0.9999999999999973,)
New effect:(0.21013674170611385,)
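The idea behind the placebo refuter can be sketched without dowhy (an assumption about its mechanics; all coefficients below are illustrative): permuting the treatment column destroys its relationship to the outcome, so the re-estimated "effect" should collapse toward zero, as in the 0.21 vs. 1.0 above.

```python
# Placebo refutation sketch: re-estimate the effect with a permuted treatment;
# the real effect (~beta = 1.0) should survive, the placebo effect should not.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
W = rng.normal(size=1000)
V = 0.8 * W + rng.normal(size=1000)            # treatment confounded by W
Y = 1.0 * V + 0.5 * W + 0.1 * rng.normal(size=1000)

def effect(v):
    # treatment coefficient of an OLS of Y on (v, W)
    return LinearRegression().fit(np.column_stack([v, W]), Y).coef_[0]

placebo = rng.permutation(V)                   # treatment replaced by noise
print(round(effect(V), 2), round(effect(placebo), 2))
```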