因果推断笔记——因果图建模之Uber开源的CausalML（十二）_uber 因果推断框架

1 CausalML 、 EconML、dowhy异同

相比拉私活，causalML主要是面向Uplift 的模型，econML更加全面一些。
1.1 econML 主要估计器

主要的Estimation Methods估计器：
Double Machine Learning (aka RLearner)
Dynamic Double Machine Learning
Causal Forests
Orthogonal Random Forests
Meta-Learners
Doubly Robust Learners
Orthogonal Instrumental Variables
Deep Instrumental Variables
Interpretability解释性方面：
Tree Interpreter of the CATE model
Policy Interpreter of the CATE model
SHAP values for the CATE model
Causal Model Selection and Cross-Validation 模型选择方面：
Causal model selection with the RScorer
First Stage Model Selection
1.2 CausalML 主要的估计器

Tree-based algorithms
- Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square [1]
- Uplift tree/random forests on Contextual Treatment Selection [2]
- Causal Tree [3] - Work-in-progress
Meta-learner algorithms
- S-learner [4]
- T-learner [4]
- X-learner [4]
- R-learner [5]
- Doubly Robust (DR) learner [6]
- TMLE learner [7]
Instrumental variables algorithms
- 2-Stage Least Squares (2SLS)
- Doubly Robust (DR) IV [8]
Neural-network-based algorithms
- CEVAE [9]
- DragonNet [10] - with causalml[tf] installation (see Installation)
1.3 dowhy的估计器

使用统计方法对表达式进行估计，识别之后的估计
「基于估计干预分配的方法」
- 基于倾向的分层（Propensity-based Stratification）
- 倾向得分匹配（Propensity Score Matching）
- 逆向倾向加权（Inverse Propensity Weighting）
「基于估计结果模型的方法」
- 线性回归（Linear Regression）
- 广义线性模型（Generalized Linear Models）
「基于工具变量等式的方法」
- 二元工具/Wald 估计器（Binary Instrument/Wald Estimator）
- 两阶段最小二乘法（Two-stage least squares）
- 非连续回归（Regression discontinuity）
「基于前门准则和一般中介的方法」
- 两层线性回归（Two-stage linear regression）
1.3 简单对比：dowhy \ causalml\ econml

可以看到econML比较全面，涵盖了DML、uplift model、IV、Doubly Robust等
dowhy比较基础，涵盖的的是psm、psm、psw、iv等，
causalML主要集中在uplift model，S-learner 、 T-learner、X-learner、DR等,Tree-based 算是多出的，还有几款NN-Based的
同样，解释性，econml / causalml款都有，涵盖shap / Tree Interpreter /Policy Interpreter
window10 倒腾了大半天，各种报错。。问题会围绕Microsoft visual c++14.0、MicrosoftVisualStudio 等，还有要安装tf会有些问题；还有Built的时候会有一些报错。。
整体使用起来，方便性不如econml，后面还会有一些报错。
所以没深究就放弃了，还是用云服务器比较省心：
$ git clone https://github.com/uber/causalml.git
$ cd causalml
$ pip install -r requirements.txt
$ pip install causalml
3 Quick Start
 
3.1 S, T, X, and R Learners估计ATE
 
这里有一个 问题，就是xgboost.XGBRegressor，在causalml默认是xgboost = 1.4.2版本的，但是这个版本会出现报错： 
AttributeError: type object 'cupy.core.core.broadcast' has no attribute '__reduce_cython__'
也就是这里需要注意，如果是cpu调用，需要Predictor = 'cpu_predictor'，但是，好像causalML封装的有些问题，from causalml.inference.meta import XGBTRegressor，所以一直会报上面的错，只能回退xgboost版本到1.3.1就正常了。 
 20211213更新一个神奇的bug
 xgboost.XGBClassifier( predictor = ‘cpu_predictor’) ，在1.3.1 版本下，第一次运行可以正常，第二次再次model.fit()的时候，还是会出现'__reduce_cython__'，
 笔者经过多轮尝试，回退到1.2.1，终于恢复正常
 奇怪的是，1.3.1版本下，xgboost.XGBRegressor 多轮运行是正常的，具体原因未知… 且在其他window系统下，XGBClassifier在1.3.3 也是正常的，非常奇怪。。 
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor
from causalml.inference.meta import BaseRRegressor
from xgboost import XGBRegressor
from causalml.dataset import synthetic_data
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)
lr = LRSRegressor()
te, lb, ub = lr.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Linear Regression): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
xg = XGBTRegressor(random_state=42)
te, lb, ub = xg.estimate_ate(X, treatment, y)
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
nn = MLPTRegressor(hidden_layer_sizes=(10, 10),
                 learning_rate_init=.1,
                 early_stopping=True,
                 random_state=42)
te, lb, ub = nn.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Neural Network (MLP)): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = xl.estimate_ate(X, treatment, y, e)
print('Average Treatment Effect (BaseXRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
rl = BaseRRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub =  rl.estimate_ate(X=X, p=e, treatment=treatment, y=y)
print('Average Treatment Effect (BaseRRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
这里 y - 因变量, X-协变量, treatment-干预, e - PS倾向性得分 
这里对应使用的估计器是： 
LRSRegressor —— Linear Regression
XGBTRegressor—— XGBoost
MLPTRegressor—— Neural Network (MLP)
BaseXRegressor—— BaseXRegressor using XGBoost
BaseRRegressor—— BaseRRegressor using XGBoost 
输出的结果， 
3.2 meta元学习估计ATE/ITE
 
这里参考官方文档：meta_learners_with_synthetic_data.ipynb
 当然，这里注意到该教程中干预是0/1二分类的，那么后面有一份类似的教程是，干预可以是multiple_treatment所以可参考：meta_learners_with_synthetic_data_multiple_treatment.ipynb 
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from xgboost import XGBRegressor
import warnings
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
from causalml.match import NearestNeighborMatch, MatchOptimizer, create_table_one
from causalml.propensity import ElasticNetPropensityModel
from causalml.dataset import *
from causalml.metrics import *
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
%matplotlib inline
import causalml
print(causalml.__version__)
生成数据： 
# Generate synthetic data using mode 1
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=8, sigma=1.0)
来看看众多的估计器： 
3.2.1 『ATE』S-Learning 单模型 - 线性模型
 
S-Learner using LinearRegression 
# Ready-to-use S-Learner using LinearRegression
learner_s = LRSRegressor()
ate_s = learner_s.estimate_ate(X=X, treatment=treatment, y=y)
print(ate_s)
print('ATE estimate: {:.03f}'.format(ate_s[0][0]))
print('ATE lower bound: {:.03f}'.format(ate_s[1][0]))
print('ATE upper bound: {:.03f}'.format(ate_s[2][0]))
输出ATE: 
(array([0.68844541]), array([0.64017928]), array([0.73671154]))
ATE estimate: 0.688
ATE lower bound: 0.640
ATE upper bound: 0.737
3.2.2 『ATE』双模型T-Learner + XGB回归
 
# Ready-to-use T-Learner using XGB
learner_t = XGBTRegressor()
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the ready-to-use XGBTRegressor class')
print(ate_t)
Using the ready-to-use XGBTRegressor class
(array([0.53355931]), array([0.50912018]), array([0.55799843]))
3.2.3 『ATE』Base Learner 基础模型
 
# Calling the Base Learner class and feeding in XGB
learner_t = BaseTRegressor(learner=XGBRegressor())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using XGB (same result):')
print(ate_t)
# Calling the Base Learner class and feeding in LinearRegression
learner_t = BaseTRegressor(learner=LinearRegression())
ate_t = learner_t.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseTRegressor class and using Linear Regression (different result):')
print(ate_t)
有基础估计的，带上XGBRegressor回归 + LinearRegression线性模型
 输出为： 
Using the BaseTRegressor class and using XGB (same result):
(array([0.53355931]), array([0.50912018]), array([0.55799843]))
Using the BaseTRegressor class and using Linear Regression (different result):
(array([0.68904064]), array([0.64912657]), array([0.7289547]))
3.2.4 『ATE』含/不含倾向得分的X-Learner + 基础模型
 
X Learner without propensity score input 
# 前面两个是带倾向得分的
# X Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(learner=XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseXRegressor class and using XGB:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('\nUsing the BaseXRegressor class and using Linear Regression:')
print(ate_x)
# 后面两个是不带倾向得分的结果
# X Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_x = BaseXRegressor(XGBRegressor())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseXRegressor class and using XGB without propensity score input:')
print(ate_x)
# Calling the Base Learner class and feeding in LinearRegression
learner_x = BaseXRegressor(learner=LinearRegression())
ate_x = learner_x.estimate_ate(X=X, treatment=treatment, y=y)
print('\nUsing the BaseXRegressor class and using Linear Regression without propensity score input:')
print(ate_x)
Using the BaseXRegressor class and using XGB:
(array([0.49851986]), array([0.47623108]), array([0.52080863]))
Using the BaseXRegressor class and using Linear Regression:
(array([0.67178423]), array([0.63091207]), array([0.7126564]))
其中e是额外计算的倾向得分 
3.2.5 『ATE』R Learner + 含/不含倾向得分
 
# R-Learnning 含倾向得分
# R Learner with propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using XGB:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y, p=e)
print('Using the BaseRRegressor class and using Linear Regression:')
print(ate_r)
# R-Learnning 不含倾向得分
# R Learner without propensity score input
# Calling the Base Learner class and feeding in XGB
learner_r = BaseRRegressor(learner=XGBRegressor())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using XGB without propensity score input:')
print(ate_r)
# Calling the Base Learner class and feeding in LinearRegression
learner_r = BaseRRegressor(learner=LinearRegression())
ate_r = learner_r.estimate_ate(X=X, treatment=treatment, y=y)
print('Using the BaseRRegressor class and using Linear Regression without propensity score input:')
print(ate_r)
3.2.6 『ITE』S, T, X, R Learners求ITE/CATE
 
# S Learner
learner_s = LRSRegressor()
cate_s = learner_s.fit_predict(X=X, treatment=treatment, y=y)
# T Learner
learner_t = BaseTRegressor(learner=XGBRegressor())
cate_t = learner_t.fit_predict(X=X, treatment=treatment, y=y)
# X Learner with propensity score input
learner_x = BaseXRegressor(learner=XGBRegressor())
cate_x = learner_x.fit_predict(X=X, treatment=treatment, y=y, p=e)
# X Learner without propensity score input
learner_x_no_p = BaseXRegressor(learner=XGBRegressor())
cate_x_no_p = learner_x_no_p.fit_predict(X=X, treatment=treatment, y=y)
# R Learner with propensity score input 
learner_r = BaseRRegressor(learner=XGBRegressor())
cate_r = learner_r.fit_predict(X=X, treatment=treatment, y=y, p=e)
# R Learner without propensity score input
learner_r_no_p = BaseRRegressor(learner=XGBRegressor())
cate_r_no_p = learner_r_no_p.fit_predict(X=X, treatment=treatment, y=y)
可以看到求总体的ATE不同ITE主要是，model.fit_predict 而不是model.estimate_ate




    
 
画图对比一下： 
alpha=0.2
bins=30
plt.figure(figsize=(12,8))
plt.hist(cate_t, alpha=alpha, bins=bins, label='T Learner')
plt.hist(cate_x, alpha=alpha, bins=bins, label='X Learner')
plt.hist(cate_x_no_p, alpha=alpha, bins=bins, label='X Learner (no propensity score)')
plt.hist(cate_r, alpha=alpha, bins=bins, label='R Learner')
plt.hist(cate_r_no_p, alpha=alpha, bins=bins, label='R Learner (no propensity score)')
plt.vlines(cate_s[0], 0, plt.axes().get_ylim()[1], label='S Learner',
           linestyles='dotted', colors='green', linewidth=2)
plt.title('Distribution of CATE Predictions by Meta Learner')
plt.xlabel('Individual Treatment Effect (ITE/CATE)')
plt.ylabel('# of Samples')
_=plt.legend()

 模型输出的分布，几个模型基本分布是一致的 
3.3 Meta-Learner的评估精准性
 
3.3.1 MSE
 
Validating Meta-Learner Accuracy 
train_summary, validation_summary = get_synthetic_summary_holdout(simulate_nuisance_and_easy_treatment,
                                                                  n=10000,
                                                                  valid_size=0.2,
                                                                  k=10)

 从结果可以对比差异还是蛮大的，还有KL Divergence散度指标,注意，simulate_nuisance_and_easy_treatment是一个模拟数据的函数，get_synthetic_summary_holdout模拟之后得到的结论，所以这个函数主要是在模拟运行。
 如果有真实数据需要自己计算MSE / KL Divergence了。 
按误差再来画一个各模型的对比图： 
scatter_plot_summary_holdout(train_summary,
                             validation_summary,
                             k=10,
                             label=['Train', 'Validation'],
                             drop_learners=[],
                             drop_cols=[])
bar_plot_summary_holdout(train_summary,
                         validation_summary,
                         k=10,
                         drop_learners=['S Learner (LR)'],
                         drop_cols=[])
三个指标的柱状图，error of ATE / MSE / KL散度
  
3.3.2 AUUC
 
# Single simulation 生成数据
train_preds, valid_preds = get_synthetic_preds_holdout(simulate_nuisance_and_easy_treatment,
                                                       n=50000,
                                                       valid_size=0.2)
#distribution plot for signle simulation of Training 查看数据分布
distr_plot_single_sim(train_preds, kind='kde', linewidth=2, bw_method=0.5,
                      drop_learners=['S Learner (LR)',' S Learner (XGB)'])
其中来重点看一下get_synthetic_preds_holdout生成之后数据张什么样子，
 因为生成过程比较慢，建议把n调小一些，其中train_preds是，涵盖了，元数据，倾向得分，各类模型的估计结果： 
{'Actuals': array([0.79643095, 0.95486817, 0.76997445, ..., 0.75298567, 0.57918784,
        0.50598952]),
 'generated_data': {'y': array([2.82809532, 0.83627125, 2.96923917, ..., 3.69930311, 2.11858631,
         2.08442807]),
  'X': array([[0.80662423, 0.78623768, 0.55714673, 0.78484245, 0.7062874 ],
         [0.9777165 , 0.93201984, 0.94806163, 0.27111164, 0.91731333],
         [0.94459763, 0.59535128, 0.94004736, 0.78097312, 0.68469675],
         [0.9168486 , 0.58912275, 0.1413644 , 0.40006979, 0.21380953],
         [0.68509846, 0.47327722, 0.476474  , 0.5284623 , 0.7752548 ],
         [0.7854881 , 0.22649094, 0.79246468, 0.23035102, 0.46786142]]),
  'w': array([1, 0, 0, ..., 1, 1, 1]),
  'tau': array([0.79643095, 0.95486817, 0.76997445, ..., 0.75298567, 0.57918784,
         0.50598952]),
  'b': array([2.0569544 , 1.4065011 , 2.49147132, ..., 1.75627446, 1.76858932,
         1.16561357]),
  'e': array([0.9       , 0.27521434, 0.9       , ..., 0.9       , 0.85139268,
         0.53026066])},
 'S Learner (LR)': array([0.67412079, 0.67412079, 0.67412079, ..., 0.67412079, 0.67412079,
        0.67412079]),
 'S Learner (XGB)': array([0.71599877, 1.01481819, 0.45305347, ..., 1.33423424, 0.76635182,
        0.54931992]),
 'T Learner (LR)': array([1.02410448, 1.23423655, 0.98041236, ..., 0.96302446, 0.76681856,
        0.6616715 ]),
 'T Learner (XGB)': array([1.39729691, 2.1762352 , 0.18748665, ..., 1.81221414, 1.25306892,
        0.5382663 ]),
 'X Learner (LR)': array([1.02410448, 1.23423655, 0.98041236, ..., 0.96302446, 0.76681856,
        0.6616715 ]),
 'X Learner (XGB)': array([ 0.86962565,  2.17324671, -0.32896739, ...,  1.60033008,
         1.33514268,  0.41377797]),
 'R Learner (LR)': array([1.12468185, 1.35693603, 1.04248628, ..., 1.00324719, 0.75967883,
        0.57261269]),
 'R Learner (XGB)': array([0.49208722, 1.1459806 , 0.0804432 , ..., 1.74274731, 1.07263219,
        0.48373923])}
# Scatter Plots for a Single Simulation of Training Data
scatter_plot_single_sim(train_preds)
训练集的散点图：
  
计算AUUC： 
# Cumulative Gain AUUC values for a Single Simulation of Training Data
get_synthetic_auuc(train_preds, drop_learners=['S Learner (LR)'])
4 可解释性：Policy Learning Notebook
 
主要参考的是文档：binary_policy_learner_example.ipynb
 另外一个更全面的在：uplift_tree_visualization.ipynb 
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from causalml.optimize import PolicyLearner
from sklearn.tree import plot_tree
from lightgbm import LGBMRegressor
from causalml.inference.meta import BaseXRegressor
# 生成数据 - 干预是二分类的
np.random.seed(1234)
n = 10000
p = 10
X = np.random.normal(size=(n, p))
ee = 1 / (1 + np.exp(X[:, 2]))
tt = 1 / (1 + np.exp(X[:, 0] + X[:, 1])/2) - 0.5
W = np.random.binomial(1, ee, n)
Y = X[:, 2] + W * tt + np.random.normal(size=n)
policy_learner = PolicyLearner(policy_learner=DecisionTreeClassifier(max_depth=2), calibration=True)
policy_learner.fit(X, W, Y)
plt.figure(figsize=(15,7))
plot_tree(policy_learner.model_pi)

 可以得到树的分裂图
 同时也可以后续计算ITE， 
learner_x = BaseXRegressor(LGBMRegressor())
ite_x = learner_x.fit_predict(X=X, treatment=W, y=Y)
pd.DataFrame({
    'DR-DT Optimal': [np.mean((policy_learner.predict(X) + 1) * tt / 2)],
    'True Optimal': [np.mean((np.sign(tt) + 1) * tt / 2)],
    'X Learner': [
        np.mean((np.sign(ite_x) + 1) * tt / 2)
5 NN-Based的模型：dragonnet
 
基于神经网络的方法比较新，这里简单举一个例子——DragonNet。该方法主要工作是将propensity score估计和uplift score估计合并到一个网络实现。 
首先，引述了可用倾向性得分代替X做ATE估计
 
 然后，为了准确预测ATE而非关注到Y预测上，我们应尽可能使用 X中与 T 相关的部分特征。
 其中一种方法就是首先训练一个网络用X预测T，然后移除最后一层并接上Y的预测，则可以实现将X中与T相关的部分提取出来(即倾向性得分  $相关)，并用于Y的预测。 简单参考：$ 【Uplift】建模方法篇 
然后这里文档有两个数据集的应用，都在案例：dragonnet_example.ipynb，这里截取一个来简单看一下： 
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV, LogisticRegression
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error as mse
from scipy.stats import entropy
import warnings
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor, BaseRRegressor, BaseSRegressor, BaseTRegressor
from causalml.inference.tf import DragonNet
from causalml.match import NearestNeighborMatch, MatchOptimizer, create_table_one
from causalml.propensity import ElasticNetPropensityModel
from causalml.dataset.regression import *
from causalml.metrics import *
import os, sys
%matplotlib inline
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
sns.set_palette('Paired')
plt.rcParams['figure.figsize'] = (12,8)
数据载入，元数据在官方github上： 
df = pd.read_csv(f'data/ihdp_npci_3.csv', header=None)
cols =  ["treatment", "y_factual", "y_cfactual", "mu0", "mu1"] + [f'x{i}' for i in range(1,26)]
df.columns = cols
X = df.loc[:,'x1':]
treatment = df['treatment']
y = df['y_factual']
tau = df.apply(lambda d: d['y_factual'] - d['y_cfactual'] if d['treatment']==1 
               else d['y_cfactual'] - d['y_factual'], 
               axis=1)
几个基础模型先跑一下然后对比： 
p_model = ElasticNetPropensityModel()
p = p_model.fit_predict(X, treatment)
s_learner = BaseSRegressor(LGBMRegressor())
s_ate = s_learner.estimate_ate(X, treatment, y)[0]
s_ite = s_learner.fit_predict(X, treatment, y)
t_learner = BaseTRegressor(LGBMRegressor())
t_ate = t_learner.estimate_ate(X, treatment, y)[0][0]
t_ite = t_learner.fit_predict(X, treatment, y)
x_learner = BaseXRegressor(LGBMRegressor())
x_ate = x_learner.estimate_ate(X, treatment, y, p)[0][0]
x_ite = x_learner.fit_predict(X, treatment, y, p)
r_learner = BaseRRegressor(LGBMRegressor())
r_ate = r_learner.estimate_ate(X, treatment, y, p)[0][0]
r_ite = r_learner.fit_predict(X, treatment, y, p)
之后是DragonNet模型： 
dragon = DragonNet(neurons_per_layer=200, targeted_reg=True)
dragon_ite = dragon.fit_predict(X, treatment, y, return_components=False)
dragon_ate = dragon_ite.mean()
训练的蛮快： 
Train on 597 samples, validate on 150 samples
Epoch 1/30
597/597 [==============================] - 2s 3ms/sample - loss: 1239.5969 - regression_loss: 561.7198 - binary_classification_loss: 37.7623 - treatment_accuracy: 0.7590 - track_epsilon: 0.0339 - val_loss: 423.3901 - val_regression_loss: 156.0984 - val_binary_classification_loss: 33.2239 - val_treatment_accuracy: 0.7244 - val_track_epsilon: 0.0325
Epoch 2/30
597/597 [==============================] - 0s 71us/sample - loss: 290.8331 - regression_loss: 118.9768 - binary_classification_loss: 30.6743 - treatment_accuracy: 0.8561 - track_epsilon: 0.0319 - val_loss: 236.9888 - val_regression_loss: 84.3624 - val_binary_classification_loss: 34.7734 - val_treatment_accuracy: 0.7244 - val_track_epsilon: 0.0302
Epoch 3/30
597/597 [==============================] - 0s 69us/sample - loss: 232.0476 - regression_loss: 96.7705 - binary_classification_loss: 27.2036 - treatment_accuracy: 0.8529 - track_epsilon: 0.0301 - val_loss: 220.0190 - val_regression_loss: 71.6871 - val_binary_classification_loss: 37.6016 - val_treatment_accuracy: 0.7244 - val_track_epsilon: 0.0307
Epoch 4/30
最后几个模型一起来看一下： 
df_preds = pd.DataFrame([s_ite.ravel(),
                          t_ite.ravel(),
                          x_ite.ravel(),
                          r_ite.ravel(),
                          dragon_ite.ravel(),
                          tau.ravel(),
                          treatment.ravel(),
                          y.ravel()],
                       index=['S','T','X','R','dragonnet','tau','w','y']).T
df_cumgain = get_cumgain(df_preds)
df_result = pd.DataFrame([s_ate, t_ate, x_ate, r_ate, dragon_ate, tau.mean()],
                     index=['S','T','X','R','dragonnet','actual'], columns=['ATE'])
df_result['MAE'] = [mean_absolute_error(t,p) for t,p in zip([s_ite, t_ite, x_ite, r_ite, dragon_ite],
                                                            [tau.values.reshape(-1,1)]*5 )
                ] + [None]
df_result['AUUC'] = auuc_score(df_preds)
plot_gain(df_preds)

 从AUUC来看，几个模型各有好坏 
6 模型重要性 + SHAP
 
主要参考文档：feature_interpretations_example 
里面内容比较多，就截取一些 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.dataset.regression import synthetic_data
from sklearn.linear_model import LinearRegression
import shap
import matplotlib.pyplot as plt
import time
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
import os
import warnings
warnings.filterwarnings('ignore')
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # for lightgbm to work
%reload_ext autoreload
%autoreload 2
%matplotlib inline
plt.style.use('fivethirtyeight')
n_features = 25
n_samples = 10000
y, X, w, tau, b, e = synthetic_data(mode=1, n=n_samples, p=n_features, sigma=0.5)
w_multi = np.array(['treatment_A' if x==1 else 'control' for x in w])
e_multi = {'treatment_A': e}
feature_names = ['stars', 'tiger', 'merciful', 'quixotic', 'fireman', 'dependent',
                 'shelf', 'touch', 'barbarous', 'clammy', 'playground', 'rain', 'offer',
                 'cute', 'future', 'damp', 'nonchalant', 'change', 'rigid', 'sweltering',
                 'eight', 'wrap', 'lethal', 'adhesive', 'lip']  # specify feature names
model_tau = LGBMRegressor(importance_type='gain')  # specify model for model_tau
S-Learner为代表来看一下： 
base_algo = LGBMRegressor()
# base_algo = XGBRegressor()
# base_algo = RandomForestRegressor()
# base_algo = LinearRegression()
slearner = BaseSRegressor(base_algo, control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)
# 模型的重要性方法一：AUTO
slearner.plot_importance(X=X, 
                         tau=slearner_tau, 
                         normalize=True, 
                         method='auto', 
                         features=feature_names)
# 重要性方法二：permutation方法
slearner.get_importance(X=X, 
                        tau=slearner_tau, 
                        method='permutation', 
                        features=feature_names, 
                        random_state=42)
SHAP值的计算： 
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)
shap_slearner
# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau, features=feature_names)
                    CausalML，这是一个开放源码的Python包，它提供了一套基于最近研究的机器学习算法的提升建模和因果推理方法。我们将介绍CausalML的主要组成部分:(2) 验证/分析方法(如合成数据生成、AUUC、敏感性分析、可解释性)，(3) 优化方法(如策略优化、价值优化、单元选择)。github：https://github.com/uber/causalml其余两篇开源项目的文章：因果推断笔记——因果图建模之微软开源的EconML（五）因果推断笔记——因果图建模之微软开源的dowhy（一）
项目地址:https://gitcode.com/uber/causalml
CausalML是Uber公司开源的一个机器学习库，它专注于因果推断和预测模型的构建。这个项目结合了统计学、经济学和机器学习领域的最佳实践，为数据科学家提供了一套强大的工具，帮助他们在复杂的业务场景中进行因果效应评估。
CausalML基于Pyth...
这个项目很稳定，可以长期支持。 它可能包含新的实验代码，其API可能会更改。
因果ML：用于ML进行抬升建模和因果推理的Python包
Causal ML是一个Python软件包，它提供了一套基于最近研究的，使用机器学习算法的提升模型和因果推理方法。 它提供了一个标准界面，允许用户从实验或观察数据中估计条件平均治疗效果（CATE）或个体治疗效果（ITE）。 本质上，它为具有观察特征X用户估计了干预T对结果Y的因果影响，而无需对模型形式做出强烈假设。 典型的用例包括
广告系列定位优化：提高广告系列投资回报率的重要手段是将广告定位到在给定的KPI（例如参与度或销售）方面有良好响应的一组客
https://arxiv.org/pdf/2002.11631.pdf
摘要：在本文中，来自 Uber 的研究者推出了针对因果推理和机器学习算法的 Python 实现。近年来，结合因果推理和机器学习的算法早已成为热门话题。CasualML 包通过该领域一系列方法的 Python 实现，力图缩小方法论理论研究与实际应用之间的差距。
本文着重介绍了 CasualML 包的核心概念、应用范围和应用实例。此外，研究者计划将来添加更多 SOTA 增益模型，并探索更多融合机器学习与因果推理的建模方法以
				项目名称：Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML
项目名称：Causal ML：使用机器学习进行提升建模和因果推理的Python包
CausalML是uber的开源项目，用于使用机器学习方法进行提升建模和因果推理方法。它允许用户从实验或观察数据估计条件平均治疗效果(CATE)或个体...
				机器学习车间存储库中的因果建模
这是上机器学习研讨会中因果建模的研讨会资料库。
 Altdeep.ai讲习班是通过在线讲授与讲习班教官进行的一对一会议相结合进行授课的。 访问Altdeep.ai以获取有关研讨会内容和费用的信息。 此回购中的材料是开源的且免费的。
大纲和时间表
NEU学生：如果您想报名参加此课程并且有疑问，请通过与我，而不是通过我的NEU电子邮件与我联系。
完成本课程后，您将能够在顶级技术组织的数据科学和机器学习团队的决策系统中构建因果推理算法。 您将实施一个证明该能力的投资组合项目。 您将使用基于PyTorch的概率深度学习框架来实施家庭作业和班级项目，并且在课程结束时将成为使用此框架的专家。
这门课程适合谁？
 如果您满足以下条件，您将从本课程中获得最大收益：
 您熟悉随机变量，联合概率分布，条件概率分布，贝叶斯规则和贝叶斯统计中的基本思想以及期望。
上午4:00 - 7:00 AM 2021年8月15日
 2021 年 8 月 14 日下午 4:00 - 晚上 7:00
 时间 2021 年 8 月 14 日下午 1:00 - 4:00
实时缩放链接
在会议期间在 KDD 21 虚拟平台内共享。
近年来，无论是学术研究还是行业应用，都越来越多地使用机器学习方法来衡量细粒度的因果效应，并根据这些因果估计设计最佳策略。 和等开源软件包为应用研究人员和行业从业者提供了一个统一的接口，使用各种机器学习方法进行因果推理。 本教程将涵盖的主题包括元学习器的条件处理效果估计器和基于树的算法、模型验证和敏感性分析、优化算法（包括策略精简和成本优化）。 此外，本教程将演示这些算法在行业用例中的生成。
				EconML：用于基于ML的异构处理效果估计的Python包
EconML是一个Python软件包，用于通过机器学习从观察数据中估计异构处理效果。 此软件包是作为Microsoft Research的一部分设计和构建的，目的是将最新的机器学习技术与计量经济学相结合，以使自动化解决复杂的因果推理问题。 EconML的承诺：
 在计量经济学和机器学习的交集中实现文献中的最新技术
保持建模效果异质性的灵活性（通过诸如随机森林，增强，套索和神经网络之类的技术），同时保留对所学模型的因果解释，并经常提供有效的置信区间
使用统一的API
 建立在用于机器学习和数据分析的标准Python软件包的基础上
机器学习的最大希望之一就是在众多领域中自动化决策。 许多数据驱动的个性化决策方案的核心是对异构处理效果的估计：对于具有特定特征集的样本，干预对感兴趣结果的因果关系是什么？ 简而言之，该工具包旨在测量某些治疗变量T对结果变量Y的因果效应，控制一组特征X, W以及该效应如何随X 。 所实施的方法甚至适用于观测（非实验或历史）数据集。 为了使估计结果具有因果关系，有些方法假定没有观察到的混杂因素（即， X,
				参考：https://github.com/uber/causalml
依赖：https://github.com/uber/causalml/blob/master/requirements.txt
安装：sudo pip install causalml
报错：ImportError: cannot import name ‘factorial’
问题解决：sudo pip install statsmodels --upgrade
index-url = http://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host = mirrors.aliyun.com
pip install -r requirements.txt
pip install causalml --default-timeou...
				微软EconML简介：基于机器学习的Heterogeneous Treatment Effects估计
机器学习最大的promise之一是在许多领域实现决策的自动化。
许多数据驱动的决策场景的核心问题是对heterogeneous treatment effects的估计，也即：对于具有特定特征集的样本，干预对输出结果的causal effect是什么？
简言之，这个Python工具包旨在：
测量某些干预变量T对结果变量Y的causal effect
控制一组特征X和W，来衡量causal effect如
				Uber Golang规范是Uber专门为其Golang代码库制定的一套代码编写指南和规范。以下是Uber Golang规范的主要特点：
命名规范：采用驼峰命名法，遵循Go语言的命名约定。使用有意义且描述性强的名称，避免使用缩写。
包和依赖管理：使用Go的标准工具go mod来管理包和依赖。在代码库的根目录下创建go.mod文件，明确定义需要使用的外部依赖。
代码布局：代码文件应按照功能逻辑进行组织，每个文件夹下应包含一个独立的Go模块。避免使用过深的嵌套文件夹结构。
错误处理：在函数签名中使用`error`类型，以便清晰地表示可能出现的错误。避免使用panic来处理错误，而是使用返回错误信息来处理。
并发：使用Go语言提供的并发原语，如goroutine和channel，来编写并发代码。避免使用传统的同步原语，如互斥锁。
测试：为每个包编写相应的测试代码，测试代码应放在与源代码相同的包中，以便方便进行单元测试。使用`go test`来运行测试。
文档：代码应有清晰的注释，包括每个公共函数和方法的文档注释。注释应使用规范的格式，方便生成文档。
性能优化：在编写代码时要考虑性能，并进行必要的性能优化。可以使用Go的性能分析工具来找出性能瓶颈并进行优化。
此外，Uber Golang规范还提供了关于代码风格、错误处理、日志记录、版本管理和代码重用等方面的指导。遵循这些规范可以使代码更易于理解、维护和扩展，并提高代码库的整体质量。