Title: | Causal Inference with Tree-Based Machine Learning Algorithms |
---|---|
Description: | Estimating heterogeneous treatment effects with tree-based machine learning algorithms and visualizing estimated results in flexible and presentation-ready ways. For more information, see Brand, Xu, Koch, and Geraldo (2021) <doi:10.1177/0081175021993503>. Our current package first started as a fork of the 'causalTree' package on 'GitHub' and we greatly appreciate the authors for their extremely useful and free package. |
Authors: | Jiahui Xu [cre, aut], Tanvi Shinkre [aut], Jennie Brand [aut] |
Maintainer: | Jiahui Xu <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.1.19 |
Built: | 2024-10-14 04:00:42 UTC |
Source: | https://github.com/cran/htetree |
Intermediate function used to include the JavaScript needed to visualize tree structures and estimated treatment effects in shiny.
bundScript(...)
... |
This function has no required arguments, but users can pass arguments to include different CSS files. |
No return value. It is used to pass the JavaScript to shiny.
Fit a causalTree model to get an rpart object
causalTree( formula, data, weights, treatment, subset, na.action = na.causalTree, split.Rule, split.Honest, HonestSampleSize, split.Bucket, bucketNum = 5, bucketMax = 100, cv.option, cv.Honest, minsize = 2L, x = FALSE, y = TRUE, propensity, control, split.alpha = 0.5, cv.alpha = 0.5, cv.gamma = 0.5, split.gamma = 0.5, cost, ... )
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
split.Honest |
boolean option, |
HonestSampleSize |
number of observations anticipated to be used in honest re-estimation after building the tree. This enters the risk function used in both splitting and cross-validation. |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
cv.gamma , split.gamma
|
optional parameters used in evaluating policies. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
causalTree differs from the rpart function in the rpart package in its splitting rules and cross-validation methods. See Athey and Imbens (2016), Recursive Partitioning for Heterogeneous Causal Effects, for details.
An object of class rpart. See rpart.object.
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S and G Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
honest.causalTree, rpart.control, rpart.object, summary.rpart, rpart.plot
library("htetree")
library("rpart")
library("rpart.plot")

# Causal tree ("CT") splitting and cross-validation, honest version, no bucketing.
tree <- causalTree(y ~ x1 + x2 + x3 + x4, data = simulation.1,
                   treatment = simulation.1$treatment,
                   split.Rule = "CT", cv.option = "CT",
                   split.Honest = TRUE, cv.Honest = TRUE, split.Bucket = FALSE,
                   xval = 5, cp = 0, minsize = 20, propensity = 0.5)

# Prune at the complexity parameter with the smallest cross-validated error.
opcp <- tree$cptable[, 1][which.min(tree$cptable[, 4])]
opfit <- prune(tree, opcp)
rpart.plot(opfit)

# "fit" splitting rule with bucketing.
fittree <- causalTree(y ~ x1 + x2 + x3 + x4, data = simulation.1,
                      treatment = simulation.1$treatment,
                      split.Rule = "fit", cv.option = "fit",
                      split.Honest = TRUE, cv.Honest = TRUE, split.Bucket = TRUE,
                      bucketNum = 5, bucketMax = 200,
                      xval = 10, cp = 0, minsize = 20, propensity = 0.5)

# "tstats" splitting rule with "CT" cross-validation.
tstatstree <- causalTree(y ~ x1 + x2 + x3 + x4, data = simulation.1,
                         treatment = simulation.1$treatment,
                         split.Rule = "tstats", cv.option = "CT",
                         cv.Honest = TRUE, split.Bucket = TRUE,
                         bucketNum = 10, bucketMax = 200,
                         xval = 5, cp = 0, minsize = 20, propensity = 0.5)
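The pruning step in the example picks the complexity parameter with the smallest cross-validated error. The following is a minimal sketch of that selection on a made-up cptable. The numbers are illustrative and are not output from causalTree. It assumes the usual rpart column layout, in which column 1 is CP and column 4 is xerror.

```r
# Hypothetical cptable using the standard rpart column layout (values invented).
cptable <- cbind(
  CP          = c(0.20, 0.05, 0.01, 0.00),
  nsplit      = c(0, 1, 3, 6),
  "rel error" = c(1.00, 0.80, 0.70, 0.65),
  xerror      = c(1.02, 0.85, 0.78, 0.90),
  xstd        = c(0.05, 0.05, 0.05, 0.06)
)

# Row with the smallest cross-validated error (column 4).
best_row <- which.min(cptable[, 4])

# Complexity parameter stored in column 1 of that row.
opcp <- cptable[, 1][best_row]
print(opcp)
```

Passing the selected value to prune() then cuts the tree back to the subtree associated with that row.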
Compute the "branches" to be drawn for a causalTree object
causalTree.branch(x, y, node, branch)
x |
covariates |
y |
outcome |
node |
node of the fitted tree |
branch |
branch of the fitted tree |
number of branches to be drawn
Intermediate function for causalTree
causalTree.control( minsplit = 20L, minbucket = round(minsplit/3), cp = 0, maxcompete = 4L, maxsurrogate = 5L, usesurrogate = 2L, xval = 10L, surrogatestyle = 0L, maxdepth = 30L, ... )
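minsplit |
minimum number of observations that must exist in a node for a split to be attempted (as in rpart.control) |
minbucket |
minimum number of observations in any terminal (leaf) node |
cp |
complexity parameter; the default is 0 |
maxcompete |
maximum number of competitor splits retained in the output |
maxsurrogate |
maximum number of surrogate splits retained in the output |
usesurrogate |
how surrogates are used in the splitting process (0, 1, or 2, as in rpart.control) |
xval |
number of cross-validations |
surrogatestyle |
controls how the best surrogate is selected (as in rpart.control) |
maxdepth |
maximum depth of any node in the final tree |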
... |
arguments to |
Parameters used in causalTree.
Intermediate function for causalTree
causalTree.matrix(frame)
frame |
inherited from data.frame |
A covariate matrix used in the causal regression.
This routine sets up the callback code for user-written split routines in causalTree
causalTreecallback(mlist, nobs, init)
mlist |
a list of user written methods |
nobs |
number of observations |
init |
function name |
split method written by users
Compute the x-y coordinates for a tree
causalTreeco(tree, parms)
tree |
an |
parms |
parms |
the x-y coordinates for a tree
The files for shiny are saved in a temporary directory. They can be cleared manually with the 'clearTemp()' function; otherwise they are cleared automatically when you close R.
clearTemp()
no return value, to unlink files under the temp folder
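As a rough illustration of what clearing temporary files involves, the sketch below creates a throwaway folder under R's session temp directory and removes it with unlink(). The folder name `htetree_demo` and the file name are hypothetical stand-ins. The actual location and contents used by clearTemp() are not documented here.

```r
# Hypothetical subfolder standing in for the package's temporary shiny folder.
shiny_tmp <- file.path(tempdir(), "htetree_demo")
dir.create(shiny_tmp, showWarnings = FALSE)
writeLines("placeholder", file.path(shiny_tmp, "tree.json"))

# Confirm the file was written before cleaning up.
existed_before <- file.exists(file.path(shiny_tmp, "tree.json"))

# Remove the folder and everything inside it.
unlink(shiny_tmp, recursive = TRUE)
exists_after <- dir.exists(shiny_tmp)
```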
Run down the built tree and get the final leaf ids for estimation sample
est.causalTree(fit, x)
fit |
an |
x |
covariates |
Intermediate estimation results for a causalTree object.
Estimate a causal tree
estimate.causalTree( object, data, weights, treatment, na.action = na.causalTree )
object |
A tree-structured fit |
data |
New data frame to be used for estimating effects within leaves. |
weights |
optional case weights. |
treatment |
The treatment status of observations in the new dataframe, where 1 represents treated and 0 represents control. |
na.action |
the default action deletes all observations for which
|
When the leaf contains only treated or control cases, the function will trace back to the leaf's parent node recursively until the parent can be used to compute causal effect. Please see Athey and Imbens Machine Learning Methods for Estimating Heterogeneous Causal Effects (2015) for details.
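The fallback described above can be sketched in base R. This is an illustration under stated assumptions and is not the package's implementation. It relies on rpart-style node numbering, where the parent of node n is floor(n / 2). The helper names `in_node` and `effect_with_fallback` and the toy data are hypothetical.

```r
# Toy data: leaf id, treatment status and outcome for six observations.
leaf      <- c(4, 4, 5, 5, 3, 3)   # rpart-style node numbers of the leaves
treatment <- c(1, 1, 0, 0, 1, 0)
y         <- c(5, 7, 2, 4, 9, 5)

# TRUE for observations whose leaf is `node` or a descendant of `node`.
in_node <- function(leaf, node) {
  vapply(leaf, function(l) {
    while (l > node) l <- l %/% 2   # climb towards the root
    l == node
  }, logical(1))
}

# Walk up from `node` until both treated and control cases are present,
# then return the difference in mean outcomes at that node.
effect_with_fallback <- function(node) {
  repeat {
    idx <- in_node(leaf, node)
    if (any(treatment[idx] == 1) && any(treatment[idx] == 0)) break
    if (node == 1) return(NA_real_)  # root lacks one group: no estimate
    node <- node %/% 2               # parent of node n is floor(n / 2)
  }
  mean(y[idx & treatment == 1]) - mean(y[idx & treatment == 0])
}

effects <- vapply(c(3, 4, 5), effect_with_fallback, numeric(1))
print(effects)
```

Leaves 4 and 5 each contain only one group, so both fall back to their shared parent (node 2) and receive the same estimate.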
Intermediate estimation results for a causalTree object
Intermediate function for causalTree
formatg(x, digits = getOption("digits"), format = paste0("%.", digits, "g"))
x |
input training data |
digits |
number of digits to be kept |
format |
format of exported vector |
No return value, called for formatting the exported estimates
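The default `format` argument is a C-style `%g` pattern built from `digits`. The sketch below shows what such a pattern does when passed to base R's sprintf(). Whether formatg calls sprintf() internally is an assumption made for illustration.

```r
digits <- 3

# Same construction as the default `format` argument in the usage above.
fmt <- paste0("%.", digits, "g")

# Apply the pattern to a few numbers; %g keeps `digits` significant digits.
formatted <- sprintf(fmt, c(3.14159, 0.000123456, 1234567))
print(formatted)
```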
Get the current working directory and set it as the default directory for temporarily saving the shiny files.
getDefaultPath()
a temporary file path
Getting the density of distribution in treatment and control groups, which will be displayed in the
getDensities(treatment, outcome)
treatment |
A character representing the name of treatment indicator. |
outcome |
A character representing the name of outcome variable. |
vector of corresponding densities for each value of outcome vector
Fit a causalTree model to get an honest causal tree, with the tree structure built on the training sample (including cross-validation) and leaf estimates taken from the estimation sample. Returns an rpart object.
honest.causalTree( formula, data, weights, treatment, subset, est_data, est_weights, est_treatment, est_subset, na.action = na.causalTree, split.Rule, split.Honest, HonestSampleSize, split.Bucket, bucketNum = 10, bucketMax = 40, cv.option, cv.Honest, minsize = 2L, model = FALSE, x = FALSE, y = TRUE, propensity, control, split.alpha = 0.5, cv.alpha = 0.5, cv.gamma = 0.5, split.gamma = 0.5, cost, ... )
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
est_data |
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree. |
est_weights |
optional case weights for estimation sample |
est_treatment |
treatment vector for estimation sample. Must be same length as estimation data. A vector indicates the treatment status of the data, 1 represents treated and 0 represents control. Only binary treatment supported in this version. |
est_subset |
optional expression saying that only a subset of the rows of the estimation data should be used in the fit of the re-estimated tree. |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
split.Honest |
boolean option, |
HonestSampleSize |
number of observations anticipated to be used in honest re-estimation after building the tree. This enters the risk function used in both splitting and cross-validation. |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
model |
model frame of |
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
cv.gamma , split.gamma
|
optional parameters used in evaluating policies. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
An object of class rpart. See rpart.object.
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S and G Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
causalTree, estimate.causalTree, rpart.object, summary.rpart, rpart.plot
library("rpart")
library("rpart.plot")
library("htetree")

# Split treated and control units evenly into training and estimation samples.
n <- nrow(simulation.1)
trIdx <- which(simulation.1$treatment == 1)
conIdx <- which(simulation.1$treatment == 0)
train_idx <- c(sample(trIdx, length(trIdx) / 2),
               sample(conIdx, length(conIdx) / 2))
train_data <- simulation.1[train_idx, ]
est_data <- simulation.1[-train_idx, ]

# Build the tree on the training sample; estimate leaf effects on the estimation sample.
honestTree <- honest.causalTree(y ~ x1 + x2 + x3 + x4, data = train_data,
                                treatment = train_data$treatment,
                                est_data = est_data,
                                est_treatment = est_data$treatment,
                                split.Rule = "CT", split.Honest = TRUE,
                                HonestSampleSize = nrow(est_data),
                                split.Bucket = TRUE, cv.option = "CT")

# Prune at the complexity parameter with the smallest cross-validated error.
opcp <- honestTree$cptable[, 1][which.min(honestTree$cptable[, 4])]
opTree <- prune(honestTree, opcp)
rpart.plot(opTree)
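The idea behind honest estimation, choosing the tree structure on one sample and estimating leaf effects on another, can be sketched without the package. The example below is a toy illustration on simulated data. The single-cut search is a stand-in for tree building and does not reproduce causalTree's splitting rules. The names `diff_means`, `cuts`, and `best_cut` are hypothetical.

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = runif(n), treatment = rep(c(0, 1), n / 2))
# Simulated outcome: the treatment effect is 2 when x1 > 0.5 and 0 otherwise.
d$y <- d$treatment * 2 * (d$x1 > 0.5) + rnorm(n, sd = 0.1)

# Training half decides the structure; estimation half supplies leaf effects.
train_idx <- sample(n, n / 2)
train <- d[train_idx, ]
est   <- d[-train_idx, ]

# Difference in mean outcomes between treated and control rows of a data frame.
diff_means <- function(df) {
  mean(df$y[df$treatment == 1]) - mean(df$y[df$treatment == 0])
}

# Stand-in for tree building: pick the cut on x1 that maximizes the gap
# between the two sides' effects, using the training half only.
cuts <- seq(0.2, 0.8, by = 0.1)
gap <- sapply(cuts, function(cut) {
  abs(diff_means(train[train$x1 > cut, ]) - diff_means(train[train$x1 <= cut, ]))
})
best_cut <- cuts[which.max(gap)]

# Honest step: re-estimate the leaf effects on the held-out estimation half.
honest_effects <- c(
  left  = diff_means(est[est$x1 <= best_cut, ]),
  right = diff_means(est[est$x1 >  best_cut, ])
)
print(honest_effects)
```

Because the estimation half played no part in choosing the cut, its leaf estimates are not biased by the search over cut points.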
honest re-estimation and change the frame of object using estimation sample
honest.est.causalTree(fit, x, wt, treatment, y)
fit |
an |
x |
input training data |
wt |
optional weights |
treatment |
treatment variable |
y |
outcome variable |
An object of class rpart. See rpart.object.
honest re-estimation and change the frame of object using estimation sample
honest.est.rparttree(fit, x, wt, y)
fit |
an |
x |
input training data |
wt |
optional weights |
y |
outcome variable |
Intermediate estimation results for an honest estimation of causalTree.
The recursive partitioning function, for R
honest.rparttree( formula, data, weights, subset, est_data, est_weights, na.action = na.rpart, method, model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ... )
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
est_data |
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree. |
est_weights |
optional case weights for estimation sample |
na.action |
the default action deletes all observations for which
|
method |
one of Alternatively, |
model |
model frame of |
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
parms |
optional parameters for the splitting function. |
control |
a list of options that control details of the
|
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
An object of class rpart, obtained by running an honest recursive partitioning tree.
Estimate heterogeneous treatment effect via causal tree. In each leaf, the treatment effect is the difference of mean outcome in treatment group and control group.
hte_causalTree( outcomevariable, minsize = 20, crossvalidation = 20, data, treatment_indicator, ps_indicator, covariates, negative = FALSE, drawplot = TRUE, varlabel = NULL, maintitle = "Heterogeneous Treatment Effect Estimation", legend.x = 0.08, legend.y = 0.25, check = FALSE, ... )
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y
|
x and y coordinate to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
predicted treatment effect and the associated tree
library(rpart)
library(htetree)

hte_causalTree(outcomevariable = "outcome",
               data = data.frame("confounder" = c(0, 1, 1, 0, 1, 1),
                                 "treatment" = c(0, 0, 0, 1, 1, 1),
                                 "prop_score" = c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
                                 "outcome" = c(1, 2, 2, 1, 4, 4)),
               treatment_indicator = "treatment",
               ps_indicator = "prop_score",
               covariates = "confounder")
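The leaf-level estimate described for hte_causalTree, the difference between mean outcomes in the treatment and control groups, can be reproduced by hand once each observation has a leaf assignment. The sketch below uses invented data and a hypothetical `leaf` column. It does not call the package.

```r
# Toy data: leaf assignment, treatment indicator and outcome.
d <- data.frame(
  leaf      = c("A", "A", "A", "A", "B", "B", "B", "B"),
  treatment = c(0, 0, 1, 1, 0, 0, 1, 1),
  outcome   = c(1, 3, 4, 6, 2, 2, 2, 4)
)

# Mean outcome by leaf (rows) and treatment status (columns "0" and "1").
means <- tapply(d$outcome, list(d$leaf, d$treatment), mean)

# Leaf-level effect: treated mean minus control mean.
leaf_effect <- means[, "1"] - means[, "0"]
print(leaf_effect)
```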
Estimate heterogeneous treatment effect via random forest. In each leaf, the treatment effect is the difference of mean outcome weighted by inverse propensity scores in treatment group and control group.
hte_forest( outcomevariable, minsize = 20, crossvalidation = 20, data = edurose_mediation_20181126, treatment_indicator = "compcoll25", ps_indicator = "propsc_com25", ps_linear = "propsc_com25lin", covariates = c(linear_terms, ps_indicator), negative = FALSE, drawplot = TRUE, legend.x = 0.08, legend.y = 0.25, gf, ... )
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
legend.x , legend.y
|
x and y coordinate to position the legend. The default is set as (0.08, 0.25). |
gf |
a fitted generalized random forest object |
... |
further arguments passed to or from other methods. |
A list with three elements. The first is the predicted outcome for each unit. The second is a causalTree object with the tree split information. The third is a data.frame summarizing the prediction results.
Estimate heterogeneous treatment effect via adjusted causal tree. In each leaf, the treatment effect is the difference of mean outcome weighted by inverse propensity scores in treatment group and control group.
hte_ipw( outcomevariable, minsize = 20, crossvalidation = 20, data, treatment_indicator, ps_indicator, ps_linear = NULL, covariates, negative = FALSE, drawplot = TRUE, varlabel = NULL, maintitle = "Heterogeneous Treatment Effect Estimation", legend.x = 0.08, legend.y = 0.25, check = FALSE, ... )
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y
|
x and y coordinate to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
predicted treatment effect and the associated tree
library(rpart)
library(htetree)

hte_ipw(outcomevariable = "outcome",
        data = data.frame("confounder" = c(0, 1, 1, 0, 1, 1),
                          "treatment" = c(0, 0, 0, 1, 1, 1),
                          "prop_score" = c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
                          "outcome" = c(1, 2, 2, 1, 4, 4)),
        treatment_indicator = "treatment",
        ps_indicator = "prop_score",
        covariates = "confounder")
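A difference in mean outcomes weighted by inverse propensity scores can be sketched for a single leaf as follows. The weights used here (1/p for treated units, 1/(1 - p) for controls, normalized within each group by weighted.mean()) are a common textbook choice and an assumption; the exact estimator inside hte_ipw may differ.

```r
# Toy data for one leaf.
d <- data.frame(
  treatment  = c(0, 0, 0, 1, 1, 1),
  prop_score = c(0.2, 0.5, 0.5, 0.5, 0.5, 0.8),
  outcome    = c(1, 2, 3, 4, 5, 6)
)

# Inverse propensity weights: 1/p for treated units, 1/(1 - p) for controls.
d$w <- ifelse(d$treatment == 1, 1 / d$prop_score, 1 / (1 - d$prop_score))

treated <- d[d$treatment == 1, ]
control <- d[d$treatment == 0, ]

# Weighted difference in mean outcomes within the leaf.
ipw_effect <- weighted.mean(treated$outcome, treated$w) -
  weighted.mean(control$outcome, control$w)

# Unweighted difference, for comparison.
raw_effect <- mean(treated$outcome) - mean(control$outcome)
print(c(ipw = ipw_effect, raw = raw_effect))
```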
Estimate heterogeneous treatment effect via adjusted causal tree. In each leaf, the treatment effect is estimated from nearest-neighbor (nn) matching.
hte_match( outcomevariable, minsize = 20, crossvalidation = 20, data, treatment_indicator, ps_indicator, ps_linear = NULL, covariates, negative = FALSE, drawplot = TRUE, con.num = 1, varlabel = NULL, maintitle = "Heterogeneous Treatment Effect Estimation", legend.x = 0.08, legend.y = 0.25, check = FALSE, ... )
outcomevariable |
a character representing the column name of the outcome variable. |
minsize |
the minimum number of observations in each leaf. The default is set as 20. |
crossvalidation |
number of cross validations. The default is set as 20. |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
ps_indicator |
a character representing the column name of the propensity score. |
ps_linear |
a character representing name of a column that stores linearized propensity scores. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
drawplot |
a logical value indicating whether to plot the model as part of the output. The default is set as TRUE. |
con.num |
a number indicating the number of units from control groups to be used in matching. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x , legend.y
|
x and y coordinate to position the legend. The default is set as (0.08, 0.25). |
check |
if TRUE, generates 100 trees and outputs most common tree structures and their frequency |
... |
further arguments passed to or from other methods. |
predicted treatment effect and the associated tree
library(rpart)
library(htetree)

hte_match(outcomevariable = "outcome",
          data = data.frame("x1" = c(0, 1, 1, 0, 1, 1),
                            "x2" = c(3, 2, 1, 5, 7, 1),
                            "treatment" = c(0, 0, 0, 1, 1, 1),
                            "prop_score" = c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7),
                            "outcome" = c(1, 2, 2, 1, 4, 4)),
          treatment_indicator = "treatment",
          ps_indicator = "prop_score",
          covariates = c("x1", "x2"))
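To give a sense of nearest-neighbor matching within a leaf, the sketch below matches each treated unit to its `con.num` closest controls and averages the outcome differences. Matching on the propensity score, with replacement, is an assumption made for illustration; the matching variables and procedure used by hte_match may differ.

```r
# Toy data for one leaf.
d <- data.frame(
  treatment  = c(0, 0, 0, 1, 1),
  prop_score = c(0.30, 0.45, 0.60, 0.40, 0.65),
  outcome    = c(1, 2, 3, 5, 7)
)
treated <- d[d$treatment == 1, ]
control <- d[d$treatment == 0, ]
con.num <- 1  # number of control units matched to each treated unit

# For each treated unit, average the outcomes of its con.num nearest controls
# (distance measured on the propensity score; controls may be reused).
matched_control_mean <- sapply(seq_len(nrow(treated)), function(i) {
  dist <- abs(control$prop_score - treated$prop_score[i])
  nearest <- order(dist)[seq_len(con.num)]
  mean(control$outcome[nearest])
})

# Mean of treated-minus-matched-control differences.
match_effect <- mean(treated$outcome - matched_control_mean)
print(match_effect)
```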
The function hte_plot takes a model created by causal tree, as well as the adjusted version, and plots the distribution of the outcome variable in treated and control groups in each leaf of the tree. This visualization aims to show how the predicted treatment effect changes with each split in the tree.
hte_plot( model, data, treatment_indicator = NULL, outcomevariable, propensity_score, plot.title = "Visualization of the Tree" )
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
outcomevariable |
a character representing the column name of the outcome variable. |
propensity_score |
a character representing the column name of the propensity score. |
plot.title |
character representing the main title of the plot. |
no return value
The function hte_plot_line takes a model created by causal tree, as well as the adjusted version, and plots the different least squares models used to estimate heterogeneous treatment effects (HTE) at each node. At each node, this visualization aims to show how the estimated treatment effect differs when using ordinary least squares and weighted least squares methods. The weighted least squares method in this package uses inverse propensity scores as weights, in order to reduce bias due to confounding variables.
hte_plot_line( model, data, treatment_indicator = NULL, outcomevariable, propensity_score, plot.title = "Visualization of the Tree", gamma = 0, lambda = 0, ... )
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
outcomevariable |
a character representing the column name of the outcome variable. |
propensity_score |
a character representing the column name of the propensity score. |
plot.title |
character representing the main title of the plot. |
gamma , lambda
|
numbers indicating the bias level used in sensitivity analysis |
... |
further arguments passed to or from other methods. |
No return value, used for plotting the estimated results with lines.
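The comparison that hte_plot_line visualizes, ordinary versus inverse-propensity-weighted least squares, can be sketched with base R's lm() on simulated data for a single node. The data-generating process and the weight formula below are assumptions for illustration and are not taken from the package.

```r
set.seed(42)
n <- 100
d <- data.frame(prop_score = runif(n, 0.2, 0.8))
# Treatment is more likely when the propensity score is high (confounding).
d$treatment <- rbinom(n, 1, d$prop_score)
d$outcome   <- 1 + 2 * d$treatment + 3 * d$prop_score + rnorm(n)

# Inverse propensity weights: 1/p for treated units, 1/(1 - p) for controls.
d$w <- ifelse(d$treatment == 1, 1 / d$prop_score, 1 / (1 - d$prop_score))

ols <- lm(outcome ~ treatment, data = d)               # unweighted fit
wls <- lm(outcome ~ treatment, data = d, weights = w)  # IPW-weighted fit

# Slope on `treatment` is the estimated effect under each method.
estimates <- c(ols = unname(coef(ols)["treatment"]),
               wls = unname(coef(wls)["treatment"]))
print(estimates)
```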
Intermediate function for causalTree
htetree.anova(y, offset, wt)
y |
outcome variable |
offset |
this can be used to specify an a priori known
component
to be included in the linear predictor during fitting. This should be
|
wt |
optional weights |
No return value.
Each primary split is credited with the value of splits$improve. Each surrogate split gets split$adj times the primary split's value.
importance(fit)
fit |
a fitted |
Same as the importance function in rpart.
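The crediting rule for variable importance can be sketched on a small, made-up split table. The data frame layout below (columns `variable`, `type`, `improve`, `adj`) is hypothetical and chosen for readability. It is not the internal structure of a fitted object.

```r
# Hypothetical split table: a primary split followed by its two surrogates,
# then a second primary split with one surrogate.
splits <- data.frame(
  variable = c("x1", "x2", "x3", "x2", "x1"),
  type     = c("primary", "surrogate", "surrogate", "primary", "surrogate"),
  improve  = c(10, NA, NA, 4, NA),      # improvement of primary splits
  adj      = c(NA, 0.5, 0.2, NA, 0.25)  # adjusted agreement of surrogates
)

# Carry each primary split's improvement down to the surrogates that follow it.
primary_id    <- cumsum(splits$type == "primary")
primary_value <- splits$improve[splits$type == "primary"][primary_id]

# Primary splits get their improvement; surrogates get adj * primary value.
credit <- ifelse(splits$type == "primary",
                 primary_value,
                 splits$adj * primary_value)

# Sum the credit per variable and rank.
importance_by_var <- sort(tapply(credit, splits$variable, sum),
                          decreasing = TRUE)
print(importance_by_var)
```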
Build a random causal forest by fitting a user-selected number of causalTree models to get an ensemble of rpart objects.
init.causalForest( formula, data, treatment, weights = FALSE, cost = FALSE, num.trees, ncov_sample )

## S3 method for class 'causalForest'
predict(object, newdata, predict.all = FALSE, type = "vector", ...)

causalForest( formula, data, treatment, na.action = na.causalTree, split.Rule = "CT", double.Sample = TRUE, split.Honest = TRUE, split.Bucket = FALSE, bucketNum = 5, bucketMax = 100, cv.option = "CT", cv.Honest = TRUE, minsize = 2L, propensity, control, split.alpha = 0.5, cv.alpha = 0.5, sample.size.total = floor(nrow(data)/10), sample.size.train.frac = 0.5, mtry = ceiling(ncol(data)/3), nodesize = 1, num.trees = nrow(data), cost = FALSE, weights = FALSE, ncolx, ncov_sample )
formula |
a formula, with a response and features but no
interaction terms. If this is a data frame, it is taken as the model frame
(see |
data |
an optional data frame that includes the variables named in the formula. |
treatment |
a vector that indicates the treatment status of each observation. 1 represents treated and 0 represents control. Only binary treatment is supported in this version. |
weights |
optional case weights. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
num.trees |
Number of trees to be built in the causal forest |
ncov_sample |
Number of covariates randomly sampled to build each tree in the forest |
object |
a |
newdata |
new data to predict |
predict.all |
If TRUE, return the prediction of each tree in the forest for each observation. Otherwise, return the average over all trees. |
type |
the type of returned object |
... |
arguments to |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
double.Sample |
boolean option, |
split.Honest |
boolean option, |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
sample.size.total |
Sample size used to build each tree in the forest (sampled randomly with replacement). |
sample.size.train.frac |
Fraction of the sample size used for building each tree (training). For example, if sample.size.total is 1000 and sample.size.train.frac is 0.5, then 500 samples will be used to build the tree and the other 500 samples will be used to evaluate the tree. |
mtry |
Number of data features used to build a tree (This variable is not used presently). |
nodesize |
Minimum number of observations for treated and control cases in one leaf node |
ncolx |
Total number of covariates |
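The interplay of sample.size.total and sample.size.train.frac can be sketched in base R. This is only an illustration of the arithmetic in the argument descriptions, not the package's internal sampling code; `n_obs` and the variable names are hypothetical.

```r
set.seed(42)                      # for a reproducible draw
n_obs <- 5000                     # hypothetical number of rows in the data
sample_size_total <- 1000
sample_size_train_frac <- 0.5

# Draw the per-tree sample with replacement, as the argument description states.
drawn <- sample.int(n_obs, size = sample_size_total, replace = TRUE)

# The first part builds the tree, the remainder evaluates it.
n_train <- floor(sample_size_total * sample_size_train_frac)
train_rows <- drawn[seq_len(n_train)]
eval_rows  <- drawn[-seq_len(n_train)]

c(train = length(train_rows), evaluate = length(eval_rows))
```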
causalForest builds an ensemble of causal trees (see Athey and Imbens (2016), Recursive Partitioning for Heterogeneous Causal Effects) by repeated random sampling of the data with replacement.
Further, each tree is built using a randomly sampled subset of all available covariates. A causal forest object is a list of trees.
To predict, call R's predict function with new test data and the causalForest object (estimated on the training data) returned by the causalForest function.
During the prediction phase, the average value over all tree predictions is returned as the final prediction by default.
To return the predictions of each tree in the forest for each test observation, set the flag predict.all=TRUE.
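The resample, subsample-covariates, and average mechanics described here can be sketched in base R. The learner below is a deliberately crude stand-in (a single median split with a difference in means on each side), not the causalTree algorithm; `fit_stump`, `predict_stump`, and the simulated data are hypothetical names and values used only for illustration.

```r
set.seed(1)
n <- 400
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
w <- rbinom(n, 1, 0.5)                        # binary treatment
y <- x[, 1] + w * (x[, 2] > 0) + rnorm(n)     # effect is 1 only when x2 > 0
dat <- data.frame(y = y, w = w, x)

# Stand-in for one tree: split at the median of the first sampled covariate
# and estimate the effect on each side as a treated-minus-control mean.
fit_stump <- function(d, covs) {
  v <- covs[1]
  cut <- median(d[[v]])
  eff <- function(s) mean(s$y[s$w == 1]) - mean(s$y[s$w == 0])
  list(var = v, cut = cut,
       left = eff(d[d[[v]] <= cut, ]), right = eff(d[d[[v]] > cut, ]))
}
predict_stump <- function(st, newdata) {
  ifelse(newdata[[st$var]] <= st$cut, st$left, st$right)
}

# Each "tree" sees a bootstrap sample of rows and a random subset of covariates.
num_trees <- 25
forest <- lapply(seq_len(num_trees), function(b) {
  rows <- sample(n, size = 200, replace = TRUE)
  covs <- sample(colnames(x), size = 2)
  fit_stump(dat[rows, ], covs)
})

# Per-tree predictions (one column per tree) and their average per observation.
individual <- sapply(forest, predict_stump, newdata = dat[1:10, ])
aggregate  <- rowMeans(individual)
```

The `individual` matrix corresponds to what a per-tree prediction would look like, while `aggregate` mirrors the default behaviour of averaging over all trees.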
causalTree differs from the rpart function in the rpart package in its
splitting rules and cross-validation methods. See Athey
and Imbens (2016), Recursive Partitioning for Heterogeneous Causal
Effects, and Wager and Athey (2015), Estimation and
Inference of Heterogeneous Treatment Effects using Random Forests,
for more details.
An object of class rpart. See rpart.object.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S. and Imbens, G. (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
Wager, S. and Athey, S. (2015) Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. http://arxiv.org/abs/1510.04342
causalTree, honest.causalTree, rpart.control, rpart.object, summary.rpart, rpart.plot
library(rpart)
library("htetree")

# Fit a small causal forest (5 trees) on the bundled simulated data
cf <- causalForest(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
  data = simulation.1, treatment = simulation.1$treatment,
  split.Rule = "CT", split.Honest = TRUE, split.Bucket = FALSE,
  bucketNum = 5, bucketMax = 100, cv.option = "CT", cv.Honest = TRUE,
  minsize = 2L, split.alpha = 0.5, cv.alpha = 0.5,
  sample.size.total = floor(nrow(simulation.1) / 2),
  sample.size.train.frac = 0.5,
  mtry = ceiling(ncol(simulation.1) / 3), nodesize = 3,
  num.trees = 5, ncolx = 10, ncov_sample = 3)

# Predict treatment effects for the first 100 observations
cfpredtest <- predict.causalForest(cf, newdata = simulation.1[1:100, ],
  type = "vector")
An intermediate function used for plotting
makeplots( negative, opfit. = opfit, trainset, covariates, outcomevariable, data. = data, hte_effect_setup, varlabel, maintitle, legend.x = 0.8, legend.y = 0.25, ... )
negative |
a logical value indicating whether we expect the treatment effect to be negative. The default is set as FALSE. |
opfit. |
tree structure generated from causal tree algorithm. |
trainset |
a data frame containing only the variables used in the model, with missing values listwise deleted. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
outcomevariable |
a character representing the column name of the outcome variable. |
data. |
a data frame containing the variables in the model. |
hte_effect_setup |
an empty list to store the adjusted treatment effect. |
varlabel |
a named vector containing variable labels. |
maintitle |
a character string indicating the main title displayed when plotting the tree and results. The default is set as "Heterogeneous Treatment Effect Estimation". |
legend.x, legend.y |
x and y coordinates used to position the legend. The default is set as (0.8, 0.25). |
... |
further arguments passed to or from other methods. |
A plot visualizing the tree and estimated treatment effect in each node.
This intermediate function is used to adjust the heterogeneous treatment effect estimated in each leaf with nearest-neighbor (NN) matching.
matchinleaves( trainset = match_data, covariates = covariates, outcomevariable = outcomevariable, hte_effect_setup = hte_effect_setup, treatment_indicator, con.num = 1, ... )
trainset |
a data frame containing only the variables used in the model, with missing values listwise deleted. |
covariates |
a vector of column names of all covariates (linear terms and propensity score). |
outcomevariable |
a character representing the column name of the outcome variable. |
hte_effect_setup |
an empty list to store the adjusted treatment effect. |
treatment_indicator |
a character representing the column name of the treatment indicator. |
con.num |
a number indicating the number of units from control groups to be used in matching |
... |
further arguments passed to or from other methods. |
A list for summarizing the results after matching.
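The idea of matching inside a single leaf can be sketched in base R. This is a simplified illustration that matches on one score (for example a propensity score) rather than on the full covariate set, and it is not the package's implementation; `nn_match_effect` and the toy numbers are hypothetical.

```r
# For each treated unit, find the con_num nearest control units by score,
# compare outcomes, and average the differences over all treated units.
nn_match_effect <- function(y, w, score, con_num = 1) {
  treated  <- which(w == 1)
  controls <- which(w == 0)
  diffs <- vapply(treated, function(i) {
    dist_to_i <- abs(score[controls] - score[i])
    nearest <- controls[order(dist_to_i)[seq_len(con_num)]]
    y[i] - mean(y[nearest])
  }, numeric(1))
  mean(diffs)
}

# Toy leaf with two treated and two control units.
y     <- c(5, 3, 9, 4)
w     <- c(1, 0, 1, 0)
score <- c(0.20, 0.25, 0.80, 0.70)

leaf_effect <- nn_match_effect(y, w, score, con_num = 1)
print(leaf_effect)
```

With con_num = 1 each treated unit is paired with its single closest control, which corresponds to the default in the usage above; larger values average over several controls per treated unit.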
causalTree
Gets the model frame of a causalTree, same as for rpart.
## S3 method for class 'causalTree'
model.frame(formula, ...)
formula |
a formula, with a response but no interaction terms. If this is a data frame, it is taken as the model frame (see model.frame). |
... |
arguments to |
A model frame for causalTree.
causalTree
requirement when missing values are included in sample.
na.causalTree(x)
x |
covariates |
No return value, used for handling missing values when they are included in the sample.
hte_plot_line
Plots the different least squares models used to estimate heterogeneous treatment effects (HTE) at each node. At each node, this visualization shows how the estimated treatment effect differs between ordinary least squares and weighted least squares. The weighted least squares method in this package uses inverse propensity scores as weights in order to reduce bias due to confounding variables.
plotOutcomes( treatment, outcome, propscores, confInt = TRUE, colbyWt = FALSE, ylab = "", xlab = "", title = "", gamma = 0, lambda = 0, ... )
treatment |
a character representing the column name for the treatment variable in the causal setup |
outcome |
a character representing the column name of the outcome variable. |
propscores |
a character representing the column name of the propensity score. |
confInt |
a logical value indicating whether to add the 95%
confidence interval. The default is set as |
colbyWt |
a logical value indicating whether the points are colored according to inverse propensity scores. The default is set as FALSE. |
xlab, ylab, title |
character strings giving the x-axis label, the y-axis label, and the main title for each node. |
gamma, lambda |
numbers indicating the bias level used in sensitivity analysis |
... |
further arguments passed to or from other methods. |
A summary table after adjusting the estimates with inverse probability weighting (IPW).
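The contrast between ordinary and inverse-propensity-weighted least squares can be sketched in base R with `lm`. The example uses simulated data with a known effect and a known propensity score, which are assumptions made for illustration only; it does not call plotOutcomes and does not reproduce its output.

```r
set.seed(7)
n  <- 2000
x  <- rnorm(n)                          # confounder
ps <- plogis(0.8 * x)                   # true propensity score (known here)
w  <- rbinom(n, 1, ps)                  # treatment depends on x
y  <- 1 + 2 * w + 1.5 * x + rnorm(n)    # true treatment effect is 2

# Ordinary least squares ignores the confounding through x.
ols <- lm(y ~ w)

# Weighted least squares with inverse propensity weights:
# treated units get 1 / ps, control units get 1 / (1 - ps).
ipw <- ifelse(w == 1, 1 / ps, 1 / (1 - ps))
wls <- lm(y ~ w, weights = ipw)

est <- c(ols = coef(ols)[["w"]], wls = coef(wls)[["w"]])
print(est)
```

Because treatment is more likely for large x, the unweighted estimate absorbs part of the effect of x, while the weighted fit rebalances the two groups and should land closer to the simulated effect.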
Visualize Causal Tree and Treatment Effects via Shiny
runDynamic( model, data, outcomevariable, treatment_indicator, propensity_score = "" )
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
outcomevariable |
a character representing the column name of the outcome variable. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
propensity_score |
a character representing the column name of the propensity score. |
a Shiny page.
Save Javascript Embedded in Shiny App
saveBCSS(filePath)
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
This function saves the files necessary to run the Shiny app that visualizes the causal tree and the estimated heterogeneous treatment effects interactively.
saveFiles( model, data, outcomevariable, treatment_indicator, propensity_score = "", filePath = "" )
model |
a tree model constructed by |
data |
a data frame containing the variables in the model. |
outcomevariable |
a character representing the column name of the outcome variable. |
treatment_indicator |
a character representing the column name for the treatment variable in the causal setup. |
propensity_score |
a character representing the column name of the propensity score. |
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
Save CSS File Embedded in Shiny App
saveGCSS(filePath)
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
Save HTML Index Embedded in Shiny App
saveInd(filePath)
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
Save Shiny Server Temporarily
saveServ(filePath)
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
Save Shiny UI Temporarily
saveUI(filePath)
filePath |
a character string representing the path name to save the files temporarily. |
No return value. It is used to save necessary files temporarily to run Shiny App.
A simulated dataset inherited from the causalTree package
simulation.1
A data frame with 500 observations on the following 12 variables.
x1
a numeric vector
x2
a numeric vector
x3
a numeric vector
x4
a numeric vector
x5
a numeric vector
x6
a numeric vector
x7
a numeric vector
x8
a numeric vector
x9
a numeric vector
x10
a numeric vector
y
a numeric vector
treatment
a numeric vector
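A quick way to confirm that a data frame has the documented layout is sketched below. The helper `has_documented_layout` is a hypothetical name introduced for this illustration; the packaged data is only accessed when htetree is installed, on the assumption that the dataset is lazy-loaded as the example for causalForest suggests.

```r
# Column names listed in the documentation above.
expected_cols <- c(paste0("x", 1:10), "y", "treatment")

# Returns TRUE when `d` is a data frame with exactly the documented columns.
has_documented_layout <- function(d) {
  is.data.frame(d) && setequal(names(d), expected_cols)
}

# Only touch the packaged data when htetree is actually installed.
if (requireNamespace("htetree", quietly = TRUE)) {
  sim <- htetree::simulation.1
  print(has_documented_layout(sim))
  print(table(sim$treatment))   # counts per treatment value
}
```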