Causal Inference

Causal inference has constituted a substantial portion of my research profile since I joined RAND. Early in my RAND career, I built on knowledge of survey methodologies to develop a new approach for synthetic control procedures. Specifically, in estimating the effect of a neighborhood-based drug market intervention, I extended the existing synthetic control toolkit to capitalize on data that are disaggregated spatially and employed survey methodologies to calculate weights and uncertainty (Robbins et al., 2017). This procedure was made available in the R package microsynth (Robbins & Davenport, 2021; Robbins & Davenport, 2025). The method and software have been used in several collaborative projects where I have served as a contributing author (Saunders et al., 2017; Davenport et al., 2021; Neil et al., 2025; Ghosh-Dastidar et al., 2026).

I was also awarded a competitive NIH grant through the National Institute on Aging (R21AG058123, $380,529), the goal of which was to develop statistical procedures for assessing the long-term effects of a program or intervention before long-term outcome data become available. This work resulted in a data fusion method that uses imputation to combine a dataset indicating the effects of the program on short-term outcomes with a separate dataset that establishes the link between short- and long-term outcomes (Robbins et al., 2024). This technique was used to evaluate the effect of the Oregon Health Insurance Experiment on long-term mortality (Robbins et al., 2024) and on individual-level economic indicators such as housing equity. Further theoretical work was also made possible by this grant (Robbins & Burgette, 2025).

I have also produced other research on statistical methods for causal inference, including the use of kernel densities to estimate inverse probability weights for longitudinal evaluations of environmental exposures at the neighborhood level (Robbins et al., 2020), weighting through entropy balancing to evaluate continuous treatments (Vegetabile et al., 2021), and techniques to combine propensity score weights with sampling, non-response, and/or attrition weights (McCaffrey et al., 2024).

Collaborators:

Sebastian Bauhoff, Inter-American Development Bank
Lane Burgette, RAND Corporation
Steve Davenport, Uber
Esther Friedman, University of Michigan
Beth Ann Griffin, RAND Corporation
Beau Kilmer, RAND Corporation
Dan McCaffrey, ETS
Roland Neil, RAND Corporation
Jessica Saunders, The Council of State Governments Justice Center
Regina Shih, Emory University
Brian Vegetabile, LinkedIn

References

Journal Articles

A framework for synthetic control methods with high dimensional, micro-level data: Evaluating a neighborhood-specific crime intervention

M. W. Robbins, J. Saunders, and B. Kilmer

Journal of the American Statistical Association, 2017

The synthetic control method is an increasingly popular tool for analysis of program efficacy. Here, it is applied to a neighborhood-specific crime intervention in Roanoke, VA, and several novel contributions are made to the synthetic control toolkit. We examine high-dimensional data at a granular level (the treated area has several cases, a large number of untreated comparison cases, and multiple outcome measures). Calibration is used to develop weights that exactly match the synthetic control to the treated region across several outcomes and time periods. Further, we illustrate the importance of adjusting the estimated effect of treatment for the design effect implicit within the weights. A permutation procedure is proposed wherein countless placebo areas can be constructed, enabling estimation of p-values under a robust set of assumptions. An omnibus statistic is introduced that is used to jointly test for the presence of an intervention effect across multiple outcomes and post-intervention time periods. Analyses indicate that the Roanoke crime intervention did decrease crime levels, but the estimated effect of the intervention is not as statistically significant as it would have been had less rigorous approaches been used. Supplementary materials for this article are available online.

microsynth: Synthetic control methods with micro- and meso-level data in R

M. W. Robbins and S. Davenport

Journal of Statistical Software, 2021

The R package microsynth has been developed for implementation of the synthetic control methodology for comparative case studies involving micro- or meso-level data. The methodology implemented within microsynth is designed to assess the efficacy of a treatment or intervention within a well-defined geographic region that is itself a composite of several smaller regions (where data are available at the more granular level for comparison regions as well). The effect of the intervention on one or more time-varying outcomes is evaluated by determining a synthetic control region that resembles the treatment region across pre-intervention values of the outcome(s) and time-invariant covariates and that is a weighted composite of many untreated comparison regions. The microsynth procedure includes functionality that enables its user to (1) calculate weights for synthetic control, (2) tabulate results for statistical inferences, and (3) create time series plots of outcomes for treatment and synthetic control. In this article, microsynth is described in detail and its application is illustrated using data from a drug market intervention in Seattle, WA.

Implementing the Drug Market Intervention across multiple sites

J. Saunders, M. Robbins, and A. Ober

Criminology & Public Policy, 2017

In 2012, the editors of CPP published an exchange about the Drug Market Intervention (DMI) in High Point, NC, concluding that it may be a promising approach to crime control but questioning whether it could be implemented across different settings. In this effectiveness study, we followed a cohort of seven sites that participated in a Bureau of Justice Assistance–sponsored DMI training to assess implementation and outcomes. Three sites were not able to implement, and implementation fidelity varied across the four sites that did implement. Of the four sites that held at least one call-in, only one was successful at reducing overall and drug crime (by 28% and 56%, respectively). This works out to an implementation rate of 57% with an average overall crime reduction of 16% (treatment-on-the-treated) or 4% (intent-to-treat). The results of this study demonstrate the importance of replication and the careful study of implementation fidelity prior to wide dissemination.

Associations between a zero tolerance BAC law and traffic crashes and fatalities: Insights from a novel synthetic control method

S. Davenport, M. Robbins, M. Cerda, A. Riveral, and B. Kilmer

Addiction, 2021

Background and aims
Debates regarding lowering the blood alcohol concentration (BAC) limit for drivers are intensifying in the United States and other countries, and the World Health Organization recommends that the limit for adults should be 0.05%. In January 2016, Uruguay implemented a law setting a zero BAC limit for all drivers. This study aimed to assess the effect of this policy on the frequency of moderate/severe injury and fatal traffic crashes.

Design
A quasi-experimental study in which a synthetic control model was used with controls consisting of local areas in Chile as the counterfactual for outcomes in Uruguay, matched across population counts and pre-intervention period outcomes. Sensitivity analyses were also conducted.

Setting
Uruguay and Chile.

Cases
Panel data with crash counts by outcome per locality-month (2013–2017).

Intervention and comparator
A zero blood alcohol concentration law implemented on 9 January 2016 in Uruguay, alongside a continued 0.03 g/dl BAC threshold in Chile.

Measurements
Per-capita moderate/severe injury (i.e. moderate or severe), severe injury and fatal crashes (2013–2017).

Findings
Our base synthetic control model results suggested a reduction in fatal crashes at 12 months [20.9%; P-value = 0.018, 95% confidence interval (CI) = −0.340, −0.061]. Moderate/severe injury crashes did not decrease significantly (10.2%, P = 0.312, 95% CI = −0.282, 0.075). The estimated effect at 24 months was smaller and with larger confidence intervals for fatal crashes (14%; P = 0.048, 95% CI = −0.246, −0.026) and largely unchanged for moderate/severe injury crashes (−9.4%, P = 0.302, 95% CI = −0.248, 0.058). Difference-in-differences analyses yielded similar results. As a sensitivity test, a synthetic control model relying on an inferior treatment–control match pre-intervention (measured by mean squared error) yielded similar-sized differences that were not statistically significant.

Conclusions
Implementation of a law setting a zero blood alcohol concentration threshold for all drivers in Uruguay appears to have resulted in a reduction in fatal crashes during the following 12 and 24 months.

The impact of drug possession decriminalization on arrests: A race-specific synthetic control analysis of Oregon’s Measure 110

R. Neil, B. Ghosh Dastidar, B. Kilmer, M. W. Robbins, and K. Warren

Journal of Quantitative Criminology, 2025

Objectives
Racial disparities in arrests are a major concern, particularly when it comes to drug enforcement. In 2021, Oregon decriminalized the possession of controlled substances as part of Measure 110 (M-110), an unprecedented policy change in the United States. We estimate how M-110 affected five types of arrests, overall and by race.

Methods
National Incident-Based Reporting System data covering 3,642 police agencies from 43 states for 2018–2023 are combined with 2020 Census data. We extend a synthetic control methodology developed for micro-level data to test whether policy effects differ across groups and whether policies affect disparities, using permutation inference to quantify uncertainty.

Results
M-110 reduced drug possession arrest rates in Oregon for the overall population (67.8%) and for the three racial groups we focus on: Black (75.6%), Hispanic (77.5%), and White (66.2%), with the reduction being statistically significantly larger for Hispanic and Black than White individuals. M-110 reduced disorder arrest rates by 30.9% for Black individuals, which is statistically significantly different from zero and the White estimate. Black-White rate differences in drug possession arrests fell by 79.5% and in disorder arrests by 41.7%. In general, M-110 did not affect arrest rates for violent, property, or drug trafficking offenses.

Conclusions
M-110 reduced drug possession arrests while reducing Black-White rate differences. M-110 led to a decrease in disorder arrests for Black individuals, suggesting police did not substitute one arrest type for another for this population. Our method offers a new approach for examining heterogeneous policy effects and how policies affect disparities.

Medicaid Home- and Community-Based Services Long-Term Care Expenditures: Evaluation of the Balancing Incentive Program

B. Ghosh-Dastidar, M. W. Robbins, E. M. Friedman, N. Qureshi, and R. A. Shih

Medical Care, 2026

Objective. The Balancing Incentive Program (BIP), legislated in the 2010 Affordable Care Act, offered states financial incentives to increase access to Medicaid home- and community-based services (HCBS). Despite the major infrastructure changes required by BIP, no evaluation to date has quantified the increase in spending attributable to BIP which is of concern to Medicaid HCBS policymakers, providers, and consumers. This is the first causal estimate of BIP’s effects including timing of implementation in each state, compared with a counterfactual.

Design. Using state-level expenditure data, we estimated change in HCBS spending as percentage of LTSS spending in 17 BIP participant states compared with a counterfactual or synthetic control calculated as a weighted average of the outcome in 17 BIP eligible, non-participant states. Synthetic control weights were estimated using pre-BIP characteristics. To assess how BIP effects evolved over time, we estimated cumulative change in the outcome in multiple post-BIP years (2013, 2016 and 2019).

Results. Our primary analysis indicates that cumulatively from FY 2013-2019, BIP states increased their HCBS spending as percentage of LTSS spending by an average of 5.2 percentage points (95% CI: 0.0, 9.8), compared with the synthetic control.

Implications. Although many state-run programs have sought to increase HCBS access, our study’s causal estimate of BIP effects in 17 states, compared to 17 states that did not, represents a far more substantial growth than findings of prior studies.

Data fusion for predicting long-term program impacts

M. W. Robbins, S. Bauhoff, and L. Burgette

Statistics in Medicine, 2024

Policymakers often require information on programs’ long-term impacts that is not available when decisions are made. For example, while rigorous evidence from the Oregon Health Insurance Experiment (OHIE) shows that having health insurance influences short-term health and financial measures, the impact on long-term outcomes, such as mortality, will not be known for many years following the program’s implementation. We demonstrate how data fusion methods may be used to address the problem of missing final outcomes and predict long-run impacts of interventions before the requisite data are available. We implement this method by concatenating data on an intervention (such as the OHIE) with auxiliary long-term data and then imputing missing long-term outcomes using short-term surrogate outcomes while approximating uncertainty with replication methods. We use simulations to examine the performance of the methodology and apply the method in a case study. Specifically, we fuse data on the OHIE with data from the National Longitudinal Mortality Study and estimate that being eligible to apply for subsidized health insurance will lead to a statistically significant improvement in long-term mortality.

Resampling methods with multiply imputed data

M. W. Robbins and L. Burgette

Biometrika, 2025

Resampling techniques have become increasingly popular for estimation of uncertainty. However, data are often fraught with missing values that are commonly imputed to facilitate analysis. This article addresses the issue of using resampling methods such as a jackknife or bootstrap in conjunction with imputations that have been sampled stochastically, in the vein of multiple imputation. We derive the theory needed to illustrate two key points regarding the use of resampling methods in lieu of traditional combining rules. First, imputations should be independently generated multiple times within each replicate group of a jackknife or bootstrap. Second, the number of multiply imputed datasets per replicate group must dramatically exceed the number of replicate groups for a jackknife; however, this is not the case in a bootstrap approach. We also discuss bias-adjusted analogues of the jackknife and bootstrap that are argued to require fewer imputed datasets. A simulation study is provided to support these theoretical conclusions.

Robust estimation of the effect of neighborhood socioeconomic status on cognitive function

M. W. Robbins, B. A. Griffin, R. A. Shih, and M. E. Slaughter

Statistics in Medicine, 2020

The fundamental difficulty of establishing causal relationships between an exposure and an outcome in observational data involves disentangling causality from confounding factors. This problem underlies much of neighborhoods research, which abounds with studies that consider associations between neighborhood characteristics and health outcomes in longitudinal data. Such analyses are confounded by selection issues; individuals with above average health outcomes (or associated characteristics) may self-select into advantaged neighborhoods. Techniques commonly used to assess causal inferences in observational longitudinal data, such as inverse probability of treatment weighting (IPTW), may be inappropriate in neighborhoods data due to unique characteristics of such data. We advance the IPTW toolkit by introducing a procedure based on a multivariate kernel density function which is more appropriate for neighborhoods data. The proposed weighting method is applied in conjunction with a marginal structural model. Our empirical analyses use longitudinal data from the Health and Retirement Study; our exposure of interest is an index of neighborhood socioeconomic status (NSES), and we examine its influence on cognitive function. Our findings illustrate the importance of the choice of method for IPTW: the comparison weighting methods provide poor balance across the set of covariates (which is not the case for our preferred procedure) and yield misleading results when applied in the outcomes models. The utility of the multivariate kernel is also validated via simulation. In addition, our findings emphasize the importance of IPTW: controlling for covariates within a regression without IPTW indicates that NSES affects cognition, whereas IPTW-weighted models fail to show a statistically significant effect.

Nonparametric estimation of population average dose-response curves using entropy balancing weights for continuous exposures

B. G. Vegetabile, B. A. Griffin, D. Coffman, M. Cefalu, M. W. Robbins, and 1 more author

Health Services and Outcomes Research Methodology, 2021

Weighted estimators are commonly used for estimating exposure effects in observational settings to establish causal relations. These estimators have a long history of development when the exposure of interest is binary and where the weights are typically functions of an estimated propensity score. Recent developments in optimization-based estimators for constructing weights in binary exposure settings, such as those based on entropy balancing, have shown more promise in estimating treatment effects than those methods that focus on the direct estimation of the propensity score using likelihood-based methods. This paper explores recent developments of entropy balancing methods to continuous exposure settings and the estimation of population dose-response curves using nonparametric estimation combined with entropy balancing weights, focusing on factors that would be important to applied researchers in medical or health services research. The methods developed here are applied to data from a study assessing the effect of non-randomized components of an evidence-based substance use treatment program on emotional and substance use clinical outcomes.

Estimating generalized propensity scores with survey and attrition weighted data

D. McCaffrey, B. A. Griffin, M. Robbins, Y. Chakraborti, D. Coffman, and 1 more author

Statistics in Medicine, 2024

Prior work in causal inference has shown that using survey sampling weights in the propensity score estimation stage and the outcome model stage for binary treatments can result in a more robust estimator of the effect of the binary treatment being analyzed. However, to date, extending this work to continuous treatments and exposures has not been explored nor has consideration been given for how to handle attrition weights in the propensity score model. Nonetheless, generalized propensity score (GPS) analyses are being used for estimating continuous treatment effects on outcomes when researchers have observational data, and those data sets often have survey or attrition weights that need to be accounted for in the analysis. Here, we extend prior work and show with analytic results that using survey sampling or attrition weights in the GPS estimation stage and the outcome model stage for continuous treatments can result in a more robust estimator than one that does not. Simulation study results show that, although using weights in both estimation stages is sufficient for robust estimation, it is not necessary and unbiased estimation is possible in some cases under various approaches to using weights in estimation. Analysts do not know if the conditions of our simulation studies hold, so use of weights in both estimation stages might provide insurance for reducing potential bias. We discuss the implications of our results in the context of an empirical example.

Manuals

microsynth: Synthetic Control Methods with Micro- And Meso-Level Data

M. Robbins and S. Davenport

2025

R package version 2.0.51