My employment at RAND has afforded me the opportunity to collaborate with multidisciplinary teams across a variety of scientific fields. I have collaborated on a substantial amount of biomedical work, much of which has been published (Robbins & Setodji, 2014; Roblin et al., 2017; Conlon et al., 2016; Liu et al., 2017; Nuckols et al., 2017; Nuckols et al., 2018; Shetty et al., 2020; Shetty et al., 2020; Aragaki et al., 2020; Shetty et al., 2021; Shetty et al., 2021; Nuckols et al., 2023). I have also collaborated across RAND’s Federally Funded Research and Development Centers; this work is often precluded from public release, although some of it has appeared in scientific journals (Robbins et al., 2025; Robbins, 2025). Furthermore, I co-led a sequence of projects sponsored by the Transportation Security Administration to investigate statistical and practical aspects of covert testing at airport screening locations; one report from this work was cleared for public release (Chang et al., 2025). My collaborative work, which spans many scientific disciplines and statistical sub-disciplines, has resulted in dozens of other RAND reports, which are outlined on my CV. It is also worth noting that much of my work has involved graduate students from the RAND School of Public Policy as lead author (Davenport et al., 2021; Griswold et al., 2025) or as contributing author (Robbins et al., 2023; Robbins & Davenport, 2021).
Collaborators:
- Joseph Chang, RAND Corporation
- Cheryl Damberg, RAND Corporation
- Steve Davenport, Uber
- Michael Dworsky, RAND Corporation
- Max Griswold, King County DCHS
- Harry Liu, Baylor Scott & White Health
- Teryl Nuckols, Cedars-Sinai Medical Center
- Michael Pollard, RAND Corporation
- Douglas Roblin, Kaiser Permanente
- Claude Setodji, RAND Corporation
- Kanaka Shetty, RAND Corporation
- Elina Treyger, RAND Corporation
References
Journal Articles
-
Causal inference using mixture models: A word of caution
M. W. Robbins and C. M. Setodji
Medical Care, Mar 2014
-
Provider type and management of common visits in primary care
D. W. Roblin, H. Liu, L. F. Cromwell, M. Robbins, B. E. Robinson, and 2 more authors
The American Journal of Managed Care, Mar 2017
Objectives:
Debate continues on whether nurse practitioners (NPs) and physician assistants (PAs) are more likely to order ancillary services, or order more costly services among alternatives, than primary care physicians (PCPs). We compared prescription medication and diagnostic service orders associated with NP/PA versus PCP visits for management of neck or back (N/B) pain or acute respiratory infection (ARI).
Study Design:
Retrospective, observational study of visits from January 2006 through March 2008 in the adult primary care practice of Kaiser Permanente in Atlanta, Georgia.
Methods:
Data were obtained from electronic health records. NP/PA and PCP visits for N/B pain or ARI were propensity score matched on patient age, gender, and comorbidities.
Results:
On propensity score-matched N/B pain visits (n = 6724), NP/PAs were less likely than PCPs to order a computed tomography (CT)/magnetic resonance image (MRI) scan (2.1% vs 3.3%, respectively) or narcotic analgesic (26.9% vs 28.5%) and more likely to order a nonnarcotic analgesic (13.5% vs 8.5%) or muscle relaxant (45.8% vs 42.5%) (all P ≤.05). On propensity score-matched ARI visits (n = 24,190), NP/PAs were more likely than PCPs to order any antibiotic medication (73.7% vs 65.8%), but less likely to order an x-ray (6.3% vs 8.6%), broad-spectrum antibiotic (41.5% vs 42.5%), or rapid strep test (6.3% vs 9.7%) (all P ≤.05).
Conclusions:
In the multidisciplinary primary care practice of this health maintenance organization, NP/PAs attending visits for N/B pain or ARI were less likely than PCPs to order advanced diagnostic radiology imaging services, to prescribe narcotic analgesics, and/or to prescribe broad-spectrum antibiotics.
-
Assessing the value of high quality care for work-associated carpal tunnel syndrome in a large integrated healthcare system: Study design
C. Conlon, S. Asch, M. Hanson, A. Avins, B. Levitan, and 6 more authors
The Permanente Journal, Mar 2016
Context
Little is known about quality of care for occupational health disorders, although it may affect worker health and workers’ compensation costs. Carpal tunnel syndrome (CTS) is a common work-associated condition that causes substantial disability.
Objective
To describe the design of a study that is assessing quality of care for work-associated CTS and associations with clinical outcomes and costs.
Design
Prospective observational study of 477 individuals with new workers’ compensation claims for CTS without acute trauma who were treated at 30 occupational health clinics from 2011 to 2013 and followed for 18 months.
Main Outcome Measures
Timing of key clinical events, adherence to 45 quality measures, changes in scores on the Boston Carpal Tunnel Questionnaire and 12-item Short Form Health Survey Version 2 (SF-12v2), and costs associated with medical care and disability.
Results
Two hundred sixty-seven subjects (56%) received a diagnosis of CTS and had claims filed around the first visit to occupational health, 104 (22%) received a diagnosis before that visit and claim, and 98 (21%) received a diagnosis or had claims filed after that visit. One hundred seventy-eight (37%) subjects had time off work, which started around the time of surgery in 147 (83%) cases and lasted a median of 41 days (interquartile range = 42 days).
Conclusions
The timing of diagnosis varied, but time off work was generally short and related to surgery. If associations of quality of care with key medical, economic, and quality-of-life outcomes are identified for work-associated CTS, systematic efforts to evaluate and improve quality of medical care for this condition are warranted.
-
The impact of using mid-level providers in face-to-face primary care on health care utilization
H. Liu, M. Robbins, A. Mehrotra, D. Auerbach, B. E. Robinson, and 2 more authors
Medical Care, Mar 2017
Background:
There has been concern that greater use of nurse practitioners (NP) and physician assistants (PA) in face-to-face primary care may increase utilization and spending.
Objective:
To evaluate a natural experiment within Kaiser Permanente in Georgia in the use of NP/PA in primary care.
Study Design:
From 2006 through early 2008 (the preperiod), each NP or PA was paired with a physician to manage a patient panel. In early 2008, NPs and PAs were removed from all face-to-face primary care. Using the 2006–2010 data, we applied a difference-in-differences analytic approach at the clinic level due to patient triage between a NP/PA and a physician. Clinics were classified into 3 different groups based on the percentage of visits by NP/PA during the preperiod: high (over 20% in-person primary care visits attended by NP/PAs), medium (5%–20%), and low (<5%) NP/PA model clinics.
Measures:
Referrals to specialist physicians; emergency department visits and inpatient admissions; and advanced diagnostic imaging services.
Results:
Compared with the low NP/PA model, the high NP/PA model and the medium NP/PA model were associated with 4.9% and 5.1% fewer specialist referrals, respectively (P<0.05 for both estimates); the high NP/PA model and the medium NP/PA model also showed fewer hospitalizations and emergency department visits and fewer advanced diagnostic imaging services, but none of these was statistically significant.
Conclusions:
We find no evidence to support concerns that under a physician’s supervision, NPs and PAs increase utilization and spending.
-
Quality of care for work-associated carpal tunnel syndrome
T. Nuckols, C. Conlon, M. Robbins, M. Dworsky, J. Lai, and 5 more authors
Journal of Occupational and Environmental Medicine, Mar 2017
Objective:
To evaluate the quality of care provided to individuals with workers’ compensation claims related to carpal tunnel syndrome (CTS) and identify patient characteristics associated with receiving better care.
Methods:
We recruited subjects with new claims for CTS from 30 occupational clinics affiliated with Kaiser Permanente Northern California. We applied 45 process-oriented quality measures to 477 subjects’ medical records, and performed multivariate logistic regression to identify patient characteristics associated with quality.
Results:
Overall, 81.6% of care adhered to recommended standards. Certain tasks related to assessing and managing activity were underused. Patients with classic/probable Katz diagrams, positive electrodiagnostic tests, and higher incomes received better care. However, age, sex, and race/ethnicity were not associated with quality.
Conclusions:
Care processes for work-associated CTS frequently adhered to quality measures. Clinical factors were more strongly associated with quality than demographic and socioeconomic ones.
-
Quality of care and patient-reported outcomes among adults with work-associated carpal tunnel syndrome: A prospective observational study
T. Nuckols, C. Conlon, M. Robbins, M. Dworsky, J. Lai, and 5 more authors
Muscle & Nerve, Mar 2018
Introduction:
Higher quality care for carpal tunnel syndrome (CTS) may be associated with better outcomes.
Methods:
This prospective observational study recruited adults diagnosed with CTS from 30 occupational health centers, evaluated physicians’ adherence to recommended care processes, and assessed results of the Boston Carpal Tunnel Questionnaire (BCTQ) and Short Form Health Survey version 2 (SF-12v2) at recruitment and at 18 months.
Results:
Among 343 individuals, receiving better care (80th vs. 20th percentile for adherence) was associated with greater improvements in BCTQ Symptom Severity scores (−0.18, 95% confidence interval [CI] −0.32 to −0.05), BCTQ Functional Status scores (−0.21, 95% CI −0.34 to −0.08), and SF12-v2 Physical Component scores (1.75, 95% CI 0.33–3.16). Symptoms improved more when physicians assessed and managed activity, patients underwent necessary surgery, and employers adjusted job tasks.
Discussion:
Efforts should be made to ensure that patients with CTS receive essential care processes including necessary surgery and activity assessment and management.
-
The quality of electrodiagnostic tests for carpal tunnel syndrome: Implications for surgery, outcomes, and expenditures
K. D. Shetty, M. Robbins, D. Aragaki, A. Basu, C. Conlon, and 4 more authors
Muscle & Nerve, Mar 2020
Introduction
The quality of electrodiagnostic tests may influence treatment decisions, particularly regarding surgery, affecting health outcomes and health-care expenditures.
Methods
We evaluated test quality among 338 adults with workers’ compensation claims for carpal tunnel syndrome. Using simulations, we examined how it influences the appropriateness of surgery. Using regression, we evaluated associations with symptoms and functional limitations (Boston Carpal Tunnel Questionnaire), overall health (12-item Short Form Health Survey version 2), actual receipt of surgery, and expenditures.
Results
In simulations, suboptimal quality tests rendered surgery inappropriate for 99 of 309 patients (+32 percentage points). In regression analyses, patients with the highest quality tests had larger declines in symptoms (−0.50 point; 95% confidence interval [CI], −0.89 to −0.12) and functional impairment (−0.42 point; 95% CI, −0.78 to −0.06) than patients with the lowest quality tests. Test quality was not associated with overall health, actual receipt of surgery, or expenditures.
Discussion
Test quality is pivotal to determining surgical appropriateness and associated with meaningful differences in symptoms and function.
-
Nursing home responses to performance-based accountability: Results of a national survey
K. D. Shetty, A. Tolpadi, M. W. Robbins, E. A. Taylor, K. Campbell, and 1 more author
Journal of the American Geriatrics Society, Mar 2020
OBJECTIVES
The Centers for Medicare & Medicaid Services (CMS) Nursing Home Quality Initiative aims to improve quality through performance measurement. We describe quality improvement (QI) changes that skilled nursing facilities (SNFs) reported making in response to CMS performance measurements and whether reported QI changes were associated with better performance on CMS performance measures.
DESIGN
Nationally representative survey.
SETTING
A total of 15,475 SNFs that reported quality performance on Nursing Home Compare in 2016.
PARTICIPANTS
A total of 1,182 SNFs (58% of random sample of 2,045 SNFs).
MEASUREMENTS
Adoption of 22 possible QI changes, grouped into seven categories (organizational culture, health information technology, care process redesign, provider incentives, changes to staffing responsibilities, performance monitoring, and measure-specific QI initiatives and technical assistance); performance on the CMS Nursing Home Compare Five-Star Quality Rating System’s quality measure rating.
RESULTS
SNFs reported making an average of 13 QI changes (interquartile range = 11-16 changes). SNFs most commonly reported becoming a learning organization (87%) and providing training to staff on QI strategies (87%). After controlling for patient and facility characteristics, larger SNFs were more likely to obtain assistance on measure reporting from QI organizations and use provider champions than smaller SNFs by 14 and 11 percentage points, respectively. Rural SNFs and SNFs with higher proportions of disabled, black, or Hispanic residents adopted QI changes at similar rates as other SNFs. Of the 22 QI changes, 20 were considered at least somewhat helpful by more than 80% of adopting SNFs. Implementation of all 22 QI changes (vs no changes) was associated with a .48-star higher quality measure rating (95% confidence interval = .003-.98 stars; P = .05).
CONCLUSION
In response to CMS measurement programs, SNFs reported making substantial QI investments that were associated with better performance on CMS quality measures. To guide future SNF investments in QI, work is needed to identify the QI changes that yield the greatest performance improvements.
-
Quality of electrodiagnostic testing for carpal tunnel syndrome: Adherence to quality measures
D. Aragaki, A. Basu, C. Conlon, K. D. Shetty, M. Robbins, and 2 more authors
Muscle & Nerve, Mar 2020
Introduction
Research has shown that quality of health-care services is often suboptimal. Little is known about the quality of electrodiagnostic testing.
Methods
We prospectively recruited 477 adults with workers’ compensation claims for carpal tunnel syndrome (CTS) from 30 occupational health clinics and evaluated whether electrodiagnostic testing adhered to five process-oriented quality measures.
Results
Among patients who had surgery for CTS, nearly all underwent recommended preoperative electrodiagnostic testing (measure #1, 170 of 174, 97.7%). Most electrodiagnostic tests included essential components (measure #2, 295 of 379, 77.8%). However, few reports documented skin temperature (measure #3, 93 of 379, 24.5%) and criteria were seldom met for interpreting test findings as consistent with CTS (measure #4, 41 of 284, 14.4%) or “severe” CTS (measure #5, 8 of 46, 17.4%).
Discussion
Most patients underwent testing before surgery, but test quality was often suboptimal. This work lays the groundwork for future efforts to monitor and improve the quality of electrodiagnostic testing for CTS.
-
Home health agency adoption of quality improvement interventions and association with performance
K. D. Shetty, M. W. Robbins, D. Saliba, K. N. Campbell, and C. L. Damberg
Journal of the American Geriatrics Society, Mar 2021
Background
The Centers for Medicare & Medicaid Services (CMS) Home Health Quality Reporting Program (HHQRP) uses performance measurement to spur improvements in home health agencies’ (HHAs’) quality of care. We examined quality improvement (QI) activities HHAs reported making to improve on HHQRP quality measures, and whether reported QI activities were associated with better measure performance.
Methods
We used responses (N = 1052) from a Web- and mail-based survey of a stratified random sample of HHAs included in CMS Home Health Compare in October 2019. We estimated national adoption rates for 27 possible QI activities related to organizational culture, health information technology, care process redesign, provider incentives, provider training, changes to staffing responsibilities, performance monitoring, and measure-specific QI initiatives and technical assistance. We used multivariate linear regression to examine the associations between HHA characteristics and QI adoption, and between QI adoption and CMS Home Health Quality of Patient Care Star Rating.
Results
HHAs reported implementing an average of 16 QI activities (interquartile range 11–19 activities). Larger HHA size was associated with adopting 1.6 additional QI activities (p < 0.001). HHAs with higher proportions of disabled, black, or Hispanic patients adopted QI activities at similar or higher rates as other HHAs. Of the 27 QI activities, 23 were considered helpful by more than 80% of adopting HHAs. Compared with adopting 44% of QI activities (10th percentile among HHAs), adopting 89% of QI activities (90th percentile) was associated with a 0.4-star higher Star Rating (95% confidence interval 0.2–0.6).
Conclusions
HHAs report implementing a significant number of QI activities in response to CMS measurement programs; implementation of a greater number of activities is associated with better performance on publicly reported measures. To guide future HHA QI investments, work is needed to identify the optimal combination of QI activities and the specific QI activities that yield the greatest performance improvements.
-
Actions to improve quality: Results from a national hospital survey
K. D. Shetty, M. W. Robbins, A. Tolpadi, K. N. Campbell, A. M. Clancy, and 4 more authors
The American Journal of Managed Care, Mar 2021
Objectives:
CMS measures and reports hospital performance to drive quality improvement (QI), but information on actions that hospitals have taken in response to quality measurement is lacking. We aimed to develop national estimates of QI actions undertaken by hospitals and to explore their relationship to performance on CMS quality measures.
Study Design:
Nationally representative cross-sectional survey of acute care hospitals in 2016 (n = 1313 respondents; 64% response rate).
Methods:
We assessed 23 possible QI changes. Using multivariate linear regression, we estimated the relationship between reported QI changes and performance on composite measures derived from 26 Hospital Inpatient Quality Reporting Program measures (scaled 0-100), controlling for case mix and facility characteristics.
Results:
Hospitals reported implementing a mean of 17 QI changes (median [interquartile range], 17 [15-20]). Large hospitals reported significantly higher adoption rates than small hospitals for 18 QI changes. Most hospitals that reported making QI changes (63%-96% for the 23 changes) responded that the specific change made helped improve performance. In multivariate regression analyses, adoption of 92% of QI changes (90th percentile among hospitals), compared with adoption of 50% of QI changes (10th percentile), was associated with a 2.3-point higher overall performance score (95% CI, 0.7-4.0) and higher process (8.7 points; 95% CI, 5.7-11.7) and patient experience (3.0 points; 95% CI, 0.1-5.9) composite scores.
Conclusions:
Hospitals reported widespread adoption of QI changes in response to CMS quality measurement and reporting. Higher QI adoption rates were associated with modestly higher process, patient experience, and overall performance composite scores.
-
The quality of occupational healthcare for carpal tunnel syndrome, healthcare expenditures, and disability outcomes: A prospective observational study
T. Nuckols, M. Dworsky, C. Conlon, M. Robbins, D. Benner, and 4 more authors
Muscle & Nerve, Mar 2023
Introduction/aims
In prior work, higher quality care for work-associated carpal tunnel syndrome (CTS) was associated with improved symptoms, functional status, and overall health. We sought to examine whether quality of care is associated with healthcare expenditures or disability.
Methods
Among 343 adults with workers’ compensation claims for CTS, we created patient-level aggregate quality scores for underuse (not receiving highly beneficial care) and overuse (receiving care for which risks exceed benefits). We assessed whether each aggregate quality score (0%–100%, 100% = better care) was associated with healthcare expenditures (18-mo expenditures, any anticipated need for future expenditures) or disability (days on temporary disability, permanent impairment rating at 18 mo).
Results
Mean aggregate quality scores were 77.8% (standard deviation [SD] 16.5%) for underuse and 89.2% (SD 11.0%) for overuse. An underuse score of 100% was associated with higher risk-adjusted 18-mo expenditures ($3672; 95% confidence interval [CI] $324 to $7021) but not with future expenditures (−0.07 percentage points; 95% CI −0.48 to 0.34), relative to a score of 0%. An overuse score of 100% was associated with lower 18-mo expenditures (−$4549, 95% CI −$8792 to −$306) and a modestly lower likelihood of future expenditures (−0.62 percentage points, 95% CI −1.23 to −0.02). Quality of care was not associated with disability.
Discussion
Improving quality of care could increase or lower short-term healthcare expenditures, depending on how often care is currently underused or overused. Future research is needed on quality of care in varied workers’ compensation contexts, as well as effective and economical strategies for improving quality.
-
Quasi-experimental evaluations of border-enforcement measures
M. W. Robbins, E. Treyger, and J. Chang
Journal of Homeland Security and Emergency Management, Mar 2025
In this article, we seek to establish a causal connection between border-enforcement actions or policies and metrics that might be used to measure relevant outcomes at the border. Applying quasi-experimental methods, we investigate the impact of surveillance technology on levels of U.S. Border Patrol apprehensions of unlawful border-crossers between ports of entry along the southwest border. Our analysis offers insights into some of the effects of surveillance technology and serves as a demonstration of concept for the usefulness of such statistical methods. The most robust finding is that deploying one type of technology – integrated fixed towers (IFTs) – is associated with decreased apprehension levels in the zones of deployment. Although we emphasize ambiguity in the meaning of the results and the uncertainty in statistical inference with relatively small numbers of deployments, we conclude that there is strong evidence that some migrants were deterred from crossing surveilled areas of the border. The results are more inconclusive for other surveillance assets, but there are suggestions that some may elevate apprehension levels, pointing to a boost to the U.S. Border Patrol’s situational awareness. Statistical methods both hold promise and have limitations for the study of the impact of border-enforcement measures beyond the analysis in this study. Although these methods cannot, on their own, yield clear answers in every case, they do have the potential to help policymakers understand and anticipate the impact and effectiveness of different border-enforcement measures.
-
Population displacement from Puerto Rico to U.S. states following Hurricane Maria
M. W. Robbins
Mathematical Population Studies, Mar 2025
Population displacement from Puerto Rico to the United States following Hurricane Maria is considered and distinguished from ongoing outmigration from the island. Administrative data from two sources is used to estimate displacement with a capture-recapture method which, unlike existing estimates, excludes outmigration that occurs independent of the hurricane. The estimates are compared with preceding trends in outmigration from the US Census Bureau. The results show that nearly 90,000 individuals or about 2.7% of Puerto Ricans (95% CI: 2.5%, 2.9%) were displaced through May 2018. Displacement was highest (approximately 4.7%) from southeastern Puerto Rico where the hurricane made landfall and, counter to earlier outmigration patterns indicated by data from the American Community Survey, was comparatively low from the region near San Juan (4.0% pre-hurricane outmigration against 2.4% post-hurricane displacement). Displacement rates were higher in areas with higher levels of Federal Emergency Management Agency-certified damage to housing (3.3% in areas with high damage against 2.2% in areas with low damage) but did not consistently differ by indicators of socioeconomic disadvantage, including poverty and unemployment rates. Households’ intent to return indicates that those who will not return are more likely to be younger or be unemployed (86% of individuals under the age of 30 will not return, whereas 58% of individuals aged 65 and up will not return, and 84% of employed households will return, whereas 68% of unemployed ones will); these characteristics align with those commonly observed among outmigrants in the decade prior to the hurricane.
-
Associations between a zero tolerance BAC law and traffic crashes and fatalities: Insights from a novel synthetic control method
S. Davenport, M. Robbins, M. Cerdá, A. Rivera-Aguirre, and B. Kilmer
Addiction, Mar 2021
Background and aims
Debates regarding lowering the blood alcohol concentration (BAC) limit for drivers are intensifying in the United States and other countries, and the World Health Organization recommends that the limit for adults should be 0.05%. In January 2016, Uruguay implemented a law setting a zero BAC limit for all drivers. This study aimed to assess the effect of this policy on the frequency of moderate/severe injury and fatal traffic crashes.
Design
A quasi-experimental study in which a synthetic control model was used with controls consisting of local areas in Chile as the counterfactual for outcomes in Uruguay, matched across population counts and pre-intervention period outcomes. Sensitivity analyses were also conducted.
Setting
Uruguay and Chile.
Cases
Panel data with crash counts by outcome per locality-month (2013–2017).
Intervention and comparator
A zero blood alcohol concentration law implemented on 9 January 2016 in Uruguay, alongside a continued 0.03 g/dl BAC threshold in Chile.
Measurements
Per-capita moderate/severe injury (i.e. moderate or severe), severe injury and fatal crashes (2013–2017).
Findings
Our base synthetic control model results suggested a reduction in fatal crashes at 12 months [20.9%; P-value = 0.018, 95% confidence interval (CI) = −0.340, −0.061]. Moderate/severe injury crashes did not decrease significantly (10.2%, P = 0.312, 95% CI = −0.282, 0.075). The estimated effect at 24 months was smaller and with larger confidence intervals for fatal crashes (14%; P = 0.048, 95% CI = −0.246, −0.026) and largely unchanged for moderate/severe injury crashes (−9.4%, P = 0.302, 95% CI = −0.248, 0.058). Difference-in-differences analyses yielded similar results. As a sensitivity test, a synthetic control model relying on an inferior treatment–control match pre-intervention (measured by mean squared error) yielded similar-sized differences that were not statistically significant.
Conclusions
Implementation of a law setting a zero blood alcohol concentration threshold for all drivers in Uruguay appears to have resulted in a reduction in fatal crashes during the following 12 and 24 months.
-
Stay Tuned: Improving Sentiment Analysis and Stance Detection Using Large Language Models
M. G. Griswold, M. W. Robbins, and M. Pollard
Political Analysis, Mar 2025
Sentiment analysis and stance detection are key tasks in text analysis, with applications ranging from understanding political opinions to tracking policy positions. Recent advances in large language models (LLMs) offer significant potential to enhance sentiment analysis techniques and to evolve them into the more nuanced task of detecting stances expressed toward specific subjects. In this study, we evaluate lexicon-based models, supervised models, and LLMs for stance detection using two corpuses of social media data—a large corpus of tweets posted by members of the U.S. Congress on Twitter and a smaller sample of tweets from general users—which both focus on opinions concerning presidential candidates during the 2020 election. We consider several fine-tuning strategies to improve performance—including cross-target tuning using an assumption of congressmembers’ stance based on party affiliation—and strategies for fine-tuning LLMs, including few shot and chain-of-thought prompting. Our findings demonstrate that: 1) LLMs can distinguish stance on a specific target even when multiple subjects are mentioned, 2) tuning leads to notable improvements over pretrained models, 3) cross-target tuning can provide a viable alternative to in-target tuning in some settings, and 4) complex prompting strategies lead to improvements over pretrained models but underperform tuning approaches.
-
microsynth: Synthetic control methods with micro- and meso-level data in R
M. W. Robbins and S. Davenport
Journal of Statistical Software, Mar 2021
The R package microsynth has been developed for implementation of the synthetic control methodology for comparative case studies involving micro- or meso-level data. The methodology implemented within microsynth is designed to assess the efficacy of a treatment or intervention within a well-defined geographic region that is itself a composite of several smaller regions (where data are available at the more granular level for comparison regions as well). The effect of the intervention on one or more time-varying outcomes is evaluated by determining a synthetic control region that resembles the treatment region across pre-intervention values of the outcome(s) and time-invariant covariates and that is a weighted composite of many untreated comparison regions. The microsynth procedure includes functionality that enables its user to (1) calculate weights for synthetic control, (2) tabulate results for statistical inferences, and (3) create time series plots of outcomes for treatment and synthetic control. In this article, microsynth is described in detail and its application is illustrated using data from a drug market intervention in Seattle, WA.
Technical Reports
-
Benchmarking the Transportation Security Administration’s Covert Index Testing Program
J. C. Chang, M. W. Robbins, V. Barrer, and N. Maslov
Apr 2025
The Transportation Security Administration (TSA) conducts various covert testing programs as one of many ways to ensure the effectiveness of TSA’s workforce, operations, and programs. One such program is TSA Inspection’s Red Team Index Division’s (RTID’s) covert testing program, or the Index testing program, which is carefully directed and conducted from TSA headquarters. Index testing involves teams of detailed supervisory transportation security officers, detailed security training instructors, and inspection transportation security specialists who attempt to covertly bring standardized threat items in carry-on baggage, on their person, or in checked baggage through checkpoints into secure areas at commercial airports. The program is intended to produce quality testing data suitable for a national-level analysis.
Considering the importance of Index testing to the security of commercial aviation, the U.S. Department of Homeland Security (DHS) Science and Technology Directorate (S&T), in consultation with RTID, asked a research team from the Homeland Security Operational Analysis Center (HSOAC) to benchmark Index testing against other testing analogues (covert or overt, and for different purposes) to see whether the program conforms to some industry quality management standards or best practices.
The findings and recommendations presented in this report are based on a literature review, interviews, and the HSOAC team’s experience with Index testing. From the literature review and interviews, we were unable to find a direct analogue to Index testing. The existing programs that we identified lacked the resources required to run many replications of a test scenario and perform rigorous statistical analyses. However, we were able to find some standards in the literature that are applicable to Index testing. Using subjective determination, we observed that most of these standards are currently implemented in Index testing but that the implementation status for a handful of standards is unclear (i.e., either they were not implemented or we were unable to assert with confidence that the standard is implemented). We made recommendations on whether RTID should consider new steps toward implementing these standards. We then isolated aspects of Index testing that appear to be untouched by existing standards (because of the unique nature of the program) and proposed some relevant new standards that RTID might consider to fill those gaps.
Manuals
-
gerbil: Generalized Efficient Regression-Based Imputation with Latent Processes
M. Robbins, P. Lima, and M. Griswold
Apr 2023
R package version 0.1.9