gerbil | Michael W. Robbins

Generalized Efficient Regression-Based Imputation with Latent Processes

The R package gerbil (Robbins et al., 2023) implements a new multiple imputation method that draws imputations from a latent joint multivariate normal model underpinning generally structured data. The model is built from a sequence of flexible conditional linear models, which allows the procedure to be implemented efficiently on high-dimensional datasets in practice.
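To make the idea concrete, the sketch below shows how a sequence of conditional linear models implies a joint multivariate normal distribution, and how a missing value can then be drawn given the observed ones. It is a minimal illustration in Python with NumPy, not the gerbil implementation: the function names `joint_from_sequence` and `draw_missing`, the coefficient values, and the example row are all hypothetical, every variable is treated as continuous, and the model parameters are taken as known rather than estimated. The latent-process handling of binary, categorical, and ordinal variables is not shown.

```python
import numpy as np


def joint_from_sequence(intercepts, coefs, resid_vars):
    """Mean and covariance of the joint normal implied by z = a + B z + e.

    Row j of `coefs` holds the regression coefficients of variable j on the
    variables that precede it, so B is strictly lower triangular.
    """
    a = np.asarray(intercepts, dtype=float)
    B = np.asarray(coefs, dtype=float)
    D = np.diag(resid_vars)                  # independent residual variances
    A = np.linalg.inv(np.eye(len(a)) - B)    # z = A (a + e)
    return A @ a, A @ D @ A.T


def draw_missing(row, mean, cov, rng):
    """Fill the NaN entries of one row with a draw from their conditional normal."""
    miss = np.isnan(row)
    out = row.copy()
    if not miss.any():
        return out
    obs = ~miss
    cond_mean = mean[miss]
    cond_cov = cov[np.ix_(miss, miss)]
    if obs.any():
        S_mo = cov[np.ix_(miss, obs)]
        w = S_mo @ np.linalg.inv(cov[np.ix_(obs, obs)])
        cond_mean = cond_mean + w @ (row[obs] - mean[obs])
        cond_cov = cond_cov - w @ S_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2   # guard against round-off asymmetry
    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out


# Hypothetical three-variable sequence: z1, then z2 | z1, then z3 | z1, z2.
rng = np.random.default_rng(0)
mean, cov = joint_from_sequence(
    intercepts=[0.0, 1.0, -0.5],
    coefs=[[0.0, 0.0, 0.0],
           [0.8, 0.0, 0.0],
           [0.3, -0.4, 0.0]],
    resid_vars=[1.0, 0.5, 0.25],
)

row = np.array([1.2, np.nan, 0.1])           # second variable is missing
completed = draw_missing(row, mean, cov, rng)
```

Because each conditional model only involves the variables that come before it in the sequence, the joint covariance never has to be modeled directly. Repeating the draw with independent random numbers gives multiple imputations of the same missing value.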

The methodology employed in gerbil was originally presented in an article published in the Journal of Survey Statistics and Methodology (Robbins, 2024).

Collaborators:

Max Griswold, King County DCHS
Pedro Nascimento de Lima, RAND Corporation

References

Manuals

gerbil: Generalized Efficient Regression-Based Imputation with Latent Processes

M. Robbins, P. Lima, and M. Griswold

Apr 2023

R package version 0.1.9

Journal Articles

Joint imputation of general data

M. Robbins

Journal of Survey Statistics and Methodology, Apr 2024

High-dimensional complex survey data of general structures (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense’s Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute any missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets due to its generality and flexibility. However, FCS procedures contain a theoretical flaw that is exposed by HRBS data—HRBS imputations created with FCS are shown to diverge across iterations of Markov Chain Monte Carlo. Imputation by joint modeling lacks this flaw; however, current joint modeling procedures are neither general nor flexible enough to handle HRBS data. As such, we introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling in data of general structures. This procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models, the predictors of which can be specified by the user. We perform rigorous evaluations of HRBS imputations created with the new algorithm and show that they are convergent and of high quality. Lastly, simulations verify that the proposed method performs well compared to existing algorithms including FCS.