Missing Data

My postdoctoral work involved the creation of an algorithm for imputing missing data in a large agricultural survey, the USDA's Agricultural Resource Management Survey (ARMS). The size and distributional structure of this dataset presented unique challenges, and the work yielded several publications (Robbins & White, 2011; Robbins et al., 2013; Robbins & White, 2014; Robbins, 2014). The resulting algorithm has several novel characteristics that enable theoretically valid and computationally efficient imputation with complex data. These include copula modeling via transformations based on empirical distributions, creative use of the SWEEP operator to improve efficiency, and construction of a joint model from a sequence of selected conditional models.
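One common way to realize a Gaussian copula through empirical distributions is the normal-scores transform. Each observed value is passed through its empirical CDF and then the standard normal quantile function, and values imputed on the normal scale are mapped back through the empirical quantile function. The sketch below illustrates that general idea only. It assumes rank/(n + 1) scaling and average ranks for ties, and the function names `normal_scores` and `back_transform` are hypothetical; it is not the published ARMS algorithm.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map observed values to standard-normal scores via the empirical CDF.

    Ranks are scaled by n + 1 so that no score is infinite; tied values
    receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def back_transform(z, observed):
    """Map a normal-scale value back to the data scale via the empirical quantile function."""
    sorted_obs = sorted(observed)
    n = len(sorted_obs)
    u = NormalDist().cdf(z)
    # Undo the n + 1 scaling and clamp to a valid index.
    idx = min(max(round(u * (n + 1)) - 1, 0), n - 1)
    return sorted_obs[idx]
```

For example, `normal_scores([3.0, 1.0, 2.0])` places the median value at 0 and the other two symmetrically around it, and `back_transform(0.0, [3.0, 1.0, 2.0])` returns the observed median, 2.0. Because back-transformed values are always observed values, imputations respect the marginal distribution of the data, including skewness and point masses, while dependence is modeled on the Gaussian scale.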

Motivated by specific issues encountered while performing imputation in a large Department of Defense survey at RAND, I later generalized this procedure into the GERBIL algorithm (Robbins, 2024), which is available in the R package gerbil (Robbins et al., 2023). By using a latent multivariate Gaussian model with probit-type assumptions for non-continuous variables, the method can impute data of a general form (with continuous, binary, unordered categorical, and ordinal variables). It uses joint modeling in a highly computationally efficient manner and allows flexibility when constructing the imputation model. GERBIL is shown to outperform other state-of-the-art procedures in terms of both the quality of the imputations and the computational burden.
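Under a latent multivariate Gaussian model, imputing one variable given the others requires its conditional mean and variance, and a non-continuous variable is then read off its latent value through thresholds. The sketch below shows both pieces of linear algebra for centered variables. It computes the conditional regression with the SWEEP operator (one classic tool for this, not a claim about gerbil's internals) and maps a latent value to a category with probit-style cutpoints. The function names `sweep`, `conditional_regression`, and `latent_to_ordinal` are hypothetical.

```python
from bisect import bisect_left


def sweep(matrix, k):
    """Return a copy of the symmetric matrix swept on pivot index k."""
    n = len(matrix)
    pivot = matrix[k][k]
    if pivot == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    out = [row[:] for row in matrix]
    for i in range(n):
        for j in range(n):
            if i != k and j != k:
                # Entries outside row/column k: partial out variable k.
                out[i][j] = matrix[i][j] - matrix[i][k] * matrix[k][j] / pivot
    for i in range(n):
        if i != k:
            # Row/column k become regression coefficients on variable k.
            out[i][k] = matrix[i][k] / pivot
            out[k][i] = matrix[k][i] / pivot
    out[k][k] = -1.0 / pivot
    return out


def conditional_regression(cov, predictors, target):
    """Regress target on predictors (centered variables) by sweeping each predictor.

    Returns (slopes, residual_variance) of the Gaussian conditional distribution.
    """
    swept = cov
    for k in predictors:
        swept = sweep(swept, k)
    slopes = [swept[k][target] for k in predictors]
    return slopes, swept[target][target]


def latent_to_ordinal(z, cutpoints):
    """Map a latent Gaussian value to a category 0..len(cutpoints).

    The category is the number of cutpoints strictly below z; with a single
    cutpoint at 0 this is the probit rule for a binary variable (1 if z > 0).
    """
    return bisect_left(cutpoints, z)
```

For instance, with covariance `[[4.0, 2.0], [2.0, 3.0]]`, `conditional_regression(cov, [0], 1)` gives slope 0.5 and residual variance 2.0. An imputation draw for the target would then be its mean, plus the slopes applied to the centered predictors, plus Gaussian noise with the residual variance. Sweeps commute and can be undone by a reverse sweep, which makes the operator convenient when many conditional models over overlapping predictor sets are needed.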

Variance estimation in the presence of imputed data typically relies on algebraic expressions and on the validity of multiple imputation combining rules. To improve the utility of imputed data in a broader range of settings, I recently developed the theory that underpins the use of resampling procedures, such as the bootstrap or jackknife, with imputed data (Robbins & Burgette, 2025). This work illustrates the vast computational burden of resampling procedures with imputed data, which underscores the value of efficient algorithms such as gerbil.
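The conventional algebraic route pools the M completed-data analyses with Rubin's combining rules. The pooled estimate is the average of the M estimates, and the total variance is T = W + (1 + 1/M)B, where W is the average within-imputation variance and B is the between-imputation variance. The minimal sketch below covers a scalar estimand. The function name `combine_rubin` is hypothetical, and it shows the standard rules rather than the resampling method of Robbins & Burgette (2025).

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool M completed-data results with Rubin's combining rules.

    estimates: point estimates Q_m from each of the M imputed datasets
    variances: their estimated sampling variances U_m
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b        # total variance
    if b == 0:
        df = float("inf")          # no between-imputation variability
    else:
        # Rubin's degrees of freedom for the t reference distribution.
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For example, estimates `[1.0, 2.0, 3.0]` with variances `[0.5, 0.5, 0.5]` give a pooled estimate of 2.0, a total variance of 0.5 + (4/3)(1.0) ≈ 1.833, and about 3.78 degrees of freedom. These rules are valid only under conditions on the imputation and analysis models, which is one reason resampling alternatives are attractive.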


References

Journal Articles

  1. Farm commodity payments and imputation in the Agricultural Resource Management Survey
    M. W. Robbins and T. K. White
    American Journal of Agricultural Economics, 2011
  2. Imputation in high dimensional economic data as applied to the Agricultural Resource Management Survey
    M. W. Robbins, S. K. Ghosh, and J. D. Habiger
    Journal of the American Statistical Association, 2013
  3. Direct payments, cash rents, land values, and the effects of imputation in U.S. farm-level data
    M. W. Robbins and T. K. White
    Agricultural and Resource Economics Review, 2014
  4. The utility of nonparametric transformations for imputation of survey data
    M. W. Robbins
    Journal of Official Statistics, 2014
  5. Joint imputation of general data
    M. Robbins
    Journal of Survey Statistics and Methodology, 2024
  6. Resampling methods with multiply imputed data
    M. W. Robbins and L. Burgette
    Biometrika, 2025

Manuals

  1. gerbil: Generalized Efficient Regression-Based Imputation with Latent Processes
    M. Robbins, P. Lima, and M. Griswold
    2023
    R package version 0.1.9