Hot Deck Imputation for Mixed Typed Datasets using Model Based Clustering

Sarbesh Raj Pandeya, Haresh Rochani

Research output: Contribution to conference › Presentation

Abstract

Multiple imputation is a commonly used method for handling missing values. Hot deck imputation differs from other approaches in that it replaces unobserved values with observed values from similar units, or cells, which helps keep the estimated variance of the regression coefficients close to the true variance. These cells are determined by the closeness of observations under various distance measures. However, most distance measures apply only to continuous variables, which creates a problem when the dataset contains categorical covariates. We propose a model-based clustering procedure that uses a parsimonious covariance structure for the latent variable, which follows a mixture of Gaussian distributions, to generate imputation cells for mixed-type datasets (i.e., datasets with both continuous and categorical variables). In simulated data, the procedure yielded lower variance in the estimated regression coefficients than complete-case analysis.
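The hot deck step itself can be illustrated with a minimal sketch, assuming the imputation cells have already been assigned by some clustering procedure. This does not reproduce the latent-variable mixture model described in the abstract; the function names `hot_deck_impute` and `multiple_hot_deck`, the cell labels, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(values, cells, rng):
    """Fill each None in `values` with an observed value drawn at random
    from a donor in the same imputation cell."""
    # Collect the observed (donor) values available in each cell.
    donors = defaultdict(list)
    for value, cell in zip(values, cells):
        if value is not None:
            donors[cell].append(value)

    imputed = []
    for value, cell in zip(values, cells):
        if value is not None:
            imputed.append(value)          # observed values are kept as-is
        elif donors[cell]:
            imputed.append(rng.choice(donors[cell]))
        else:
            # No donor in this cell: leave the value missing rather than guess.
            imputed.append(None)
    return imputed


def multiple_hot_deck(values, cells, m=5, seed=0):
    """Return m completed copies of `values` (one per imputation)."""
    rng = random.Random(seed)
    return [hot_deck_impute(values, cells, rng) for _ in range(m)]


# Toy data; the cell labels are ASSUMED to come from a prior clustering step.
values = [1.0, None, 1.2, 5.0, None, 5.4, None]
cells = [0, 0, 0, 1, 1, 1, 2]
completed = multiple_hot_deck(values, cells, m=3, seed=42)
```

Each completed copy keeps the observed values and fills a missing entry only from donors in its own cell, so the quality of the imputation depends on how well the cells group similar units. A cell with no observed donor (cell 2 in the toy data) is left missing in this sketch.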

Original language: American English
State: Published - Mar 13 2017
Event: Eastern North American Region International Biometric Society Spring Meeting (ENAR)
Duration: Mar 25 2018 → …

Conference

Conference: Eastern North American Region International Biometric Society Spring Meeting (ENAR)
Period: 03/25/18 → …

Keywords

  • Hot Deck
  • Model Based Clustering
  • Typed Datasets

DC Disciplines

  • Biostatistics
  • Public Health
