TY - JOUR
T1 - Prediction of periventricular leukomalacia. Part I
T2 - Selection of hemodynamic features using logistic regression and decision tree algorithms
AU - Samanta, Biswanath
AU - Bird, Geoffrey L.
AU - Kuijpers, Marijn
AU - Zimmerman, Robert A.
AU - Jarvik, Gail P.
AU - Wernovsky, Gil
AU - Clancy, Robert R.
AU - Licht, Daniel J.
AU - Gaynor, J. William
AU - Nataraj, Chandrasekhar
PY - 2009/7
Y1 - 2009/7
N2 - Objective: Periventricular leukomalacia (PVL) is part of a spectrum of cerebral white matter injury which is associated with adverse neurodevelopmental outcome in preterm infants. While PVL is common in neonates with cardiac disease, both before and after surgery, it is less common in older infants with cardiac disease. Pre-, intra-, and postoperative risk factors for the occurrence of PVL are poorly understood. The main objective of the present work is to identify potential hemodynamic risk factors for PVL occurrence in neonates with complex heart disease using logistic regression analysis and decision tree algorithms. Methods: The postoperative hemodynamic and arterial blood gas data (monitoring variables) collected in the cardiac intensive care unit of Children's Hospital of Philadelphia were used for predicting the occurrence of PVL. Three categories of datasets for 103 infants and neonates were used: (1) original data without any preprocessing, (2) partial data keeping the admission, maximum and minimum values of the monitoring variables, and (3) an extracted dataset of statistical features. The datasets were used as inputs for forward stepwise logistic regression to select the most significant variables as predictors. The selected features were then used as inputs to the decision tree induction algorithm for generating easily interpretable rules for prediction of PVL. Results: Three sets of data were analyzed in SPSS for identifying statistically significant predictors (p < 0.05) of PVL through stepwise logistic regression and their correlations. The classification success of the Case 3 dataset of extracted statistical features was best, with sensitivity (SN), specificity (SP) and accuracy (AC) of 87, 88 and 87%, respectively. The identified features, when used with decision tree algorithms, gave SN, SP and AC of 90, 97 and 94% in training and 73, 58 and 65% in test. The identified variables in the Case 3 dataset mainly included blood pressure, both systolic and diastolic, partial pressures pO2 and pCO2, and their statistical features like average, variance, skewness (a measure of asymmetry) and kurtosis (a measure of abrupt changes). Rules for prediction of PVL were generated automatically through the decision tree algorithms. Conclusions: The proposed approach combines the advantages of a statistical approach (regression analysis) and data mining techniques (decision tree) for generation of easily interpretable rules for PVL prediction. The present work extends earlier research [Galli KK, Zimmerman RA, Jarvik GP, Wernovsky G, Kuijpers M, Clancy RR, et al. Periventricular leukomalacia is common after cardiac surgery. J Thorac Cardiovasc Surg 2004;127:692-704] by expanding the feature set, identifying additional prognostic factors (namely pCO2) emphasizing the temporal variations in addition to upper or lower values, and generating decision rules. The Case 3 dataset was further investigated in Part II for feature selection through computational intelligence.
KW - Congenital heart disease
KW - Data mining
KW - Decision tree algorithms
KW - Logistic regression
KW - Periventricular leukomalacia
KW - Prognostics
UR - http://www.scopus.com/inward/record.url?scp=67349199555&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2008.12.005
DO - 10.1016/j.artmed.2008.12.005
M3 - Article
C2 - 19162455
AN - SCOPUS:67349199555
SN - 0933-3657
VL - 46
SP - 201
EP - 215
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
IS - 3
ER -