TY - JOUR

T1 - Effects of Normalization Techniques on Logistic Regression in Data Science

AU - Adeyemo, Nureni Adekunle

AU - Wimmer, Hayden

AU - Powell, Loreen Marie

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Advances in the data science profession have allowed several mathematical ideas to be applied to social patterns of data. This research investigates how different normalization techniques affect the performance of logistic regression. The original dataset was modeled using the SQL Server Analysis Services (SSAS) Logistic Regression model, which became the baseline model for the research. The normalization methods used to transform the original dataset were described. Next, different logistic regression models were built based on the three normalization techniques discussed. This work found that, in terms of accuracy, decimal scaling marginally outperformed min-max and z-score scaling. However, when Lift was used to evaluate the models, decimal scaling and z-score performed slightly better than the min-max method. Future work is recommended to test the regression model on other datasets, specifically those whose dependent variable is a two-category problem or those whose independent attributes vary in magnitude.

AB - Advances in the data science profession have allowed several mathematical ideas to be applied to social patterns of data. This research investigates how different normalization techniques affect the performance of logistic regression. The original dataset was modeled using the SQL Server Analysis Services (SSAS) Logistic Regression model, which became the baseline model for the research. The normalization methods used to transform the original dataset were described. Next, different logistic regression models were built based on the three normalization techniques discussed. This work found that, in terms of accuracy, decimal scaling marginally outperformed min-max and z-score scaling. However, when Lift was used to evaluate the models, decimal scaling and z-score performed slightly better than the min-max method. Future work is recommended to test the regression model on other datasets, specifically those whose dependent variable is a two-category problem or those whose independent attributes vary in magnitude.

KW - Decimal Scaling

KW - Logistic Regression

KW - Min-Max

KW - Normalization

KW - Z-Score

UR - https://digitalcommons.georgiasouthern.edu/information-tech-facpubs/111

UR - http://proc.conisar.org/2018/pdf/4813.pdf

M3 - Article

VL - 11

JO - 2018 Proceedings of the Conference on Information Systems Applied Research

JF - 2018 Proceedings of the Conference on Information Systems Applied Research

ER -