Variable Selection Methods for Big Data: A Comparative Study

Jun Liu, Xuejing Mao

Research output: Contribution to conference › Presentation

Abstract

Variable selection is an important step in statistical analysis. When the number of potential predictors is small, this step is straightforward. But as more and more predictors become available in today's environment, this step becomes increasingly critical and complicated. Logistic regression has many applications in business. One area where logistic regression is widely used is risk management, for example, to predict the likelihood that a customer will become delinquent. In this paper, we compare the performance of three commonly used variable selection methods in logistic regression using a large data set. This data set is typical "big" data, as both the number of records and the number of variables are very large.

Original language: American English
State: Published - Aug 11 2015
Event: Joint Statistical Meetings
Duration: Aug 11 2015 → …

Conference

Conference: Joint Statistical Meetings
Period: 08/11/15 → …

Disciplines

  • Business Administration, Management, and Operations
  • Operations and Supply Chain Management

Keywords

  • Big data
  • Logistic regression
  • Variable selection methods
