Principle Component Analysis for Feature Reduction and Data Preprocessing in Data Science

Hayden Wimmer, Loreen Powell

Research output: Contribution to book or proceeding, Chapter

Abstract

Medical datasets are large and complex. Because medical data contain many variables, machine learning algorithms may fail to induce patterns from the data. They may also overfit the learned model to the training data, which reduces the model's generalizability. Feature reduction limits the number of input variables by exploiting correlations between variables and reducing the feature set to the smallest number of variables needed to describe the data. This research examines the effects of principal component analysis (PCA) for feature reduction when it is applied as a preprocessing step for decision trees. Results indicate that PCA can be used to reduce the number of features. However, the resulting decision tree models show a minor degradation in performance.
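The abstract does not describe how the PCA step was implemented. The following is a minimal NumPy sketch of PCA-based feature reduction, not the authors' code.

It works as follows:

- It centers each feature and, by default, scales each feature to unit variance, so PCA operates on the correlation matrix.
- It takes the singular value decomposition.
- It keeps the smallest number of principal components whose cumulative explained variance reaches a threshold.

Several details are illustrative assumptions rather than anything stated in the source: the function name `pca_reduce`, the 95% threshold, the `standardize` option, and the synthetic two-factor data.

```python
import numpy as np


def pca_reduce(X, variance_threshold=0.95, standardize=True):
    """Reduce X (n_samples x n_features) to the fewest principal components
    whose cumulative explained variance reaches variance_threshold.

    Hypothetical helper for illustration. Returns (Z, components, explained_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Center each feature; optionally scale to unit variance so PCA works on
    # the correlation matrix (useful when features have different units).
    X_prep = X - X.mean(axis=0)
    if standardize:
        scale = X_prep.std(axis=0, ddof=1)
        scale[scale == 0] = 1.0  # leave constant features unscaled
        X_prep = X_prep / scale
    # SVD of the prepared data: rows of Vt are the principal axes, and
    # S**2 / (n - 1) are the variances along them, largest first.
    _, S, Vt = np.linalg.svd(X_prep, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)
    explained_ratio = variances / variances.sum()
    # Smallest k whose cumulative explained variance reaches the threshold
    # (clamped in case rounding keeps the cumulative sum just below 1.0).
    k = int(np.searchsorted(np.cumsum(explained_ratio), variance_threshold)) + 1
    k = min(k, len(explained_ratio))
    components = Vt[:k]
    # Project onto the retained axes; these k columns replace the original features.
    Z = X_prep @ components.T
    return Z, components, explained_ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 200 records and 6 features driven by 2 hidden factors
    # (features 0, 1, 2 track one factor; features 3, 4, 5 track the other).
    latent = rng.normal(size=(200, 2))
    mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
    X = latent @ mixing + 0.01 * rng.normal(size=(200, 6))
    Z, components, ratio = pca_reduce(X)
    print(X.shape, "->", Z.shape)  # (200, 6) -> (200, 2)
```

Because the six correlated features carry only two independent sources of variation, two components retain almost all of the variance. The reduced matrix `Z` would then be the input to a decision tree learner, such as scikit-learn's `DecisionTreeClassifier`.

In a fair comparison against a tree trained on all features, the mean, scale, and components should be estimated on the training split only and reused to transform the test split. This sketch omits that bookkeeping for brevity.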

Original language: American English
Title of host publication: Proceedings of the Conference on Information Systems Applied Research
State: Published, Jan 1 2016

Keywords

  • Component analysis
  • Data Preprocessing
  • Data Science
  • Feature Reduction

DC Disciplines

  • Computer Sciences
