Identification of core, semi-core and redundant attributes of a dataset

Ray R. Hashemi, Azita Bahrami, Mark Smith, Simon Young

Research output: Contribution to book or proceeding, Conference article, peer-review

3 Scopus citations

Abstract

Data reduction is an essential step in pre-processing a dataset: it improves data quality and extracts the relevant data from the dataset. Data reduction is performed by identifying and removing the redundant attributes of the dataset. However, not every non-redundant attribute contributes equally to the decision (the dependent variable). The non-redundant attributes may therefore be further divided into two sub-categories: core attributes, which contribute fully to the decision, and semi-core attributes, which contribute only partially. In this paper, a methodology for separating core, semi-core, and redundant attributes is introduced and tested. The results show that the proposed methodology has high potential for use in any generalization process.
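The abstract does not state the selection criterion, and the paper's own methodology (its keywords point to VSOM clustering, cluster quality, entropy and information gain) is not reproduced here. As a minimal illustrative sketch only, assuming discrete attribute values and a hypothetical rule that labels each attribute by its information gain divided by the entropy of the decision, the core / semi-core / redundant distinction could be approximated as follows. The function names, the `tol` parameter and the thresholds are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(column, decision):
    """H(D) - H(D | A) for one discrete attribute column A."""
    n = len(decision)
    groups = {}
    for a, d in zip(column, decision):
        groups.setdefault(a, []).append(d)
    # Weighted entropy of the decision within each attribute value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(decision) - conditional


def classify_attributes(columns, decision, tol=1e-9):
    """Hypothetical rule, NOT the authors' method:
    normalized gain == 1 -> core, 0 < gain < 1 -> semi-core, 0 -> redundant."""
    h_d = entropy(decision)
    labels = {}
    for name, column in columns.items():
        ratio = information_gain(column, decision) / h_d if h_d > 0 else 0.0
        if ratio >= 1 - tol:
            labels[name] = "core"
        elif ratio > tol:
            labels[name] = "semi-core"
        else:
            labels[name] = "redundant"
    return labels


# Toy example (made-up data for illustration).
decision = ["yes", "yes", "no", "no"]
columns = {
    "a1": ["x", "x", "y", "y"],  # fully determines the decision
    "a2": ["p", "q", "p", "q"],  # tells nothing about the decision
    "a3": ["m", "m", "m", "n"],  # partially informative
}
print(classify_attributes(columns, decision))
# {'a1': 'core', 'a2': 'redundant', 'a3': 'semi-core'}
```

Note that this scores each attribute in isolation, so "redundant" here simply means "zero information gain on its own"; redundancy in the usual data-reduction sense is judged relative to the other attributes, which a single-attribute score cannot capture.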

Original language: English
Title of host publication: Proceedings - 2011 8th International Conference on Information Technology
Subtitle of host publication: New Generations, ITNG 2011
Publisher: IEEE Computer Society
Pages: 580-584
Number of pages: 5
ISBN (Print): 9780769543673
State: Published - 2011

Publication series

Name: Proceedings - 2011 8th International Conference on Information Technology: New Generations, ITNG 2011

Keywords

  • Cluster Quality
  • Core attribute
  • Data Reduction
  • Entropy
  • Information gain
  • Redundant Attribute
  • SOM
  • Semi-core attribute
  • VSOM clustering
