Performance evaluation and tuning for MapReduce computing in Hadoop distributed file system

Jongyeop Kim, T. K. Ashwin Kumar, K. M. George, Nohpill Park

Research output: Contribution to book or proceeding, Conference article, peer-review

11 Scopus citations

Abstract

This paper proposes a method that automates the identification of a set of configuration parameters achieving optimal performance for a benchmark program in HDFS. Performance optimization of Hadoop processes is a tedious and challenging problem because of the complexity of the system organization and the extensive list of configuration parameters that must be considered. An Automated Benchmarking Configuration Method (ABCM) is developed in this work to identify the set of configuration parameters that minimizes the execution time of a benchmark, specifically TestDFSIO Write and Read. A two-phase configuration parameter selection process with a simple sampling technique is proposed to mitigate the computation time, which would otherwise grow exponentially. Using the proposed technique, we automatically found the top five configuration parameter sets, which reduced the average execution time by 32% compared with the execution time under the default set of Hadoop configuration parameters.
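The abstract does not describe what the two phases of ABCM do, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that phase one varies a single parameter at a time to shrink each value list, and that phase two randomly samples combinations of the surviving values and keeps the five fastest. The property names are real Hadoop settings, but their selection, the candidate values, and the `run_benchmark` cost function (a synthetic stand-in for an actual TestDFSIO run) are all assumptions made for illustration.

```python
import itertools
import random

# Hypothetical search space. The property names exist in Hadoop, but the
# abstract does not say which parameters or values the authors tuned.
SEARCH_SPACE = {
    "dfs.blocksize": [64, 128, 256],       # MB
    "io.file.buffer.size": [4, 64, 128],   # KB
    "dfs.replication": [1, 2, 3],
}
DEFAULTS = {"dfs.blocksize": 128, "io.file.buffer.size": 4, "dfs.replication": 3}


def run_benchmark(config):
    """Stand-in for a TestDFSIO run: returns a synthetic execution time (s)."""
    return (
        100.0
        + abs(config["dfs.blocksize"] - 256) * 0.1
        + abs(config["io.file.buffer.size"] - 64) * 0.2
        + config["dfs.replication"] * 5.0
    )


def phase_one(space, defaults, keep=2):
    """Vary one parameter at a time; keep the `keep` fastest values of each."""
    reduced = {}
    for name, values in space.items():
        ranked = sorted(values, key=lambda v: run_benchmark({**defaults, name: v}))
        reduced[name] = ranked[:keep]
    return reduced


def phase_two(reduced, samples=5, top=5, seed=0):
    """Randomly sample combinations of surviving values; return the fastest."""
    names = list(reduced)
    combos = [dict(zip(names, vals)) for vals in itertools.product(*reduced.values())]
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(samples, len(combos)))
    return sorted(chosen, key=run_benchmark)[:top]


reduced_space = phase_one(SEARCH_SPACE, DEFAULTS)
best_configs = phase_two(reduced_space)
```

With three parameters of three values each, an exhaustive search needs 27 benchmark runs, whereas this sketch uses 9 runs in phase one and 5 in phase two. The gap widens quickly as parameters are added, which is the kind of saving a sampling-based selection is meant to provide.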

Original language: English
Title of host publication: Proceeding - 2015 IEEE International Conference on Industrial Informatics, INDIN 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 62-68
Number of pages: 7
ISBN (Electronic): 9781479966493
DOIs
State: Published - Sep 28 2015
Event: 13th International Conference on Industrial Informatics, INDIN 2015 - Cambridge, United Kingdom
Duration: Jul 22 2015 to Jul 24 2015

Publication series

Name: Proceeding - 2015 IEEE International Conference on Industrial Informatics, INDIN 2015

Conference

Conference: 13th International Conference on Industrial Informatics, INDIN 2015
Country/Territory: United Kingdom
City: Cambridge
Period: 07/22/15 to 07/24/15

Keywords

  • benchmarks
  • Hadoop
  • Hadoop configuration
  • HDFS
  • performance tuning
