Identification of the Optimal Hadoop Configuration Parameters Set for Mapreduce Computing

Jongyeop Kim, Nohpill Park

Research output: Contribution to book or proceeding › Chapter

8 Scopus citations

Abstract

This paper investigates techniques for finding optimal configuration parameter sets for the Hadoop Distributed File System (HDFS). An optimization technique called the automated benchmarking configuration methodology (ABCM) [4] was previously proposed and demonstrated; it employs a two-stage sampling technique to reduce the computational complexity and cost of searching for the optimal configuration parameter set. In this paper, additional methods are used to sample configuration parameter sets, namely random Monte Carlo and correlation-based approaches (in contrast to the sequential approach in ABCM), in an effort to improve both the performance obtained from the identified optimal configuration parameter set and the execution time. Monte Carlo and correlation coefficient-based algorithms are developed and implemented to identify a better set from the Ω space [4] for the TestDFSIO benchmark. The number of iterations is kept the same across methods for comparison, and the resulting performance is compared against that of the sequential approach. The optimal configuration parameter set identified by the Monte Carlo-based approach reduces the execution time of the benchmark run by 13.84% compared to the sequential sampling method, while the correlation-based method produced an unexpected result, suspected to be due to a lack of linearity in the correlation, which is to be validated in future work.
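
The contrast between sequential and random Monte Carlo sampling under an equal iteration budget can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names are common Hadoop properties chosen only for illustration, the candidate values are made up, `synthetic_runtime` is a hypothetical stand-in for an actual TestDFSIO run, and "sequential" is assumed here to mean taking the first configurations in enumeration order.

```python
import itertools
import random

# Illustrative search space (an assumption; the abstract does not list
# the parameters or the values that were actually tuned).
SEARCH_SPACE = {
    "dfs.blocksize": [64, 128, 256, 512],      # MB, illustrative values
    "dfs.replication": [1, 2, 3],
    "io.file.buffer.size": [4, 16, 64, 128],   # KB, illustrative values
}


def synthetic_runtime(config):
    """Hypothetical stand-in for one benchmark run; returns seconds."""
    block = config["dfs.blocksize"]
    repl = config["dfs.replication"]
    buf = config["io.file.buffer.size"]
    return 100.0 + abs(block - 256) / 8.0 + 10.0 * repl + abs(buf - 64) / 4.0


def all_configs(space):
    """Enumerate every configuration in a fixed (sequential) order."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def sequential_search(space, runtime, budget):
    """Evaluate the first `budget` configurations in enumeration order."""
    candidates = itertools.islice(all_configs(space), budget)
    return min(candidates, key=runtime)


def monte_carlo_search(space, runtime, budget, seed=0):
    """Evaluate `budget` configurations drawn uniformly at random."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(budget)
    ]
    return min(candidates, key=runtime)


def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    budget = 10  # same number of iterations for both methods
    seq_best = sequential_search(SEARCH_SPACE, synthetic_runtime, budget)
    mc_best = monte_carlo_search(SEARCH_SPACE, synthetic_runtime, budget)
    seq_time = synthetic_runtime(seq_best)
    mc_time = synthetic_runtime(mc_best)
    print("sequential :", seq_best, seq_time)
    print("monte carlo:", mc_best, mc_time)
    print("reduction %:", reduction_percent(seq_time, mc_time))
```

With a small budget, the sequential walk only reaches configurations that share the first values of the leading parameters, whereas random draws spread over the whole space. Whether that yields a lower runtime depends on the cost surface and the seed, so the numbers printed by this sketch say nothing about the 13.84% figure reported above.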
Original language: American English
Title of host publication: Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI)
State: Published - Nov 30 2015

DC Disciplines

  • Engineering
  • Computer Engineering
