Dynamic data rebalancing in Hadoop

Ashwin T.K. Kumar, Jongyeop Kim, K. M. George, Nohpill Park

Research output: Contribution to book or proceeding, Conference article, peer-review

4 Scopus citations

Abstract

The current implementation of Hadoop assumes that all the nodes in a Hadoop cluster are homogeneous. Data in a Hadoop cluster is split into blocks, which are replicated according to the replication factor. The service time of jobs that access data stored in Hadoop increases considerably when the number of jobs is greater than the number of copies of the data and when the nodes in the cluster differ widely in their processing capabilities. This paper addresses dynamic data rebalancing in a heterogeneous Hadoop cluster. Data is rebalanced by replicating it dynamically, with minimum data movement cost, based on the number of incoming parallel MapReduce jobs. Our experiments indicate that, as a result of dynamic data rebalancing, the service time of MapReduce jobs was reduced by over 30% and resource utilization increased by over 50% compared with the current Hadoop implementation.
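The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: adding replicas of a block when the number of parallel jobs exceeds the number of existing copies, and preferring faster nodes in a heterogeneous cluster. The names `Node`, `capacity`, `extra_replicas_needed`, and `rebalance` are hypothetical, and the cost model (each new copy costs one unit, so creating only the missing copies minimizes movement) is an assumption, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; `capacity` is an assumed relative speed."""
    name: str
    capacity: float
    blocks: set = field(default_factory=set)


def extra_replicas_needed(parallel_jobs: int, current_replicas: int) -> int:
    # Assumption: one replica per parallel job is enough to avoid waiting.
    return max(0, parallel_jobs - current_replicas)


def rebalance(block_id: str, nodes: list, parallel_jobs: int) -> list:
    """Copy `block_id` to the fastest nodes that do not yet hold it.

    Only the missing copies are created, which keeps data movement minimal
    under the assumed unit cost per copy. Returns the names of the nodes
    that received a new replica.
    """
    holders = [n for n in nodes if block_id in n.blocks]
    needed = extra_replicas_needed(parallel_jobs, len(holders))
    candidates = sorted(
        (n for n in nodes if block_id not in n.blocks),
        key=lambda n: n.capacity,
        reverse=True,
    )
    targets = candidates[:needed]  # never more than the nodes available
    for node in targets:
        node.blocks.add(block_id)
    return [node.name for node in targets]


if __name__ == "__main__":
    cluster = [
        Node("a", 1.0, {"b1"}),
        Node("b", 3.0),
        Node("c", 2.0, {"b1"}),
        Node("d", 4.0),
    ]
    # Four parallel jobs, two existing copies: two new replicas are placed.
    print(rebalance("b1", cluster, parallel_jobs=4))
```

In a real cluster the copy step would go through HDFS rather than an in-memory set, and the placement policy would also need to account for rack awareness and free disk space, which this sketch ignores.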

Original language: English
Title of host publication: 2014 IEEE/ACIS 13th International Conference on Computer and Information Science, ICIS 2014 - Proceedings
Editors: Yan Han, Wenai Song, Simon Xu, Lichao Chen, Roger Lee
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 315-320
Number of pages: 6
ISBN (Electronic): 9781479948604
State: Published - Sep 26 2014
Event: 2014 13th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2014 - Proceedings - Taiyuan, China
Duration: Jun 4 2014 to Jun 6 2014

Publication series

Name: 2014 IEEE/ACIS 13th International Conference on Computer and Information Science, ICIS 2014 - Proceedings

Conference

Conference: 2014 13th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2014 - Proceedings
Country/Territory: China
City: Taiyuan
Period: 06/4/14 to 06/6/14

Scopus Subject Areas

  • Information Systems
  • Computer Science Applications

Keywords

  • Dynamic Data Rebalancing
  • Hadoop
  • Replication
  • Heterogeneity
  • Service Time
  • Waiting Time
