Efficient spatiotemporal interpolation with Spark machine learning

Weitian Tong, Lixin Li, Xiaolu Zhou, Jason Franklin

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

To better assess the relationships between environmental exposures and health outcomes, appropriate spatiotemporal interpolation is critical. Traditional spatiotemporal interpolation methods either consider the spatial and temporal dimensions separately or incorporate both simultaneously by simply treating time as another spatial dimension. Such interpolation results suffer from relatively low accuracy because the true space-time domain is skewed inappropriately and distances calculated in such a domain are inaccurate. We employ the efficient k-d tree structure to store spatiotemporal data and adopt several machine learning methods to learn optimal parameters. To overcome the computational difficulty with large data sets, we implement our method on an efficient cluster computing framework, Apache Spark. Real-world PM2.5 data sets are used to test our implementation, and the experimental results demonstrate the computational power of our method, which significantly outperforms previous work in both speed and accuracy.
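As an illustration of the distance issue the abstract raises, the following is a minimal Python sketch of inverse distance weighting (IDW) over space-time points, assuming the time axis is rescaled by a single factor before distances are computed. The function name `st_idw` and the parameters `time_scale`, `k`, and `power` are hypothetical and do not come from the paper. The sketch fixes these values by hand, whereas the abstract describes learning optimal parameters with machine learning methods, and it uses a brute-force neighbor search where the abstract describes a k-d tree.

```python
import heapq
import math


def st_idw(samples, query, k=3, time_scale=1.0, power=2.0):
    """Estimate a value at query = (x, y, t) by inverse distance weighting.

    samples: list of (x, y, t, value) tuples.
    time_scale: hypothetical factor converting time units into space units;
        it is an assumption of this sketch, not a value from the paper.
    """
    qx, qy, qt = query

    def dist(s):
        # Euclidean distance with the time difference rescaled first.
        dx = s[0] - qx
        dy = s[1] - qy
        dt = time_scale * (s[2] - qt)
        return math.sqrt(dx * dx + dy * dy + dt * dt)

    # Brute-force search for the k nearest samples; a k-d tree would
    # serve the same purpose more efficiently on large data sets.
    nearest = heapq.nsmallest(k, samples, key=dist)

    num = 0.0
    den = 0.0
    for s in nearest:
        d = dist(s)
        if d == 0.0:
            # The query coincides with a sample: return its value directly.
            return s[3]
        w = 1.0 / d ** power
        num += w * s[3]
        den += w
    return num / den
```

Changing `time_scale` changes which samples count as nearest, which is why treating one time unit as equal to one space unit can distort the result.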

Original language: English
Pages (from-to): 87-96
Number of pages: 10
Journal: Earth Science Informatics
Volume: 12
Issue number: 1
DOIs
State: Published - Mar 6 2019

Keywords

  • Bootstrap aggregating
  • Inverse distance weighting (IDW)
  • k-d tree
  • Machine learning
  • Spark
  • Spatiotemporal interpolation

