Abstract
Assessing the relationships between environmental exposures and health outcomes requires accurate spatiotemporal interpolation. Traditional spatiotemporal interpolation methods either treat the spatial and temporal dimensions separately or combine them by simply treating time as another spatial dimension. The resulting interpolations suffer from relatively low accuracy because the true space-time domain is skewed inappropriately and distances computed in such a domain are inaccurate. We employ the efficient k-d tree structure to store spatiotemporal data and adopt several machine learning methods to learn optimal parameters. To overcome the computational difficulty posed by large data sets, we implement our method on an efficient cluster computing framework, Apache Spark. Real-world PM2.5 data sets are used to test our implementation, and the experimental results demonstrate the computational power of our method, which significantly outperforms previous work in both speed and accuracy.
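As a rough illustration of the idea summarized above, the following minimal sketch applies inverse distance weighting (IDW, listed among the keywords) over the nearest space-time neighbors, with the time axis rescaled before distances are computed. It is not the authors' implementation: the function names, the time-scaling factor `c`, the neighbor count `k`, and the exponent `power` are all assumptions chosen for illustration, and a brute-force neighbor search stands in for the k-d tree and the Spark parallelization described in the abstract.

```python
import heapq
import math


def st_distance(p, q, c):
    """Euclidean distance between (x, y, t) points after scaling time by c.

    c is a hypothetical time-scaling factor; the abstract only says that
    such parameters are learned, not how they are defined.
    """
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dt = c * (p[2] - q[2])
    return math.sqrt(dx * dx + dy * dy + dt * dt)


def idw_interpolate(samples, query, k=3, power=2.0, c=1.0):
    """Estimate the value at `query` from its k nearest samples using IDW.

    samples: list of ((x, y, t), value) pairs
    query:   (x, y, t) tuple
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Brute-force k-nearest-neighbor search (a k-d tree would replace this).
    nearest = heapq.nsmallest(
        k,
        ((st_distance(point, query, c), value) for point, value in samples),
        key=lambda pair: pair[0],
    )

    weighted_sum = 0.0
    weight_total = 0.0
    for distance, value in nearest:
        if distance == 0.0:
            # The query coincides with a sample, so return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

In this sketch, increasing `c` makes samples that are far away in time count as more distant, so they receive less weight; choosing `c`, `k`, and `power` well is the kind of parameter selection the abstract attributes to machine learning methods.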
Original language | English |
---|---|
Pages (from-to) | 87-96 |
Number of pages | 10 |
Journal | Earth Science Informatics |
Volume | 12 |
Issue number | 1 |
DOIs | |
State | Published - Mar 6 2019 |
Keywords
- Bootstrap aggregating
- Inverse distance weighting (IDW)
- k-d tree
- Machine learning
- Spark
- Spatiotemporal interpolation