Enhancing Flight Delay Prediction with Network-Aware Ensemble Learning

Research output: Contribution to book or proceeding › Conference article › peer-review

Abstract

This study presents a framework for predicting departure delays in U.S. domestic aviation that integrates feature engineering, network analysis, and ensemble learning. Using a dataset of 2,638,673 flights across 354 airports from May to August 2024, we engineered predictors from temporal features (cyclical time encodings), operational metrics (airport congestion), and network characteristics (in- and out-degree centrality and cluster labels). We extracted data for the five airlines with the most flights: Southwest (WN), American (AA), Delta (DL), United (UA), and SkyWest (OO). A novel greedy feature selection method based on mutual information and correlation was then applied to each airline's dataset to improve prediction performance. Multiple classifiers, including Random Forest (RF), Extra Trees (ET), XGBoost, and LightGBM, were evaluated. RF and ET consistently outperformed the others, motivating their inclusion in a Voting ensemble. The Voting classifier achieved robust performance across all five airlines, with overall accuracy ranging from 88.9% to 91.8%, F1-scores between 88.5% and 91.4%, and AUC-ROC values all above 95%. DL yielded the highest performance (91.8% accuracy and 96.8% AUC-ROC). These results demonstrate that combining network-cluster information with rich historical features substantially improves delay prediction, providing a scalable approach for airlines and air traffic managers to mitigate operational disruptions.
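As an illustration of two of the feature types named in the abstract, the following minimal sketch computes a cyclical encoding of departure hour and normalised in-/out-degree centrality for airports in a directed route network. It is not the authors' implementation: the function names `encode_cyclical` and `degree_centrality`, the toy route list, and the choice to normalise degrees by (n - 1) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def encode_cyclical(value, period):
    """Map a periodic value (e.g. hour of day) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def degree_centrality(routes):
    """Return {airport: (in_centrality, out_centrality)} for directed routes.

    Each distinct (origin, destination) pair counts once. Degrees are divided
    by (n - 1), the largest number of neighbours an airport can have
    (an assumed normalisation, not taken from the paper).
    """
    in_neighbors = defaultdict(set)
    out_neighbors = defaultdict(set)
    airports = set()
    for origin, dest in routes:
        airports.update((origin, dest))
        if origin != dest:  # ignore self-loops
            out_neighbors[origin].add(dest)
            in_neighbors[dest].add(origin)
    scale = max(len(airports) - 1, 1)
    return {
        airport: (len(in_neighbors[airport]) / scale,
                  len(out_neighbors[airport]) / scale)
        for airport in sorted(airports)
    }


# Hypothetical toy network; a real input would be the observed flight legs.
routes = [("ATL", "DFW"), ("ATL", "DEN"), ("DFW", "ATL"),
          ("DEN", "ATL"), ("DFW", "DEN")]
centrality = degree_centrality(routes)

# 23:00 and 00:00 are adjacent on the clock, so their encodings lie close together.
late_sin, late_cos = encode_cyclical(23, 24)
early_sin, early_cos = encode_cyclical(0, 24)
gap = math.hypot(late_sin - early_sin, late_cos - early_cos)
```

The sine/cosine pair keeps late-night and early-morning departures close in feature space, which a raw hour value (23 versus 0) would not. The centrality values could then be joined to each flight record by origin and destination airport.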

Original language: English
Title of host publication: Database Engineered Applications - 29th International Symposium, IDEAS 2025, Proceedings
Editors: Giacomo Bergami, Paul Ezhilchelvan, Yannis Manolopoulos, Sergio Ilarri, Jorge Bernardino, Carson K. Leung, Peter Z. Revesz
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 109-121
Number of pages: 13
ISBN (Print): 9783032067432
State: Published - Oct 31 2025
Event: 29th International Database Engineered Applications Symposium, IDEAS 2025 - Newcastle upon Tyne, United Kingdom
Duration: Jul 14 2025 to Jul 16 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15928 LNCS

Conference

Conference: 29th International Database Engineered Applications Symposium, IDEAS 2025
Country/Territory: United Kingdom
City: Newcastle upon Tyne
Period: 07/14/25 to 07/16/25

Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Ensemble learning
  • Feature selection
  • Flight delay prediction
  • Network clustering
  • Voting classifier
