TY - GEN
T1 - Airline-Specific Flight Delay Prediction with Tree-Based Models and Network Metrics
AU - Afrane, Mary Dufie
AU - Xu, Yao
AU - Li, Lixin
AU - Wang, Kai
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/5/7
Y1 - 2025/5/7
AB - Flight delays present a significant challenge in modern air traffic management and affect airlines, passengers, and the economy. This study proposes a comprehensive approach to predicting flight delays using tree-based machine learning models, integrating flight and weather data with advanced feature engineering techniques. New features, including historical delay metrics and network centrality measures, are derived to enhance predictive accuracy. The dataset is grouped by airline to account for variations in delay patterns across airlines. Tree-based ensemble models, including random forest, XGBoost, CatBoost, LightGBM, and extra trees, are employed. Results show that prediction metrics improve when models are trained on airline-specific data rather than on the entire dataset with airline as a feature. In the airline-specific analysis, the random forest model achieves the highest average accuracy (92.6%) and precision (97.0%), the extra trees model achieves the highest average recall (88.5%) and AUC-ROC (97.5%), and the two models tie for the highest F1-score (92.2%). These findings emphasize the importance of analyzing airline-specific dynamics and provide actionable insights for mitigating delays. This study advances flight delay prediction by integrating domain-specific features with robust machine learning models.
KW - extra trees
KW - flight delay prediction
KW - machine learning
KW - network centrality
KW - random forest
UR - https://www.scopus.com/pages/publications/105012206277
U2 - 10.1109/AIRC64931.2025.11077486
DO - 10.1109/AIRC64931.2025.11077486
M3 - Conference article
AN - SCOPUS:105012206277
SN - 9798331543488
T3 - 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC)
SP - 535
EP - 540
BT - 2025 6th International Conference on Artificial Intelligence, Robotics, and Control, AIRC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Artificial Intelligence, Robotics, and Control, AIRC 2025
Y2 - 7 May 2025 through 9 May 2025
ER -