TY - GEN
T1 - Email Classification of Text Data Using Machine Learning and Natural Language Processing Technique
AU - Ijogun, Oluwaseyi
AU - Wimmer, Hayden
AU - Rebman, Carl
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Spam and phishing emails are among the most critical problems in social networks. They raise several issues: the cost of handling them in large quantities, loss of privacy through exposure of sensitive information, the time needed to identify them, and cybersecurity threats from malicious content. With a spam and phishing detection approach, a model can quickly recognize such emails and classify them before they become a threat to the organization. This study used a supervised learning approach based on machine learning and natural language processing, which played an effective role in improving email classification. The dataset was prepared and dynamically classified into three categories: spam-ham, spam-phishing, and ham-phishing. The classification workflow comprised data preprocessing, feature selection, model training, model testing, and evaluation of classification results and performance. Five machine learning algorithms were used, and the results were evaluated using eight performance indexes. The results show that the XGBoost classifier outperformed the other machine learning algorithms on the datasets. This research could help improve the categorization of emails into folders based on their content, intent, or relevance, improve user experience, and support better management of email inboxes by automatically filtering, sorting, and prioritizing messages.
KW - Classification
KW - NLP
KW - Phishing
KW - Spam
UR - http://www.scopus.com/inward/record.url?scp=85218504981&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-81455-6_13
DO - 10.1007/978-3-031-81455-6_13
M3 - Conference article
AN - SCOPUS:85218504981
SN - 9783031814549
T3 - Communications in Computer and Information Science
SP - 212
EP - 236
BT - Optimization and Data Science in Industrial Engineering - First International Conference, ODSIE 2023, Proceedings
A2 - Mirzazadeh, A.
A2 - Molamohamadi, Zohreh
A2 - Babaee Tirkolaee, Erfan
A2 - Weber, Gerhard-Wilhelm
A2 - Leung, Janny
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st International Conference on Optimization and Data Science in Industrial Engineering, ODSIE 2023
Y2 - 16 November 2023 through 17 November 2023
ER -