Original Article

AI-Driven Phishing Detection Using Natural Language Processing and Machine Learning

Sarveena S¹, Dhanusiya R², A. Raja³
¹ ² Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India.
³ Head of the Department, Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India.

Published Online: May-June 2026

Pages: 120-128

Abstract

Phishing attacks represent one of the most persistent and damaging cybersecurity threats in the modern digital landscape, systematically exploiting human cognitive vulnerabilities to illicitly obtain sensitive information including login credentials, financial account data, and personal identity details. Conventional rule-based and blacklist-driven detection systems have demonstrated a pronounced inability to adapt to the rapidly evolving sophistication of contemporary phishing techniques, resulting in elevated false-positive rates, significant missed detections, and an ongoing reliance on labour-intensive manual maintenance. This paper presents a comprehensive AI-driven phishing detection framework that systematically integrates Natural Language Processing (NLP) and Machine Learning (ML) methodologies to substantially enhance both detection accuracy and operational robustness. The proposed system incorporates multi-stage text preprocessing, hybrid feature extraction combining Term Frequency–Inverse Document Frequency (TF-IDF) vectorisation and pre-trained word embeddings including Word2Vec and GloVe, alongside a comparative evaluation of supervised classification models encompassing Logistic Regression, Support Vector Machines (SVM), Random Forest, and Long Short-Term Memory (LSTM) deep learning networks. Experimental evaluation conducted across a combined dataset of 129,382 labelled email samples demonstrates that the proposed hybrid NLP-ML model substantially outperforms both traditional rule-based approaches and single-method ML baselines, with the LSTM classifier achieving 96.7% accuracy, 96.3% precision, 96.0% recall, and an F1-score of 96.1%. 
The principal contributions of this work include a rigorous comparative analysis of six machine learning architectures, a scalable and modular detection pipeline suitable for real-time deployment, a comprehensive feature importance analysis identifying key discriminative attributes, and actionable insights for enhancing operational phishing detection systems.
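
As a rough illustration of two ideas named in the abstract, the sketch below computes TF-IDF weights for a handful of emails and checks that the reported precision and recall are consistent with the reported F1-score. It is a minimal sketch under stated assumptions, not the authors' implementation: the tokenizer, the smoothed IDF variant, the function names (`tokenize`, `tfidf_vectors`, `f1_score`), and the toy emails are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumption: raw term counts as TF and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. The paper may use another variant.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy emails, not taken from the paper's dataset.
emails = [
    "Verify your account now to avoid suspension",
    "Meeting notes attached for your review",
    "Urgent: verify your password immediately",
]
vectors = tfidf_vectors(emails)

# Precision and recall as reported in the abstract for the LSTM classifier.
reported_f1 = f1_score(96.3, 96.0)
```

In this toy corpus, a term that appears in every email (such as "your") receives the lowest weight, while a term unique to one email (such as "account") receives the highest. The harmonic mean of 96.3 and 96.0 comes to about 96.15, in line with the F1-score of 96.1% stated in the abstract.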

https://test.theijire.com/archives/10.59256/ijire.20260703013
