AI-Driven Phishing Detection Using Natural Language Processing and Machine Learning

Sarveena S; Dhanusiya R; A. Raja

doi:https://www.doi.org/10.59256/ijire.20260703013

Original Article

Sarveena S¹, Dhanusiya R², A. Raja³

¹,² Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India.
³ Head of the Department, Department of Computer Science and Engineering (Cyber Security), United Institute of Technology, Coimbatore, Tamil Nadu, India.

Published Online: May-June 2026

Pages: 120-128

Abstract

Phishing attacks are among the most persistent and damaging cybersecurity threats, exploiting human cognitive vulnerabilities to obtain sensitive information such as login credentials, financial account data, and personal identity details. Conventional rule-based and blacklist-driven detection systems adapt poorly to rapidly evolving phishing techniques, resulting in elevated false-positive rates, missed detections, and a continuing reliance on labour-intensive manual maintenance. This paper presents an AI-driven phishing detection framework that integrates Natural Language Processing (NLP) and Machine Learning (ML) to improve both detection accuracy and operational robustness. The proposed system combines multi-stage text preprocessing, hybrid feature extraction using Term Frequency–Inverse Document Frequency (TF-IDF) vectorisation and pre-trained word embeddings (Word2Vec and GloVe), and a comparative evaluation of supervised classification models including Logistic Regression, Support Vector Machines (SVM), Random Forest, and Long Short-Term Memory (LSTM) deep learning networks. Experiments on a combined dataset of 129,382 labelled email samples show that the proposed hybrid NLP-ML model outperforms both traditional rule-based approaches and single-method ML baselines, with the LSTM classifier achieving 96.7% accuracy, 96.3% precision, 96.0% recall, and an F1-score of 96.1%. The principal contributions of this work are a rigorous comparative analysis of six machine learning architectures, a scalable and modular detection pipeline suitable for real-time deployment, a feature importance analysis identifying key discriminative attributes, and actionable insights for improving operational phishing detection systems.
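
As an illustration of the TF-IDF vectorisation step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state which TF-IDF weighting variant, tokeniser, or preprocessing steps were used, so the length-normalised term frequency, the smoothed inverse document frequency, the helper names `tokenize` and `tfidf_vectors`, and the three toy emails are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting (one common smoothed variant, not taken from the paper):
      tf  = term count / number of tokens in the document
      idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents
            and df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy emails, not drawn from the paper's dataset.
emails = [
    "Verify your account now to avoid suspension",
    "Meeting notes attached for your review",
    "Your account statement is attached",
]
vectors = tfidf_vectors(emails)

# Show the terms of the first email, highest weight first.
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this toy corpus, a term that appears in only one email (such as "verify") receives a higher weight than one that appears in every email (such as "your"), which is the property that lets a downstream classifier focus on distinctive wording. In a pipeline like the one described, such sparse vectors would be passed to a classifier, for example Logistic Regression or an SVM.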
