Phishing Email Detection: Exploring Machine Learning And AI Approaches

Grant: McNair

Rayleen Ramos

College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology

Major:
Computer Science

Faculty Research Advisor(s):
Daehan Kwak

Abstract:
Phishing attacks pose a growing cybersecurity threat and demand effective detection mechanisms. This study explores the use of machine learning and natural language processing techniques to improve the detection of phishing emails. The research objectives are to analyze linguistic patterns, evaluate model performance, and explore the applicability of AI models. Drawing on a comprehensive literature review, the investigation used a labeled dataset to develop and evaluate an algorithm capable of classifying unseen data as spam or non-spam. Pre-processing included tokenization and stopword removal; classification used logistic regression and other algorithms. Model performance was assessed with accuracy, precision, and confusion matrices. The logistic regression model achieved 98.3% accuracy and 98% precision, as shown in its classification report, demonstrating that machine learning models can accurately identify and classify phishing emails. The analysis highlights the significance of linguistic patterns in distinguishing phishing emails from legitimate ones. Furthermore, a generative AI, ChatGPT, was used to generate and identify phishing emails, with prompt engineering applied to refine the prompts for the best results. The generated emails were then run through the trained model to see how well it could categorize them as spam, and the results confirmed the model's competence in distinguishing between genuine and simulated phishing efforts. This investigation showcases the crucial role of machine learning and natural language processing in enhancing phishing detection and emphasizes their potential to strengthen cybersecurity defenses. In the future, the scope of the research will broaden by training and testing the model on a more diverse dataset to improve its adaptability in detecting phishing attempts. In addition, ChatGPT's ability to generate more personalized and targeted phishing emails will be tested, assessing risks tailored to individual users for a more comprehensive evaluation of cybersecurity solutions.
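
The workflow described in the abstract (tokenization, stopword removal, a logistic regression classifier, then accuracy, precision, and a confusion matrix) can be illustrated with a minimal sketch. This is not the study's code: the stopword list, the six toy emails, the hyperparameters, and every function name below are assumptions made for illustration, and the classifier is a from-scratch logistic regression trained with stochastic gradient descent rather than whatever library the study used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"the", "a", "an", "to", "your", "is", "for", "at", "and", "of", "on", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_vocab(texts):
    """Map each distinct training token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Sparse bag-of-words counts as {feature index: count}."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(x, weights, bias):
    """Predicted probability that a vectorized email is phishing."""
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(texts, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    vocab = build_vocab(texts)
    weights, bias = [0.0] * len(vocab), 0.0
    rows = [vectorize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = score(x, weights, bias) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            for i, v in x.items():
                weights[i] -= lr * err * v
    return vocab, weights, bias

def predict(text, model):
    """Return 1 (phishing) or 0 (legitimate) for a raw email text."""
    vocab, weights, bias = model
    return 1 if score(vectorize(text, vocab), weights, bias) >= 0.5 else 0

def evaluate(y_true, y_pred):
    """Confusion matrix [[TN, FP], [FN, TP]] plus accuracy and precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Toy labeled data (hypothetical): 1 = phishing, 0 = legitimate.
TRAIN_TEXTS = [
    "Verify your account now to avoid suspension",
    "Urgent: confirm your password at this link",
    "You won a prize, click here to claim",
    "Meeting moved to 3pm tomorrow",
    "Please review the attached project report",
    "Lunch on Friday with the team?",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    model = train(TRAIN_TEXTS, TRAIN_LABELS)
    unseen = ["Click here to verify your password", "Project meeting tomorrow with the team"]
    predictions = [predict(t, model) for t in unseen]
    print(predictions)
    print(evaluate([1, 0], predictions))
```

A toy set this small says nothing about real-world performance; the sketch only shows how the pre-processing, classification, and evaluation steps fit together. Emails generated by a tool such as ChatGPT could be passed to `predict` in the same way as the `unseen` examples.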

