Navigating Adversarial Prompts to Secure Large Language Models
College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology
Major:
Computer Science
Faculty Research Advisor(s):
Yulia Kumar
Abstract:
The present investigation rigorously explores the resilience of state-of-the-art artificial intelligence (AI) Large Language Models (LLMs), such as ChatGPT, Microsoft Copilot, and Gemini, as well as AI-driven image generators like DALL-E 3, against adversarial prompts. These advanced models are susceptible to inadvertent or intentional manipulation, leading to the generation of responses or images that contravene the ethical and security guidelines established by their developers. The study employs advanced prompt-engineering and 'jailbreaking' techniques to uncover subtle yet significant vulnerabilities, thereby presenting an innovative methodology for robust testing of AI systems. This approach not only highlights the critical necessity for enhanced AI defenses but also sheds light on the complex interplay between AI innovation and ethical integrity.
At the heart of these findings is a call for proactive and ongoing enhancement of AI technologies to ensure their security. By identifying current shortcomings and vulnerabilities, this research contributes significantly to the wider discourse on responsible AI utilization. It emphasizes the need for developing robust ethical frameworks and advanced security protocols. The researchers propose practical strategies to fortify AI models against adversarial threats, with the goal of establishing a digital ecosystem where ethical compliance and digital security are paramount. Future directions for this research include refining these models further and incorporating new data modalities such as voice and video.