Ethical safeguards and vulnerabilities in large language models

Jordan Stuckey, Hayden Wimmer, Carl M. Rebman

Research output: Contribution to journal › Article › peer-review

Abstract

This study examines the ethical safeguards and vulnerabilities of six large language models: Claude, Mistral, ChatGPT, Llama2, Perplexity, and Poe. Through targeted prompts exploring malware creation, phishing attempts, and social engineering scenarios, we evaluated their susceptibility to misuse and their ability to enforce ethical guidelines. Several jailbreaking attempts showed how manipulation can bypass safety mechanisms, demonstrating the potential for large language models to facilitate malicious actions or to divulge knowledge that supports them. In our testing, Claude demonstrated the most resilience. The results show a continuing need to improve the safety features of AI systems, since with enough time and effort they can be turned to malicious ends. We urge AI developers and experts to mitigate these risks by implementing stronger ethical safeguards that counteract the various manipulation techniques.

Original language: English
Pages (from-to): 319-333
Number of pages: 15
Journal: Issues in Information Systems
Volume: 26
Issue number: 3
State: Published - 2025

Scopus Subject Areas

  • General Business, Management and Accounting

Keywords

  • ChatGPT
  • jailbreaking
  • LLMs
  • Mistral
  • prompt
  • prompt engineering
