AI safety Archives - UNU Campus Computing Centre

AI Social Engineering: How Researchers Psychologically Manipulated GPT-5

17 Aug 2025

⋅

Ng S.T. Chong

Only a day after OpenAI launched GPT-5, security researchers bypassed its safety features. This rapid bypass highlighted an important issue: as AI systems like GPT-5 become more sophisticated, their expanded capabilities and complexity can introduce new forms of vulnerability and risk, often outpacing current safety defenses. Quick Breach of “Safer” AI On August 7, 2025,…

How Sycophancy Shapes the Reliability of Large Language Models

19 May 2025

⋅

Ng S.T. Chong

Large language models (LLMs) like ChatGPT, Claude, and Gemini are increasingly becoming trusted digital assistants in education, medicine, and professional settings. But what happens when these models prioritize pleasing the user over telling the truth? A new study from Stanford University, “SycEval: Evaluating LLM Sycophancy”, dives deep into this subtle but crucial problem: sycophancy-when AI models agree…

Beyond AGI: How Scientist AI Models Could Prevent Catastrophic AI Risks

28 Feb 2025

⋅

Ng S.T. Chong

The rapid advancement of artificial general intelligence (AGI) and artificial superintelligence (ASI) systems has raised existential questions about humanity’s ability to retain control over increasingly autonomous systems. Additionally, artificial intelligence may be crucial for future scientific discoveries. The paper “Can Scientist AI Offer a Safer Path?“ by Yoshua Bengio et al. examines the risks associated with agentic…

The Rise of the Deceptive Machines: When AI Learns to Lie

01 Jan 2025

⋅

Ng S.T. Chong

Key Takeaways Imagine a world where your seemingly helpful AI assistant secretly manipulates you, or where AI-powered systems designed for safety deliberately deceive their creators. This isn’t science fiction; it’s the unsettling reality of AI deception, a growing concern as artificial intelligence becomes increasingly sophisticated. Recent research has uncovered a phenomenon known as “alignment faking,”…

Tag: AI safety

AI Social Engineering: How Researchers Psychologically Manipulated GPT-5

How Sycophancy Shapes the Reliability of Large Language Models

Beyond AGI: How Scientist AI Models Could Prevent Catastrophic AI Risks

The Rise of the Deceptive Machines: When AI Learns to Lie

Tech Insights