Upcoming Events

📍 Location: Maria-von-Linden-Str. 1, 4th floor meeting room (A-401)
🕑 Time: Mondays at 18:00
💬 Open to Everyone – feel free to join any meetup that sounds interesting to you.


19.01.2026 [Paper Discussion] 🫘 Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives and its predecessor paper
💬 Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning

Note on Background Knowledge: While we don’t require you to know a lot about AI Safety, our current events are aimed at people who are already familiar with the basics of Machine Learning / AI Safety (e.g., from taking an AI Safety Fundamentals course by BlueDot or reading about the topic). At the start of the next semester, we will have more beginner-friendly events as well.

Get Involved

Want to present a paper, lead a discussion, or simply learn more about AI safety? We’d love to hear from you.
👉 Let us know what topics/papers you want to discuss or sign up to present your research.
👉 Join our WhatsApp group for updates on upcoming events.
👉 Check out our resources for an introduction to AI safety topics.
👉 New to AI Safety or Tübingen? Reach out if you’d like to schedule a one-on-one meeting with one of our organizers. We’ll invite you for a coffee and can give you targeted advice! ☕️

Past Events

12.01.2026 [Paper Discussion] Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers
15.12.2025 Intro to Scalable Oversight by Ameya Prabhu (Bethgelab)
01.12.2025 Quick introduction + discussing Anthropic’s recent paper on “Natural Emergent Misalignment in Production RL”