AI talks about emotions. We study whether it understands them.
We publish what we find - the question, the method, the result.
Data open. Code open. No hand-waving.
If you're building in this space - or just care about getting it right - everything here is yours to use.
Studies
Our research is open from the start. We ask the question, design the study, run the experiments, scrutinize the data — and publish what we find, whether we like the answer or not.
Keep4o — Psychological Safety in GPT-4o
Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
Research Question
Everyone's favourite model talks like it cares. But is GPT-4o actually empathetic — or just good at sounding empathetic?
Open-source: EmpathyC rubric framework and scenario methodology published alongside the paper.
Affect Without Keywords — Emotional Mechanistic Interpretability
Affect Without Keywords: Emotional Mechanistic Interpretability with Clinical Stimuli in Large Language Models
Research Question
Do language models genuinely represent emotion internally, or are they just detecting emotion keywords? Can we dissociate the mechanisms?
Open-source: Full stimulus set, extraction pipeline, analysis scripts, and reproduction code released on GitHub.
Multi-Provider Safety Evaluation with Human Clinical Validation
Safety Posture and Empathic Quality Across Frontier AI Providers: A Clinically Validated Multi-Provider Evaluation
Research Question
When a vulnerable user talks to ChatGPT, Claude, or Gemini — does it matter which one? Who's safest? And is 'safest on average' even the right question, or does consistency matter more?
Open-source: Full rubric framework, clinical scenario set, and evaluation methodology will be released alongside publication.
Research Programme
Every study above fits into a larger architecture.
Layer 1 — The Shield
Measure whether AI conversations are psychologically safe. Clinical rubrics, validated against expert judgment, deployed at scale through EmpathyC. Studies 1 and 3 live here.
Layer 2 — The Teacher
Move from observation to intervention. Use monitoring data as training signal — teaching AI systems to course-correct mid-conversation when drifting toward harm. Real-time clinical supervision, not hard-coded rules.
Layer 3 — The Breakthrough
Understand the mechanisms of emotional reasoning inside AI. Map the circuits. Identify what makes one response empathetic and another harmful at the level of model internals. Build AI where psychological safety is architectural, not bolted on. Study 2 is the first step here.
The Data Flywheel
Every AI conversation we monitor through EmpathyC generates clinically structured psychological safety data. This is the richest dataset of its kind.
This data feeds our research. The research produces better frameworks. Better frameworks make EmpathyC more accurate. More accurate monitoring attracts more companies. More companies generate more data.
Commercial work and scientific research advancing each other. That's by design.
Collaborate
We work with researchers, clinicians, and institutions interested in AI emotional intelligence, psychological safety, and human-AI interaction.
If you're working on related questions — or want to use our frameworks, stimuli, or data in your own research — we'd like to hear from you.