My team at the Department of Informatics, King's College London, is looking to appoint a two-year post-doc (associate or fellow, depending on experience) in Technical AI Safety.
The position is funded by the Open Philanthropy grant “Verifiably Robust Conformal Probes”. The project’s goal is to develop methods for latent probing (also known as activation monitoring) of large language models (LLMs) that leverage certification and conformal prediction techniques to offer probabilistic and adversarial robustness guarantees. Applications include the detection of misaligned LLM behaviours such as deception, harmfulness, jailbreaking, and power-seeking.
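For readers unfamiliar with the conformal side of the project, here is a minimal, purely illustrative sketch (not project code) of the basic idea: a linear probe is trained on LLM activations, and split conformal prediction calibrates its outputs so that prediction sets come with a distribution-free coverage guarantee. The synthetic data, the choice of `LogisticRegression` as the probe, and the miscoverage level `alpha = 0.1` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for hidden activations from an LLM layer, with binary
# labels (e.g. deceptive vs. benign behaviour). Purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

# Split the data: train the probe, hold out a separate calibration set.
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:1500], y[1000:1500]
X_test = X[1500:]

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Split conformal prediction: the nonconformity score of a calibration
# example is 1 minus the probability the probe assigns to its true label.
cal_probs = probe.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

alpha = 0.1  # target miscoverage rate (assumed for this sketch)
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction sets: include every label whose score is below the threshold.
# Under exchangeability, each set contains the true label with probability
# at least 1 - alpha.
test_probs = probe.predict_proba(X_test)
pred_sets = test_probs >= 1.0 - q  # boolean matrix, one row per example
```

This sketch covers only the probabilistic guarantee; the “verifiably robust” part of the grant concerns certification of such probes against adversarial perturbations, which is beyond what this toy example shows.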
Deadline: 20 November 2025