Postdoc position - Research Fellow in AI Evaluation
Key details:
Position for at least 21 months (starting early 2025); extensions
beyond that period are likely, depending on results and associated
projects.
Salary will be 42,000 EUR gross per year, paid in 14 instalments
(12 monthly payments plus 2 extra payments).
The position is based at VRAIN (Technical University of Valencia),
one of the largest AI research centres in Europe.
Working environment:
This is an exciting opportunity to join a leading research team in
AI evaluation, offering an international atmosphere, a world-wide
network of collaborations, and a major role in steering the
community (e.g., aievaluation.substack.com), with recent
publications in top venues such as Nature, Science, NeurIPS, AAAI
and IJCAI. The candidate will work alongside the DMIP team at
VRAIN, under the direction of Jose Hernandez-Orallo
(josephorallo.webs.upv.es).
Research details:
- The low predictability of AI systems is a challenge that has
  been intensified by the widespread use of general-purpose AI,
  now directly applied to thousands of tasks by millions of
  people, in an unprecedented upturn in AI penetration. Yet
  users cannot anticipate whether a system is going to be
  valid for a new task instance. Addressing this urgent need
  requires better evaluation of the capabilities and safety
  of AI systems.
- In this context, the research fellow will perform
  world-class research to understand the space of AI
  predictability, its metrics, and its evaluation, collecting
  system, task and possibly user feedback data to train validity
  predictors. These validity predictors anticipate a performance
  or safety metric given the context of use <system,
  instance, user>. The main goal is to develop theoretical
  and experimental research that conceptualises this space, and
  to build monitors that supervise AI systems.
- For the notion of Predictable AI, we suggest that interested
  candidates have a look at arxiv.org/pdf/2310.06167.
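To make the idea of a validity predictor concrete, here is a minimal sketch (with hypothetical features and a toy generative rule, not the project's actual method): a predictor can be as simple as a classifier mapping the <system, instance> context to the probability that the system's output will be valid.

```python
import math
import random

def sigmoid(z):
    """Logistic function, mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_validity_predictor(records, lr=0.1, epochs=300):
    """Fit a logistic regression P(valid | features) by plain SGD.

    Each record is ((capability, difficulty), outcome), where outcome
    is 1 if the system's output was valid for that instance, else 0.
    Both features are hypothetical stand-ins for richer system and
    task characterisations.
    """
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (cap, dif), y in records:
            p = sigmoid(w0 * cap + w1 * dif + b)
            g = p - y                    # gradient of log-loss w.r.t. score
            w0 -= lr * g * cap
            w1 -= lr * g * dif
            b -= lr * g
    return lambda cap, dif: sigmoid(w0 * cap + w1 * dif + b)

# Toy data: success is assumed more likely when capability exceeds
# instance difficulty (an invented rule, for illustration only).
random.seed(0)
records = []
for _ in range(400):
    cap = random.uniform(0.0, 1.0)   # system capability estimate
    dif = random.uniform(0.0, 1.0)   # task-instance difficulty
    y = 1 if cap - dif + random.gauss(0.0, 0.1) > 0.0 else 0
    records.append(((cap, dif), y))

predict = train_validity_predictor(records)
```

In practice the features would come from richer system, task and user characterisations, and the target could be any performance or safety metric rather than a binary success flag.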
More research context:
Other relevant papers that illustrate the kind of research we do:
- Evaluating General-Purpose AI
(ebooks.iospress.nl/doi/10.3233/FAIA240459). It focuses on how
to characterise GPAI, but it contains definitions of
capability, generality, subject characteristic curves, etc.
- Measurement layouts (arxiv.org/pdf/2309.11975): this is a
specific approach using MCMC to model the behaviour of
underlying agents (it's in a 3D world, but we have developed
similar evaluation models for LLMs).
- This Science paper
  (science.org/doi/full/10.1126/science.adf6369) is more about
  methodology, but it emphasises why instance-level evaluation
  is crucial.
- How psychometrics should be used in AI
  (arxiv.org/pdf/2310.16379): discusses predictive and
  explanatory power in evaluation, and validity.
- This recent Nature paper introduces some methodological
innovations into AI evaluation (e.g., three-valued analysis),
summarised here:
aievaluation.substack.com/p/2024-september-ai-evaluation-digest.
Key responsibilities:
RESEARCH (proper): literature reviews, brainstorming
research ideas, experimental research (testing AI systems,
analysing results, visualisation), problem formulation
(formalisation), writing papers and reports, etc.
RESEARCH (related): Dissemination of results by giving talks,
attending/organising conferences, developing collaborations,
contributing to further funding applications, etc.
Requirements:
- A PhD in a relevant field (Computer Science / Cognitive
Sciences / Psychology / Engineering / Maths / Physics / ...)
is essential.
- Fluency in English is essential.
- The ability to demonstrate potential for research and
publication at the highest level is essential.
- A strong interest in engaging with international and
  cross-disciplinary teams (AI, cognitive science, etc.) is
  essential.
- An interest in AI evaluation and Predictable AI and other
areas of the project (including those outside the candidate's
own areas of expertise) is essential.
- Strong expertise in machine learning is desirable; at least
  basic knowledge is essential.
- Expertise in large language models is desirable.
- Expertise in AI evaluation is an advantage.
- Expertise in AI safety is an advantage.
Application:
- Candidates should send an email to strategy.pr...@gmail.com
  with a subject line starting with "[PredAIT]", attaching
  exactly one PDF file that includes: their CV; a cover letter
  (no more than 1000 words) with ideas or reflections on the
  topics relevant to this position (see above); and the names
  and contact details of two referees who are familiar with the
  candidate's work.
- Deadline: 20th November 2024, AoE.
- Candidates must hold a PhD that has been awarded or
  recognised within the European Higher Education Area (EHEA,
  ehea.info) and must be able to obtain the right to work in
  the EU (the university will help with visa information if
  needed).
Equality:
We celebrate diversity and encourage applications from all
qualified candidates, particularly those from under-represented
groups. We are committed to creating an inclusive and supportive
environment for all.
https://josephorallo.webs.upv.es/positions/postdoc2025-2026.html