For a while now, reinforcement learning has been the standard way to fine-tune LLMs. It works in the action space, i.e. exploring alternative actions to maximize reward, which becomes difficult, especially with long action sequences. If we could explore LLM parameters directly, it might be possible to find more systematic and creative changes.
Indeed, parameter-space fine-tuning is possible, and in a surprising way: evolution strategies, i.e. population-based search, can be scaled up to optimize billions of parameters in LLMs. For example, it outperforms PPO and GRPO on a symbolic reasoning task, while being more consistent and sample-efficient. Its gradient-free nature makes it a promising alternative in environments where RL is fragile or costly to apply.
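To make the idea concrete, here is a minimal NumPy sketch of the general evolution-strategies recipe (perturb the parameter vector with Gaussian noise, score each candidate with a reward function, and move toward the reward-weighted average of perturbations). The reward function, hyperparameters, and rank normalization below are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def es_fine_tune(theta, reward_fn, pop_size=30, sigma=0.02, lr=0.01, iters=100):
    """Gradient-free ES loop over a flat parameter vector `theta`.

    `reward_fn` is a hypothetical black-box evaluator: it takes a perturbed
    parameter vector (e.g. loaded into an LLM) and returns a scalar reward.
    """
    for _ in range(iters):
        # Sample a population of Gaussian perturbations of the parameters.
        eps = np.random.randn(pop_size, theta.size)
        # Evaluate each perturbed candidate with the black-box reward.
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        # Rank-normalize rewards so the update is robust to reward scale.
        ranks = rewards.argsort().argsort() / (pop_size - 1) - 0.5
        # Estimate the search gradient and take a step in parameter space.
        theta = theta + lr / (pop_size * sigma) * (eps.T @ ranks)
    return theta
```

Because only scalar rewards are exchanged, the per-candidate evaluations can be distributed across workers, which is what makes scaling to billions of parameters plausible.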
Read the blog (with a video and a conceptual illustration):
https://www.cognizant.com/us/en/ai-lab/blog/llm-fine-tuning-with-es

Read the full paper:
https://arxiv.org/abs/2509.24372

Explore the code:
https://github.com/VsonicV/es-fine-tuning-paper