Gan & Isola (2026). A beautiful reframing of post-training: instead of iteratively fine-tuning from a single point, view pretraining as having created a distribution in which task-expert solutions are already densely packed around the pretrained weights. The authors show that, in large, well-pretrained models, randomly sampling perturbations and ensembling them is competitive with PPO and GRPO. This challenges several deeply held assumptions about optimisation.
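
To make the "sample and ensemble" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' procedure: the linear model, the Gaussian noise scale `sigma`, the top-k selection on a small labelled set, and the majority vote are all assumptions chosen for illustration, and every name (`sample_and_ensemble`, `BASE`, `DATA`) is hypothetical.

```python
import random

def predict(weights, x):
    """Score of a toy linear model; a positive score means class True."""
    return sum(w * xi for w, xi in zip(weights, x))

def accuracy(weights, data):
    """Fraction of (x, label) pairs the model classifies correctly."""
    correct = sum((predict(weights, x) > 0) == label for x, label in data)
    return correct / len(data)

def sample_and_ensemble(base, data, n_samples=200, sigma=0.5, top_k=9, seed=0):
    """Sample random perturbations of `base`, keep the best, and ensemble them.

    Assumed recipe (illustrative only): isotropic Gaussian noise around the
    base weights, selection by accuracy on `data`, majority-vote ensembling.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_samples):
        perturbed = [w + rng.gauss(0.0, sigma) for w in base]
        candidates.append((accuracy(perturbed, data), perturbed))

    # Keep the top_k perturbations, best first. No gradient step is taken.
    candidates.sort(key=lambda c: c[0], reverse=True)
    experts = [w for _, w in candidates[:top_k]]

    def ensemble_predict(x):
        # Majority vote over the selected perturbed models.
        votes = sum(predict(w, x) > 0 for w in experts)
        return votes * 2 > len(experts)

    return experts, ensemble_predict

# Toy task (an assumption): the label is True when x[0] > x[1].
_rng = random.Random(1)
DATA = []
for _ in range(100):
    x = (_rng.uniform(-1, 1), _rng.uniform(-1, 1))
    DATA.append((x, x[0] > x[1]))

# Stand-in for "pretrained" weights that do not yet solve the task.
BASE = [0.1, 0.1]
```

Calling `sample_and_ensemble(BASE, DATA)` returns the selected weight vectors and a voting predictor. The point of the sketch is only that selection among random samples replaces iterative optimisation; whether this works at scale is the paper's claim, not something this toy demonstrates.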

Comments on "Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"