Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Standard LLM fine-tuning supervises one token at a time on ground-truth prefixes, but at generation time the model conditions on its own earlier outputs, so early mistakes compound. This mismatch is the exposure bias problem. This paper replaces token-level supervision with energy-based training over feature representations, breaking that loop. It pairs strong empirical results with elegant theoretical grounding, and comes from a team spanning MIT, Microsoft Research, and Harvard.