BiGain: Unified Token Compression for Joint Generation and Classification

Compressing token sequences saves GPU time and money, but most methods force a trade-off: optimise for generation quality or for classification accuracy, not both. BiGain combines the two objectives in a single token-compression framework, reaching near-full performance on both tasks at a fraction of the compute cost. That makes it practically important for economical LLM deployment at scale.