Test-time training (TTT) adapts a model to each new input at inference time. That helps generalisation, but it is hard to apply to real-time video without losing spatial coherence. This paper makes it work for streaming visual data: the model adapts continuously to scene geometry while maintaining strong 3D reasoning under real-time constraints. That makes it a significant step toward practical embodied AI perception.
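
To make the idea of adapting to each new input at inference concrete, here is a minimal toy sketch of a generic test-time-training loop over a stream. It is not the Spatial-TTT method: the linear "fast weights", the next-frame prediction loss, and every name in it (`predict`, `ttt_step`, `run_stream`, `lr`) are assumptions chosen purely for illustration.

```python
# Illustrative toy only: a generic test-time-training loop, NOT the paper's method.
# All names and the next-frame prediction objective are hypothetical.

def predict(w, x):
    """Apply a linear map: w is a list of rows, x is a feature vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def ttt_step(w, x, target, lr=0.1):
    """Take one gradient step on 0.5 * ||w x - target||^2.

    Returns the updated weights and the loss measured before the update.
    """
    pred = predict(w, x)
    err = [p - t for p, t in zip(pred, target)]
    loss = 0.5 * sum(e * e for e in err)
    new_w = [[wij - lr * e * xj for wij, xj in zip(row, x)]
             for row, e in zip(w, err)]
    return new_w, loss

def run_stream(frames, dim, lr=0.1):
    """Adapt w online: each frame serves as input for predicting the next one."""
    w = [[0.0] * dim for _ in range(dim)]
    losses = []
    for x, nxt in zip(frames, frames[1:]):
        w, loss = ttt_step(w, x, nxt, lr)
        losses.append(loss)
    return w, losses

# Synthetic stream: each "frame" is the previous one with its two entries swapped.
frames = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(40)]
w, losses = run_stream(frames, dim=2, lr=0.5)
```

In this toy the weights start at zero and are updated once per incoming frame, with no separate training phase, so the prediction loss falls as the stream goes on. A real system would replace the linear map with a learned network and the synthetic frames with visual features.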

Comments on "Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training"