Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Test-time training (TTT) adapts a model's parameters to each new input at inference time. This helps generalisation, but it is hard to apply to real-time video without losing spatial coherence. This paper applies TTT to streaming visual data: the model adapts continuously to scene geometry while maintaining strong 3D reasoning under real-time constraints. That makes it a notable step toward practical perception for embodied AI.
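
To make "adapting at inference" concrete, here is a minimal sketch of generic test-time training on a stream. It is not the paper's method: the linear model, the masked-reconstruction objective, the toy data, and the names `make_stream` and `ttt_stream` are all assumptions chosen for illustration.

```python
import random


def make_stream(n_frames, seed=0):
    """Hypothetical toy stream: 4-value 'frames' whose odd entries copy the even ones."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        frames.append([a, a, b, b])
    return frames


def ttt_stream(frames, lr=0.5):
    """Take one self-supervised gradient step per incoming frame.

    Assumed objective: reconstruct the full frame from a copy whose odd
    entries are masked to zero. Returns the adapted weights and the
    per-frame loss measured before each update.
    """
    dim = len(frames[0])
    # Start from the identity map, i.e. no adaptation yet.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    losses = []
    for x in frames:
        masked = [v if j % 2 == 0 else 0.0 for j, v in enumerate(x)]
        pred = [sum(W[i][j] * masked[j] for j in range(dim)) for i in range(dim)]
        err = [p - t for p, t in zip(pred, x)]
        losses.append(sum(e * e for e in err) / dim)
        # Gradient of 0.5 * ||W @ masked - x||^2 w.r.t. W is outer(err, masked).
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * err[i] * masked[j]
    return W, losses


weights, losses = ttt_stream(make_stream(200))
early = sum(losses[:10]) / 10
late = sum(losses[-10:]) / 10
print(f"mean loss, first 10 frames: {early:.4f}; last 10 frames: {late:.6f}")
```

On this toy stream the reconstruction loss should shrink as frames arrive, because each frame nudges the weights toward the structure of the data it is currently seeing. A real system would likely adapt only a small part of a much larger network and would need to keep each update cheap enough for real-time use.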