AEGIS addresses catastrophic forgetting when fine-tuning vision-language models for robotic control by preventing cross-modal gradient asymmetry, in which high-magnitude continuous-action gradients overwrite the VLM's cross-entropy pre-trained manifold. Unlike stop-gradient or LoRA approaches, it uses anchor-enforced gradient isolation to preserve VQA capabilities while injecting flow-matching action supervision.
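The blurb above does not spell out AEGIS's exact update rule, so the following is only a toy sketch of what "gradient isolation" between an anchor (VQA) objective and an action objective could look like on a shared parameter: a PCGrad-style projection that removes the action gradient's component opposing the anchor gradient, plus a magnitude cap to counter the cross-modal asymmetry. All names and the `max_ratio` heuristic are illustrative assumptions, not the paper's method.

```python
import numpy as np

def isolate_gradient(g_action, g_anchor, max_ratio=1.0):
    """Toy gradient-isolation step (hypothetical; NOT the AEGIS algorithm).

    1. Remove the component of the action gradient that opposes the
       anchor (VQA) gradient, so action updates cannot directly undo
       anchor-preserving directions.
    2. Rescale the action gradient relative to the anchor gradient to
       blunt the high-magnitude continuous-action gradients.
    """
    # Project out the conflicting component (PCGrad-style projection).
    dot = float(g_action @ g_anchor)
    if dot < 0.0:
        g_action = g_action - dot / float(g_anchor @ g_anchor) * g_anchor
    # Cap the action gradient's norm at max_ratio times the anchor's norm.
    ratio = np.linalg.norm(g_action) / (np.linalg.norm(g_anchor) + 1e-8)
    if ratio > max_ratio:
        g_action = g_action * (max_ratio / ratio)
    return g_action

# A large action gradient that partly opposes the anchor direction:
g_act = np.array([10.0, -4.0])
g_anc = np.array([0.0, 1.0])
g_iso = isolate_gradient(g_act, g_anc)
```

After isolation the returned gradient no longer has a negative component along the anchor direction, and its magnitude is bounded by the anchor gradient's, so an optimizer step cannot swamp the pre-trained manifold along that axis.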
Gemini Robotics-ER 1.6, a specialized reasoning model for physical AI, achieves 93% success on instrument-reading tasks (up from a 23% baseline) through agentic vision, which combines visual reasoning with code execution. It adds spatial reasoning, multi-view perception, and industrial gauge interpretation as a high-level planning layer for vision-language-action robotics models.
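"Agentic vision" here means the model interleaves visual reasoning with code it writes and executes (for example, cropping and re-inspecting a gauge face). The sketch below is a purely hypothetical tool loop illustrating that pattern; the `crop` tool, `model_step` policy, and fake image bear no relation to the actual Gemini Robotics-ER API.

```python
import numpy as np

def crop(image, box):
    """Code tool the model could invoke: crop a region for a closer look."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]

def model_step(image, history):
    """Stand-in for the model's reasoning: choose the next tool call."""
    if not history:
        return ("crop", (2, 2, 6, 6))                 # zoom in on the gauge face
    return ("answer", float(history[-1].mean()))       # "read" the zoomed region

image = np.arange(64, dtype=float).reshape(8, 8)       # fake camera frame
history = []
while True:
    action, arg = model_step(image, history)
    if action == "crop":
        history.append(crop(image, arg))               # execute the model's code request
    else:
        reading = arg                                  # final instrument reading
        break
```

The point of the loop is that perception becomes iterative: each executed tool call feeds a sharper observation back into the model's next reasoning step, which is what lifts hard instrument-reading cases that a single forward pass misses.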
Google DeepMind released Gemini Robotics-ER 1.6, a robotics reasoning model with improved spatial reasoning, multi-view perception, instrument reading, and hazard detection (+6% on text safety, +10% on video safety). It is available via the Gemini API, with Boston Dynamics deploying it for autonomous Spot robot operations.
Boston Dynamics integrated Gemini and Gemini Robotics-ER 1.6 into Spot's Orbit AIVI systems, enabling robots to perform complex reasoning about industrial environments, identify hazards, and read instruments. The Gemini-powered AIVI-Learning system is now live for existing customers as of April 15, 2026.
HiVLA decouples VLM semantic planning from motor control to preserve the reasoning capabilities lost in end-to-end VLA fine-tuning. A VLM planner generates subtask instructions with target bounding boxes; a flow-matching DiT then translates the grounded plans into physical actions for robotic manipulation.
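The two-stage structure described above can be sketched end to end: a planner emits a (subtask, bounding box) pair, and a flow-matching decoder integrates a learned velocity field from noise to an action vector. Everything below is an illustrative assumption, not the HiVLA implementation: the planner and velocity field are stand-ins, the grounding embedding is fabricated, and the 7-dim action is a generic manipulation convention.

```python
import numpy as np

def vlm_planner(image, instruction):
    """Stand-in for the VLM planner: a subtask string plus a target box."""
    return "grasp the red mug", np.array([0.40, 0.30, 0.55, 0.50])  # x1,y1,x2,y2

def velocity_field(a_t, t, cond):
    """Stand-in for the flow-matching DiT's learned velocity field."""
    target = cond[: a_t.shape[0]]      # pretend the conditioning encodes the target action
    return target - a_t                # straight-line flow toward it

def flow_matching_decode(cond, action_dim=7, steps=10):
    """Euler-integrate the velocity field from Gaussian noise to an action."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal(action_dim)
    dt = 1.0 / steps
    for i in range(steps):
        a = a + dt * velocity_field(a, i * dt, cond)
    return a

subtask, box = vlm_planner(image=None, instruction="clean the table")
cond = np.concatenate([box, box[:3]])  # fake grounding embedding built from the box
action = flow_matching_decode(cond)
```

The decoupling is the design choice doing the work: because gradients from the continuous-control decoder never fine-tune the planner end to end, the VLM's semantic reasoning survives, while the DiT only has to map an already-grounded plan to motor commands.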