Business

Google's DeepMind Making Strides in Robot Evolution

Alan Walker

Expanding Robotic Potential

High-capacity vision-language models (VLMs) trained on web-scale datasets are remarkably capable, but robots cannot easily match them: to reach similar competence, a robot would need to collect first-hand data across every object, environment, and task. To bridge this gap, DeepMind created Robotic Transformer 2 (RT-2). Building on its predecessor, Robotic Transformer 1 (RT-1), RT-2 draws on the robot demonstration data collected for RT-1 and significantly extends that model’s capabilities.

Moving Beyond Pre-existing Robotic Data

RT-2 generalizes and comprehends well beyond the robotic data it was trained on. Notably, it can interpret new commands and carry out rudimentary reasoning, such as reasoning about object categories or high-level descriptions. The model can determine which object suits a given task, for example choosing a rock as an improvised hammer, or an energy drink as the best drink for a tired person.

The Mechanics of Robotic Control

To enable robotic control, RT-2 adapts VLMs so that they output actions. It does this by representing actions as tokens in the model’s output, akin to language tokens. Discretized robot actions are written as strings, which standard natural-language tokenizers can process, so VLMs can be trained on robotic data.
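The idea of writing an action as a string can be illustrated with a minimal Python sketch. It is not DeepMind's implementation: the bin count of 256, the value range of -1 to 1, and the function names are all assumptions made for illustration, since the article does not specify them.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumption: the article does not state how many bins are used


def discretize(value: float, low: float, high: float, num_bins: int = NUM_BINS) -> int:
    """Map a continuous value in [low, high] to an integer bin index."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # The upper bound maps to num_bins, so clamp it into the last bin.
    return min(int(fraction * num_bins), num_bins - 1)


def action_to_string(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> str:
    """Render an action vector as space-separated integers, one per dimension."""
    return " ".join(str(discretize(v, low, high)) for v in action)


def string_to_action(text: str, low: float = -1.0, high: float = 1.0,
                     num_bins: int = NUM_BINS) -> List[float]:
    """Decode the string back to continuous values, using each bin's center."""
    width = (high - low) / num_bins
    return [low + (int(token) + 0.5) * width for token in text.split()]


# Hypothetical three-dimensional action, e.g. a small end-effector displacement.
encoded = action_to_string([-1.0, 0.0, 1.0])
decoded = string_to_action(encoded)
print(encoded)
print(decoded)
```

Because the encoded action is ordinary text, a standard tokenizer can handle it like any other string. The decoding step loses at most half a bin width per dimension, which is the cost of discretizing.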

Testing and Evaluating RT-2’s Capabilities

A series of qualitative and quantitative experiments was carried out on RT-2, spanning more than 6,000 robotic trials. One capability these experiments probed is the model’s ability to combine knowledge from web-scale data with its own robotic experience. The tasks were divided into three skill categories: symbol understanding, reasoning, and human recognition.

Each task required an understanding of visual-semantic concepts and the ability to perform robotic control that acts on those concepts. Commands such as “pick up the bag about to fall off the table” or “move banana to the sum of two plus one” showed RT-2 applying knowledge transferred from web-based data.

DeepMind reports strong results. Compared with earlier baselines such as Visual Cortex (VC-1) and the original RT-1, RT-2 showed a more than 3x improvement in generalization performance. On scenarios the robot had not previously seen, performance rose from RT-1’s 32% to 62% with RT-2, nearly double.

Evolving Capabilities of RT-2

Combining robotic control with chain-of-thought reasoning enables the model to learn both long-horizon planning and low-level skills. A variant of RT-2 was fine-tuned to improve its ability to use language and actions together. The result was a model that can execute more complex commands, ones that require reasoning about the intermediate steps needed to accomplish a user instruction.

Implications and Future Potential

The development of RT-2 marks a significant step forward in robotics. The model turns VLMs into capable vision-language-action (VLA) models that let robots interpret information and solve problems across a diverse range of tasks. It achieves this by combining VLM pre-training with robotic data.

While RT-2 is an effective adaptation of existing VLMs, it also points to the potential of a general-purpose physical robot that can reason and interpret information to perform a multitude of tasks in the real world. As DeepMind continues its work in robotics, the range of potential applications for this technology is wide, pointing toward an exciting future.

Sources:

DeepMind

TechCrunch

Alan is an ambitious tech entrepreneur with 15 years of experience in software engineering and global product management. His focus has been building SaaS products to help small to medium businesses compete on a global scale. His enthusiasm for artificial intelligence technology is fueled by a desire to make it accessible to companies of all sizes and backgrounds. AI has the power to revolutionize the way businesses operate and Alan is dedicated to helping companies leverage this technology.
