
Google I/O 2024: DeepMind Showcases Real-Time AI Interaction Based on Computer Vision with Project Astra

The Google I/O 2024 keynote gave the company a chance to showcase the impressive lineup of artificial intelligence (AI) models and tools it has been working on for some time. Most of the featured tools will reach public preview in the coming months. However, the most interesting technology on display at the event won’t be here for a while. Developed by Google DeepMind, this new AI assistant, called Project Astra, demonstrated real-time, computer vision-based AI interaction.

Project Astra is an AI model that can perform tasks far beyond what existing chatbots can do. Google follows a system in which it uses its biggest and most powerful AI models to train its production-ready models. Highlighting one such model currently in training, Google DeepMind co-founder and CEO Demis Hassabis presented Project Astra. Introducing it, he said: “Today we have some exciting new developments to share about the future of AI assistants that we call Project Astra. We have long wanted to build a universal AI agent that can truly be helpful in everyday life.”

Hassabis also outlined a number of requirements the company has set for such AI agents. They must understand and respond to the complex and dynamic environment of the real world, and they must remember what they see in order to build context and take action. Furthermore, they must be teachable and personable so that they can learn new skills and hold conversations without delay.

Along with that description, DeepMind’s CEO played a demo video showing a user holding a smartphone with the camera app open. The user speaks to the AI, and it instantly responds, answering various sight-based queries. The AI was also able to use visual information for context and answer related questions that required generative capabilities. For example, the user showed the AI some crayons and asked it to describe them with alliteration. Without any pause, the chatbot replied, “Creative crayons color happily. They certainly make colorful creations.”

But that was not all. Later in the video, the user points the camera out a window from which some buildings and roads can be seen. When asked about the neighborhood, the AI immediately gives the correct answer. This demonstrates the model’s computer vision processing capability and the vast visual dataset that would have been required to train it. But perhaps the most interesting demonstration came when the AI was asked about the user’s glasses. The glasses had appeared on screen only briefly and had long since left the frame. Even so, the AI was able to remember their location and guide the user back to them.

Project Astra is not yet available in either public or private preview. Google is still working on the model and needs to settle on use cases for the feature and decide how to make it available to users. The demo might have been the most impressive AI showing of the event, but OpenAI’s Spring Update event a day earlier took some of the shine off it. During that event, OpenAI showcased GPT-4o, which demonstrated similar capabilities along with emotive voices that made the AI sound more human.
