Google’s AI goes multimodal with Gemini 2.0

Alphabet has finally delivered the AI goods with a powerful new suite of tools, writes ARTHUR GOLDSTUCK.

In a Flash, quite literally, Google has leaped to the forefront of the generative AI race.

Gemini 2.0 Flash, announced by parent company Alphabet on Thursday, adds video, images, and audio to the text capabilities of Gemini.

And that was only one of the products launched under the banner of Project Lilo, a showcase of Google’s latest advancements in AI.

It also took a running jump at the hot new category in this field, agentic AI, with Project Mariner, an “agent” that can be instructed to navigate Chrome and, for example, come back with a structured set of information. Project Astra takes this further, with the ability to provide complex instructions or advice.

Alphabet announced a new version of its AI chip, Trillium, which quadruples the speed of its predecessor while being 67% more efficient.

Alphabet and Google CEO Sundar Pichai wrote in a blog post last Wednesday: “Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.”

He said Gemini had been built from the start to be multimodal, meaning it would understand information across text, video, images, audio and code.

“With new advances in multimodality – like native image and audio output – and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant.

“Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users. We’re also launching a new feature called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf. It’s available in Gemini Advanced today.”

He said AI Overviews quickly became one of Google’s most popular Search features ever, now reaching 1-billion people.

“As a next step, we’re bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries and coding.”

Google DeepMind CEO and Nobel Prize winner Demis Hassabis, along with Google DeepMind CTO Koray Kavukcuoglu, added to the blog post that Gemini 2.0 Flash was “our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale”.

They wrote: “In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions.”

While Gemini 2.0 Flash is available only as an experimental model to developers for now, all Gemini users globally can access a “chat-optimised version” of 2.0 Flash experimental by selecting it in the drop-down menu on both the desktop and mobile web versions. It will come to the Gemini mobile app soon.
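For developers, the “native tool use” Google describes means the model can call Google Search or run code on its own while composing a response. The snippet below is a minimal sketch of how such a call might look through the Gemini API’s Python SDK (google-genai); the model name, the GOOGLE_API_KEY environment variable and the prompt are illustrative assumptions rather than Google’s documented example.

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model with
# Google Search grounding via the google-genai Python SDK.
# Assumptions: the "gemini-2.0-flash-exp" model name and a GOOGLE_API_KEY
# environment variable holding a valid Gemini API key.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarise this week's biggest AI announcements in three sentences.",
    config=types.GenerateContentConfig(
        # Let the model ground its answer with Google Search, one of the
        # "native tools" referred to in the announcement.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
```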

At almost the same time as the Google announcement, Apple made ChatGPT available directly on iPhone 15 Pro and newer phones. It also made Visual Intelligence, a kind of augmented reality overlay on scenes viewed through the phone camera, available on the iPhone 16 via iOS 18.2.

Meanwhile, ChatGPT creator OpenAI finally released Sora, its AI video-generation model, saying it was available in most parts of the world, excluding the EU and UK, probably for regulatory reasons. Users need a ChatGPT Plus or Pro subscription to generate their own videos, but the platform has had difficulty recognising existing subscribers.

* Arthur Goldstuck is CEO of World Wide Worx and editor-in-chief of Gadget.co.za. Follow him on Bluesky at @art2gee.bsky.social.
