Google has been lagging in the race for artificial intelligence (AI) leadership, ever since OpenAI launched ChatGPT and Microsoft boarded that train as it accelerated.
However, a series of announcements last week saw Google ramp up both its consumer-facing and enterprise-oriented AI offerings.
The new feature that will capture the most attention is likely to be Bard image generation, built into the Google Bard generative AI platform and therefore just a click away from Google search. This new capability is powered by Google’s updated Imagen 2 model, which is designed to balance quality and speed, delivering high-quality, photorealistic outputs.
“As we kick off 2024, we’re launching our most colorful feature yet: image generation,” Google breathlessly told Bard users in an email on Thursday. “So whether you want to mock up a futuristic car or just need an image of a Yeti wearing sunglasses—if you can think it, Bard can create it.
“Create totally unique images with Bard. Enter a few words to bring your imagination to life, generate more options, and download the ones you like.”
It gave three sample prompts to provide a sense of the range of content that could be imagined:
- Create an image of a majestic oil painting of a dinosaur king and queen wearing red French royal gowns.
- Create an image of an alien octopus floating through a portal reading a newspaper.
- Create a picture of puppies in a basket.
These were the results:
Our own effort resulted in the photorealistic image of a cat above this story.
Google also provided a more detailed guide to what it called “Prompt Engineering 101: Paint a Picture with Words”.
It offered a longer, more nuanced prompt to show how much detail can be specified for an image: “Generate a photorealistic image of an adorable hedgehog, its fur neatly combed, riding a rocket ship, zooming across a vibrant sky full of stars, leaving a trail of shimmering stardust behind.”
“It helps to be more descriptive, but you can start image generation with just a few words,” said Andrew Goodman, product manager on Bard’s image generation. “When a user enters a prompt to generate images, Bard actually expands on the user’s prompt with additional descriptions to create more detailed images. If you want to see more options, you can always edit the prompt yourself or generate more options.”
He also spelled out how Bard uses digitally identifiable watermarks, embedded into the pixels of generated images, to identify images generated by AI.
“Using a technology called SynthID, all unique images generated on Bard will have embedded watermarking to indicate that it was created by AI. The watermark is directly added into the pixels of an AI-generated image, meaning it’s imperceptible to the human eye but it can still be detectable with SynthID. It’s important that we approach the creation of images with AI responsibly.”
The tool incorporates technical “guardrails” that seek to limit violent, offensive or sexually explicit content. It also applies filters designed to avoid the generation of images of named people.
Google also announced on Thursday the global availability of Gemini Pro, an enterprise-level chatbot that gives Bard more advanced understanding, reasoning, summarising and coding abilities. First launched in the USA in December 2023, Gemini Pro in Bard is now available in over 40 languages and more than 230 countries and territories.
A fundamental advantage of the new Bard over ChatGPT is its ability to provide an easy way to double-check the facts it generates. This makes it not only far more reliable than ChatGPT, but also more trustworthy.
South Africans are by nature deeply suspicious, having been conditioned by government misinformation to distrust information from single official sources. Having a generative AI platform they can trust will increase their confidence in using AI, and will accelerate the uptake of such tools in South Africa.
Google announced: “Since we know people want the ability to corroborate Bard’s responses, we’re also expanding our double-check feature, which is already used by millions of people in English, to more than 40 languages. When you click on the ‘G’ icon, Bard will evaluate whether there is content across the web to substantiate its response. If it can be evaluated, you can click the highlighted phrases and learn more about supporting or contradicting information found by Search.”
Generative AI of this kind is also going to have a massive impact on content production, and we are likely to see professionals and hobbyists alike pumping out far more content than ever before, possibly at higher quality than was previously possible with limited resources. Many South Africans, in a market where jobs and means of earning extra income are so limited, will turn to it for both full-time work and side hustles.