2024-01-15

Researchers in the United States are working on an artificial intelligence agent that will simplify complex online tasks, writes ARTHUR GOLDSTUCK.

Most of us take Internet use for granted, once we have access. The World Wide Web is so intuitive, it appears, that anyone can simply call it up and begin clicking, clacking and tinkering.

In truth, that applies only to the privileged who are both experienced users and have full use of their physical faculties. For people with disabilities, a simple website can be an obstacle course. The early standards for the Web included the “alt” attribute in the HTML code that produces a Web page, allowing text-to-speech systems to read the description of an image to the visually challenged. But beyond that, little thought has gone into greater accessibility, despite the fact that one of the aims of the World Wide Web Consortium (W3C), which develops standards and guidelines, is a Web based on the principles of accessibility.
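For readers who want to see what that image description looks like in practice, here is a minimal sketch in Python that lists the images on a page with no "alt" text at all. It uses only the standard library's html.parser module. The class name MissingAltFinder and the sample page are made up for illustration, and the check is deliberately simple: it does not judge whether an existing description is any good.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # An empty alt="" is allowed for purely decorative images,
            # so only a completely absent attribute is flagged here.
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src", "(no src)"))


# Made-up sample page: the first image is described, the second is not.
sample_page = '<p><img src="logo.png" alt="Company logo"><img src="chart.png"></p>'

finder = MissingAltFinder()
finder.feed(sample_page)
print(finder.missing)
```

A screen reader has nothing to announce for any image this check flags.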

The good news is that dramatic advances in natural language processing (NLP) and artificial intelligence (AI) have the potential to transform accessibility of the Internet in all its forms, from apps to the Web.

Last month, at the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), a conference for AI and machine learning, researchers from the Ohio State University presented a study on how an AI agent could complete complex tasks on any website, using simple language commands.

According to Yu Su, co-author of the study and an assistant professor of computer science and engineering at Ohio State, in the three decades since the Web was first released into the public domain, it has become an incredibly intricate, dynamic system. While there are billions of websites available to help access information or communicate with others, many tasks on the internet can take more than a dozen steps to complete.

Su said the study, which uses information taken from live sites to create web agents — or online AI helpers — was a step toward making the digital world a less confusing place.

“For some people, especially those with disabilities, it’s not easy for them to browse the internet,” he said. “We rely more and more on the computing world in our daily life and work, but there are increasingly a lot of barriers to that access, which, to some degree, widens the disparity.”

Generative AI, such as ChatGPT, Google Bard, Anthropic Claude and Microsoft Bing AI, all of which use large language models (LLMs), has the potential to close the gap.

By taking advantage of the power of LLMs, said Su, the agent works similarly to how humans behave when browsing the web. The Ohio State team showed that their model was able to understand the layout and functionality of different websites using only its ability to process and predict language.

According to a statement by Ohio State, researchers started the process by creating Mind2Web, the first dataset for generalist web agents.

“Though previous efforts to build web agents focused on toy simulated websites, Mind2Web fully embraces the complex and dynamic nature of real-world websites and emphasises an agent’s ability of generalising to entirely new websites it has never seen before,” it said.

According to Su, much of their success is due to their agent’s ability to handle the internet’s ever-evolving learning curve. The team lifted over 2,000 open-ended tasks from 137 different real-world websites, which they then used to train the agent.

The exercises were fascinating tests of AI agents’ skills:

“Some of the tasks included booking one-way and round-trip international flights, following celebrity accounts on Twitter, browsing comedy films from 1992 to 2017 streaming on Netflix, and even scheduling car knowledge tests at the DMV. Many of the tasks were very complex – for example, booking one of the international flights used in the model would take 14 actions.”

Su said such effortless versatility allows for diverse coverage on a number of websites, and opens up a new landscape for future models to explore and learn in an autonomous fashion.

“It’s only become possible to do something like this because of the recent development of LLMs like ChatGPT,” said Su.

Because one website could contain thousands of raw HTML elements, it would be too costly to feed so much information to a single large language model. To address this gap, the study introduced a framework called MindAct, a two-pronged agent that uses both small and large language models to carry out tasks. The team found that, by using this strategy, MindAct significantly outperformed other common modeling strategies.
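The two-step idea can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: the function names, the Element type and the sample page are all hypothetical. A simple word-overlap score stands in for the small language model, and a placeholder that takes the top candidate stands in for the large one.

```python
from dataclasses import dataclass


@dataclass
class Element:
    tag: str
    text: str


def small_model_score(task: str, element: Element) -> float:
    """Stand-in for the small ranking model: share of task words found in the element."""
    task_words = set(task.lower().split())
    element_words = set(f"{element.tag} {element.text}".lower().split())
    return len(task_words & element_words) / len(task_words) if task_words else 0.0


def shortlist(task: str, elements: list[Element], k: int = 3) -> list[Element]:
    """Step 1: cut a large page down to the k most promising elements."""
    ranked = sorted(elements, key=lambda e: small_model_score(task, e), reverse=True)
    return ranked[:k]


def large_model_choose(task: str, candidates: list[Element]) -> Element:
    """Step 2 placeholder: a real agent would prompt an LLM with the task and the shortlist."""
    return candidates[0]


def next_action(task: str, elements: list[Element], k: int = 3) -> Element:
    return large_model_choose(task, shortlist(task, elements, k))


# Made-up page with four elements; a real page could have thousands.
page = [
    Element("a", "Home"),
    Element("button", "Search flights"),
    Element("input", "Email newsletter"),
    Element("a", "Careers"),
]
task = "search flights to Paris"
print(next_action(task, page))
```

The point of the split is cost: the expensive model sees only a handful of candidates instead of every element on the page.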

The work also highlighted an ethical problem in creating flexible AI, said Su.

“On the one hand, we have great potential to improve our efficiency and to allow us to focus on the most creative part of our work. But on the other hand, there’s tremendous potential for harm.”

For example, he said, autonomous agents able to translate online steps into the real world could influence society by taking potentially dangerous actions, such as misusing financial information or spreading misinformation.

“We should be extremely cautious about these factors and make a concerted effort to try to mitigate them.”

However, the positives seemed to outweigh the negatives for Su: “Throughout my career, my goal has always been trying to bridge the gap between human users and the computing world. That said, the real value of this tool is that it will really save people time and make the impossible possible.”

* Arthur Goldstuck is founder of World Wide Worx and editor-in-chief of Gadget.co.za. He is author of 20 books, including “The Hitchhiker’s Guide to AI”, published by Pan Macmillan. Follow him on Twitter and Instagram on @art2gee.