Serious Software
Want to throw off chatbots? Use figurative language
Researchers using a simple script that replaces figurative phrases with their literal meaning improved chatbot performance by up to 15%.
Computer scientists recently examined the performance of dialogue systems, such as personal assistants and chatbots designed to interact with humans. The team found that when these systems are confronted with dialogue that includes idioms or similes, their performance drops by between 10% and 20%.
The research team also developed a partial remedy. They wrote a simple script that identifies figurative phrases and replaces them with their literal meanings. As a result, the performance of dialogue systems improved by up to 15%.
The researchers are presenting their findings at the 2021 Conference on Empirical Methods in Natural Language Processing, which takes place from 7 to 11 November.
Applications for this work include not only personal assistants, but also systems that are designed to summarize information, such as the box summarizing search results at the top of a Google page. Automated systems that need to answer questions, for example when a bill needs to be paid or an appointment to be made, would also benefit from this work.
“We want to enable more natural conversations between people and dialogue systems,” says Harsh Jhamtani, the paper’s first author.
Jhamtani is a Ph.D. student at Carnegie Mellon University and is currently working as a visiting researcher with senior author Taylor Berg-Kirkpatrick, a faculty member in the UC San Diego Department of Computer Science and Engineering.
The study was inspired by Jhamtani’s own struggles with figurative language. He is a native Hindi speaker and also speaks English, India’s other official language. But he had to learn the many U.S. idioms and metaphors his colleagues use.
For example, he panicked when a colleague said he was starving, because in Hindi that might indicate a medical emergency. The colleague then explained that he just meant he was hungry. That left Jhamtani wondering whether artificial dialogue systems would have the same problem he did.
In the study, researchers tested five different systems designed to talk with humans, including GPT-2, which is trained to predict the next word in 40GB of Internet text and was developed by research company OpenAI.
Researchers first ran the dialogue systems on a dataset of 13.1K conversations about everyday topics such as tourism and health. They then extracted the conversations that included figurative language and ran the systems on those alone. On that subset, they observed a drop in performance ranging from 10% to 20%.
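The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-conversation scores, the "figurative" flag and the function name relative_drop are hypothetical, the numbers are made up, and the article does not say whether the reported drop is relative or absolute (the sketch assumes relative).

```python
from statistics import mean

# Toy data: each conversation has a quality score and a flag saying whether
# it contains figurative language. All values are invented for illustration.
conversations = [
    {"score": 0.6, "figurative": False},
    {"score": 0.6, "figurative": False},
    {"score": 0.4, "figurative": True},
    {"score": 0.4, "figurative": True},
]


def relative_drop(items):
    """Return the relative drop in mean score on the figurative subset.

    Computed as (full_mean - subset_mean) / full_mean, which is an
    assumption about how such a drop could be measured.
    """
    full_mean = mean(c["score"] for c in items)
    subset = [c["score"] for c in items if c["figurative"]]
    if not subset:
        raise ValueError("no conversations with figurative language")
    return (full_mean - mean(subset)) / full_mean


drop = relative_drop(conversations)
print(f"Relative drop on figurative subset: {drop:.0%}")
```

With these toy numbers the mean score is 0.5 on the full set and 0.4 on the figurative subset, so the relative drop is about 20%.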
They then wrote a script that lets the systems quickly look up figurative phrases in dictionaries that translate figurative speech into literal speech. This is faster and more efficient than retraining the systems to learn the complete content of these dictionaries. Researchers observed that performance improved by as much as 15%.
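A dictionary lookup of this kind might look roughly like the Python sketch below. It is not the researchers' script: the three dictionary entries, the names LITERAL_MEANINGS and literalize, and the choice of regular-expression matching are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary mapping figurative phrases to literal meanings.
# The dictionaries used in the study are not described in this article.
LITERAL_MEANINGS = {
    "starving": "very hungry",
    "a piece of cake": "very easy",
    "under the weather": "slightly ill",
}


def literalize(utterance, dictionary=LITERAL_MEANINGS):
    """Replace each known figurative phrase with its literal meaning."""
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    # Dictionary keys are lowercase, so normalize the match before lookup.
    return pattern.sub(lambda m: dictionary[m.group(0).lower()], utterance)


print(literalize("The exam was a piece of cake."))
print(literalize("I am starving and a bit under the weather."))
```

The rewritten utterance would then be passed to the dialogue system in place of the original. Note that this sketch substitutes text blindly, so a phrase whose literal meaning is a different part of speech can leave an ungrammatical sentence behind.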
The researchers still had to partially rely on human observers to identify figurative language within the dataset, before the text could be converted. Further study is needed in this area.
It will take several iterations before the algorithms the researchers developed are ready for deployment. For example, they found that in some rare cases, replacing the figurative language with literal language distorted the grammar of a sentence to the point where the dialogue systems could no longer understand it.