They measure AI against humans in lateral thinking and nonsense language

The capabilities of artificial intelligence are being put to the test to find out to what extent they surpass those of humans. New studies show that AI can outperform most people on a lateral thinking task, but has limitations in recognizing verbal nonsense.

A study published in Scientific Reports and coordinated by the University of Turku (Finland) focused on lateral, or divergent, thinking: participants had to come up with alternative uses for everyday objects.

Divergent thinking is a type of thinking process commonly associated with creativity that involves generating many different ideas or solutions for a given task.

The highest-scoring humans outperformed the best responses from AI chatbots based on large language models (LLMs) on the lateral thinking task, but the chatbots were better than most people.

The team compared the responses of 256 human participants with those of three AI chatbots (ChatGPT3, ChatGPT4 and Copy.Ai) in the task of giving alternative uses to a rope, a box, a pencil and a candle.

The authors evaluated the originality of the responses by scoring semantic distance (how far a response was from the object's original use) and creativity.

They used a computational method to quantify semantic distance on a scale of 0 to 2, while human raters subjectively rated creativity from 1 to 5.
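The idea of a computational semantic-distance score can be illustrated with a toy sketch. All names and data below are hypothetical, and bag-of-words vectors stand in for whatever semantic model the study actually used; only the principle is the same: embed the object's original use and a candidate response as vectors, and take one minus the cosine similarity as the distance (cosine distance ranges over [0, 2] for arbitrary vectors, matching the study's 0-to-2 scale).

```python
from collections import Counter
import math

def bow_vector(text):
    # Toy bag-of-words "embedding"; a real study would use a semantic model.
    return Counter(text.lower().split())

def cosine_distance(a, b):
    # 1 - cosine similarity. For non-negative count vectors like these
    # the result lies in [0, 1]; for arbitrary vectors it lies in [0, 2].
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

original_use = bow_vector("tie things together with the rope")
mundane = bow_vector("tie a knot with the rope")
creative = bow_vector("draw a chalk line on a sports field")

# A more original response should sit farther from the object's original use.
assert cosine_distance(original_use, creative) > cosine_distance(original_use, mundane)
```

With word-overlap vectors, the mundane answer shares several words with the original use and scores a small distance, while the unrelated creative answer scores a large one; that ordering is all this sketch is meant to show.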

On average, chatbot-generated responses scored significantly higher than human responses in both semantic distance (0.95 vs. 0.91) and creativity (2.91 vs. 2.47).

The human responses had a much greater range on both measures: the minimum scores were much lower than those of the AI responses, but the maximums were generally higher.

The best human response surpassed the best response of each chatbot in seven of the eight categories, summarizes Scientific Reports.

The authors noted that they only took into account performance on a single task associated with the assessment of creativity, so they propose that other studies analyze how to integrate AI into the creative process to improve human performance.

The second study, coordinated by Columbia University (USA), decided to test whether chatbots confuse meaningless phrases with natural language. The study determined that these AI systems remain vulnerable in that task.

AI chatbots appear to understand and use language the way humans do, but they are built on large language models, a particular type of neural network.

The article published in Nature Machine Intelligence explains how nine language models were tested with hundreds of pairs of sentences.

The people who participated in the study chose which of the two sentences in each pair seemed more natural to them, that is, more likely to be read or heard in everyday life; the researchers then examined whether the chatbots rated the pairs the same way.

The most sophisticated AIs, based on transformer neural networks, “tended to perform better” than simpler recurrent neural network models and statistical models that simply count the frequency of word pairs found on the Internet or in online databases.
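The simplest baseline mentioned, a model that counts word-pair frequencies, can be sketched in a few lines. This is a hypothetical minimal example, not the study's implementation: score each sentence by how often its adjacent word pairs (bigrams) appear in a reference corpus, and call the higher-scoring sentence the more "natural" one.

```python
from collections import Counter

def bigrams(text):
    words = text.lower().split()
    return list(zip(words, words[1:]))

def train_bigram_counts(corpus_sentences):
    # Count adjacent word pairs across the reference corpus.
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(bigrams(sentence))
    return counts

def naturalness_score(sentence, counts):
    # Average bigram frequency: higher means the sentence's word
    # pairs are more common in the reference corpus.
    pairs = bigrams(sentence)
    if not pairs:
        return 0.0
    return sum(counts[p] for p in pairs) / len(pairs)

# Tiny made-up corpus, for illustration only.
corpus = [
    "that is the story",
    "that is the point",
    "the story they told us",
]
counts = train_bigram_counts(corpus)

natural = "that is the story they told us"
scrambled = "story the that told they us is"
assert naturalness_score(natural, counts) > naturalness_score(scrambled, counts)
```

A counting model like this captures only local word-pair statistics, which is exactly why, as the article goes on to note, such baselines are easier to fool than transformer models that weigh the whole sentence.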

However, all the models made mistakes, sometimes choosing phrases that seem absurd to the human ear, the University explained.

“The fact that even the best models we studied can still be fooled by meaningless sentences shows that their computations are missing something about the way humans process language,” highlighted Nikolaus Kriegeskorte, one of the authors of the study.

The dozens of phrase pairs used included: “That is the narrative they have sold us” and “This is the week you’ve been dying.”

People judged the first sentence to be more likely to be encountered in daily life than the second.

However, according to BERT, “one of the best models”, the second sentence is more natural. For its part, GPT-2, perhaps the best-known model, correctly identified the first sentence as more natural, matching human judgments.

All the models showed blind spots, labeling as meaningful some phrases that human participants considered gibberish, noted Christopher Baldassano, another of the signatories.

The expert remarked: “This should make us think about the extent to which we want AI systems to make important decisions, at least for now.”

Source: EFE

Source: Gestion
