AI understands many things... except for human social interactions

Artificial intelligence continues to advance, yet this technology still struggles to grasp the complexity of human interactions. A recent American study reveals that, while AI excels at recognizing objects or faces in still images, it remains ineffective at describing and interpreting social interactions in a moving scene. The team led by Leyla Isik, professor of cognitive science at Johns Hopkins University, investigated how artificial intelligence models understand social interactions.

To do this, the researchers designed a large-scale experiment involving more than 350 AI models specialized in video, image, or language processing. These models were shown short, three-second video clips depicting various social situations. Human participants, meanwhile, were asked to rate, on a scale of 1 to 5, the intensity of the interactions they observed according to several criteria.



The aim was to compare human and AI interpretations to identify differences in perception and better understand the current limits of algorithms in analyzing our social behaviors. The human participants were remarkably consistent in their assessments, demonstrating a detailed and shared understanding of social interactions. AI, on the other hand, struggled to match these judgments.

Models specializing in video proved particularly poor at accurately describing the scenes they were shown. Even models based on still images, although fed several frames extracted from each video, struggled to determine whether the people in them were communicating with one another. Language models fared a little better, especially when given descriptions written by humans, but still fell far short of the performance of human observers.

For Leyla Isik, the inability of artificial intelligence models to understand human social dynamics is a major obstacle to their integration into real-world environments. “AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street,” the study’s lead author explains in a news release.

“Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing. I think this [study] sheds light on the fact that these systems can’t right now.” According to the researchers, this deficiency could be explained by the way in which AI neural networks are designed.

These are mainly inspired by the regions of the human brain that process static images, whereas dynamic social scenes call on other brain areas. This structural discrepancy could explain what the researchers describe as “a blind spot in AI model development.” Indeed, “real life isn’t static. We need AI to understand the story that is unfolding in a scene,” says study coauthor Kathy Garcia.

Ultimately, this study reveals a profound gap between the way humans and AI models perceive moving social scenes. Despite their computing power and ability to process vast quantities of data, machines are still unable to grasp the subtleties and implicit intentions underlying our social interactions.

Although artificial intelligence has made tremendous advances, it is still a long way from truly understanding what goes on in human interactions.
