A new project, led by Monash University researchers, will develop an Artificial Intelligence (AI)-assisted application to provide real-time interpretation for diplomatic talks, international business and tourism.
The US$5 million project, funded by the US Department of Defense’s Defense Advanced Research Projects Agency (DARPA), will develop a smartphone-based assistive dialogue system paired with smart glasses, applying machine learning, speech recognition and vision technology to provide cross-cultural communication assistance in real time.
Project researchers from Monash University’s Vision and Language Group (VLG) at the Faculty of Information Technology (IT) said the goal of the program is to develop language processing technology that will recognise and adapt to the emotional, social, and cultural norms that differ across societies, languages, and communities.
“In addition to interpreting the content of the speech, the system will be ‘translating’ body language and facial expressions, providing cultural cues to prevent a breakdown in communications and ensuring smoother cross-cultural dialogue. During this project, we will be focussing mainly on negotiation-based dialogues,” the researchers explained.
During a conversation, the dialogue assistance system may detect an imminent communication breakdown by analysing audiovisual cues in real time. The system can then send ‘notifications’ to the user’s smart glasses, providing appropriate culturally attuned prompts to keep the negotiation on track.
For instance, the system may prompt the user to salvage the negotiation by making the other party feel more comfortable. It may then suggest ways the user can increase that comfort, such as addressing the other person more respectfully in line with their specific cultural norms.
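The flow described above can be sketched in code. This is purely an illustrative mock-up: the cue signals, thresholds, culture codes and prompt texts are all hypothetical assumptions, not the project's actual design.

```python
# Hypothetical sketch of the breakdown-detection -> prompt flow.
# All names, thresholds and prompts are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogueCues:
    """Audiovisual signals extracted from the ongoing conversation."""
    discomfort_score: float   # 0.0 (at ease) .. 1.0 (visibly uncomfortable)
    interruptions: int        # recent overlapping-speech events
    counterpart_culture: str  # e.g. "ja-JP" (assumed locale tag)

# Illustrative culture-specific repair prompts.
REPAIR_PROMPTS = {
    "ja-JP": "Use the honorific '-san' and soften the request.",
    "default": "Slow down and acknowledge the other party's point.",
}

def breakdown_imminent(cues: DialogueCues, threshold: float = 0.7) -> bool:
    """Flag an imminent breakdown using simple cue heuristics."""
    return cues.discomfort_score > threshold or cues.interruptions >= 3

def glasses_notification(cues: DialogueCues) -> Optional[str]:
    """Return a prompt to display on the smart glasses, or None if all is well."""
    if not breakdown_imminent(cues):
        return None
    return REPAIR_PROMPTS.get(cues.counterpart_culture, REPAIR_PROMPTS["default"])
```

In a real system the cue scores would come from the speech, vision and cultural-knowledge models the researchers describe; the lookup table here simply stands in for that far more sophisticated analysis.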
Faculty of IT Deputy Dean (Research) Professor Maria Garcia de la Banda welcomed the support for research that will lead to innovation in the use of AI and data science for dialogue assistance technologies.
“Current AI-enabled systems are not capable of accurately analysing the many nuances of human communication or of providing useful assistance beyond basic machine translation,” Professor Garcia de la Banda said.
“In this project our researchers will combine sophisticated speech technology with advanced multimedia analysis and cultural knowledge to build systems that provide a holistic solution.”
The study will be conducted over the next three years in two phases. The first prototype will be released by March 2023.
This research will be led by the VLG from the Faculty of IT at Monash University, in collaboration with researchers from the David Nazarian College of Business and Economics at California State University, Northridge, and the Department of Biostatistics & Health Informatics at the Institute of Psychiatry, Psychology & Neuroscience, King’s College London.