Analysis of Emotional Speech in Serbian from Surprisal Theory Perspective
The analysis of emotional speech has gained significant attention in the fields of speech recognition and natural language processing. From emotion recognition to emotional text-to-speech synthesis, emotional speech plays a crucial role, particularly in areas such as human-computer interaction and intelligent robotics. However, this area remains underexplored. Recent research trends emphasize the use of multimodal data, such as emotional audio and video recordings. Although effective, these approaches require additional resources, which can be time-consuming and costly, especially for low-resource languages such as Serbian. Moreover, a significant gap exists in understanding the cognitive processes involved in human emotional speech production. To address this, emotional speech was explored from an information-theoretic perspective. Specifically, surprisal values, estimated using five state-of-the-art language models, were analyzed for their correlation with spoken word duration. The results indicated that Pearson's correlation coefficient between these two parameters varied across emotional states, with general multilingual models outperforming Serbian-specific models in surprisal estimation. These results can offer valuable insights into emotional speech production in other South Slavic languages as well, such as Croatian, Bosnian, and Montenegrin.
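As a rough illustration of the quantities involved, the sketch below computes word surprisal as the negative base-2 logarithm of a word's conditional probability and then Pearson's coefficient between surprisal and word duration. The helper names (`surprisal_bits`, `pearson_r`) and all numeric values are hypothetical toy inputs chosen for illustration; they are not data or results from this study, and in practice the probabilities would come from a language model rather than a hand-written list.

```python
import math


def surprisal_bits(prob):
    """Surprisal (in bits) of a word given its conditional probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy inputs (NOT data from the study):
# per-word probabilities as a language model might assign them in context,
# and the corresponding spoken word durations in seconds.
word_probs = [0.5, 0.25, 0.125, 0.0625]
durations_s = [0.20, 0.25, 0.35, 0.40]

surprisals = [surprisal_bits(p) for p in word_probs]
r = pearson_r(surprisals, durations_s)
print(f"surprisals (bits): {surprisals}")
print(f"Pearson's r: {r:.3f}")
```

Under this setup, a separate coefficient could be computed for each emotional state and each language model by grouping the word-level pairs accordingly, which is one plausible way to obtain the kind of comparison described above.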