
A new index for objectively evaluating vocal roughness: the Acoustic Roughness Index (ARI) automatically quantifies the degree of hoarseness
Improved diagnostic accuracy is expected to support remote treatment
- Developed ARI, an acoustic model that automatically detects subharmonics, a characteristic of hoarseness, and rates voice roughness with a score between 0 and 10.
- Until now, the degree of hoarseness has been assessed mainly subjectively by experts, and ratings have differed between evaluators. Because the program automatically determines the type and quantity of subharmonics present, quantitative evaluation with the same accuracy as expert assessment becomes possible.
- With ARI, voice disorders can be evaluated objectively and the effectiveness of treatments can be clarified. Future applications to AI-based voice diagnosis and telemedicine are expected.
Outline
A research group led by Itsuki Kitayama (doctoral student), Associate Professor (Lecturer) Kiyohito Hosokawa, and Professor Hidenori Inohara of the Department of Otorhinolaryngology-Head and Neck Surgery, the University of Osaka Graduate School of Medicine, has developed a new index called the Acoustic Roughness Index (ARI), which can be used to evaluate hoarse voices.
ARI calculates the type and strength of subharmonics in the voice and combines them with conventional acoustic measures to express vocal roughness as a score between 0 and 10. Rough voices have until now been judged by ear, which is subjective and varies between listeners. After repeated testing on voice data from more than 450 people, the research group developed an acoustic model that judges hoarseness with the same accuracy as experts. ARI can be used to diagnose voice disorders, to compare voices before and after treatment, and in research. Because the program has been released online for anyone to use, it is expected to be adopted in clinical practice, research institutions, and other settings.
Fig. 1 Image of the ARI acoustic model
Credit: Hidenori Inohara
Research Background
Hoarseness is caused by irregular vibrations of the vocal cords. Until now, it has been evaluated by human listeners using methods such as the Grade, Rough, Breathy, Asthenic, and Strained (GRBAS) scale, but such ratings are subjective and vary between evaluators. There has been no way to accurately detect and evaluate subharmonics, the acoustic components that cause hoarseness.
Research Contents
In this study, the research group developed a program that automatically determines the type and quantity of subharmonics using the Spectral-Based Fundamental frequency Estimator Emphasized by Domination and Sequence (SFEEDS), a mechanism for accurately detecting the fundamental frequency of voice waveforms. By combining this information with numerical values representing traditional voice characteristics, the researchers created the ARI, which expresses the degree of hoarseness numerically.
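To illustrate what a subharmonic is in numerical terms, the following minimal Python sketch measures the strength of a component at half the fundamental frequency relative to the fundamental itself. It is not SFEEDS and not the ARI program: the function names, the toy signal, and the assumption that the fundamental frequency is already known are all illustrative choices made for this example.

```python
import cmath
import math


def tone_magnitude(samples, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-frequency DFT)."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, x in enumerate(samples)
    )
    return 2.0 * abs(acc) / n


def subharmonic_ratio(samples, sample_rate, f0_hz, divisor=2):
    """Ratio of the component at f0/divisor to the component at f0.

    Assumes f0_hz is already known; estimating it robustly is the hard
    part that a dedicated estimator addresses in practice.
    """
    fundamental = tone_magnitude(samples, sample_rate, f0_hz)
    subharmonic = tone_magnitude(samples, sample_rate, f0_hz / divisor)
    return subharmonic / fundamental


def synth_voice(sample_rate, duration_s, f0_hz, sub_amp):
    """Toy signal (hypothetical): a tone at f0 plus a component at f0/2."""
    n = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * i / sample_rate)
        + sub_amp * math.sin(2 * math.pi * (f0_hz / 2) * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    rough = synth_voice(sample_rate=8000, duration_s=0.1, f0_hz=200.0, sub_amp=0.3)
    clean = synth_voice(sample_rate=8000, duration_s=0.1, f0_hz=200.0, sub_amp=0.0)
    print("with subharmonic:   ", round(subharmonic_ratio(rough, 8000, 200.0), 3))
    print("without subharmonic:", round(subharmonic_ratio(clean, 8000, 200.0), 3))
```

For the toy signal with a subharmonic amplitude of 0.3, `subharmonic_ratio` returns approximately 0.3; for the same signal without a subharmonic, it returns approximately 0. Real voices are far less tidy, which is why detection of the fundamental frequency itself matters so much.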
ARI evaluates voice quality by combining data from a reading of sentences and an utterance of the vowel /a:/. The ARI scores closely matched the judgments of experts: with a high degree of accuracy, scores of 2.09 or higher indicated rough voices and scores below 2.09 indicated smooth voices.
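The cutoff described above amounts to a simple decision rule. The sketch below assumes an ARI score has already been computed by the published program; the function name `classify_ari` and the example scores are hypothetical and serve only to show how the 2.09 threshold would be applied.

```python
ARI_ROUGHNESS_CUTOFF = 2.09  # cutoff reported in the text


def classify_ari(score):
    """Label an ARI score (0 to 10) as 'rough' or 'smooth'."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("ARI scores are expected to lie between 0 and 10")
    return "rough" if score >= ARI_ROUGHNESS_CUTOFF else "smooth"


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    for score in (0.8, 2.08, 2.09, 6.5):
        print(score, classify_ari(score))
```

A score exactly at the cutoff is labeled rough, matching the "2.09 or higher" wording.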
Social Impact of the Research
ARI provides an objective evaluation of a quality that previously had to be judged by the human ear. It can be used to diagnose voice disorders, to compare voices before and after treatment, and in research. The program has also been released online for anyone to use, and it is expected to be adopted in clinical practice, research institutions, and other settings.
In the future, this technology will be applied to languages other than Japanese, emotional voices, and singing, so that it can be used for medical treatment and remote voice checks.
Notes
The article, “A Multivariate Model Incorporating Subharmonic Measurements for Evaluating Vocal Roughness,” was published online in the scientific journal npj Digital Medicine. DOI: https://doi.org/10.1038/s41746-025-01702-2
