Announcement
The VoiceMOS Challenge 2023 will be announced soon! The 2023 challenge will focus on real-world out-of-domain MOS prediction tasks. Watch this space!
The VoiceMOS Challenge 2022
The VoiceMOS Challenge 2022 has ended! Material from the challenge will remain available online. Read our summary paper of the challenge here.
Human listening tests are the gold standard for evaluating synthesized speech. Objective measures of speech quality have low correlation with human ratings, and the generalization abilities of current data-driven quality prediction systems suffer significantly from domain mismatch. The VoiceMOS Challenge aims to encourage research in the area of automatic prediction of Mean Opinion Scores (MOS) for synthesized speech. This challenge has two tracks:
- Main track: We recently collected a large-scale dataset of MOS ratings for a wide variety of text-to-speech and voice conversion systems spanning many years; this challenge releases the data to the public for the first time as the main track dataset.
- Out-of-domain track: The data for this track comes from a different listening test than the main track's. The purpose of this track is to study how well proposed MOS prediction models generalize to a different listening test context. A smaller amount of labeled data is made available to participants, along with unlabeled audio samples from the same listening test, to encourage exploration of unsupervised and semi-supervised approaches.
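To make the prediction task concrete, here is a minimal sketch of how a MOS predictor can be scored against ground-truth listener ratings, using the kinds of metrics commonly reported for this task (mean squared error and linear/rank correlation). The toy arrays and variable names below are illustrative assumptions, not the challenge's official scoring script.

```python
# Minimal sketch: scoring a MOS predictor against ground-truth ratings.
# Toy data; not the official challenge evaluation code.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Per-utterance ground-truth MOS (mean of listener ratings) and model predictions.
true_mos = np.array([3.2, 4.1, 2.5, 3.8, 4.5])
pred_mos = np.array([3.0, 4.3, 2.8, 3.5, 4.4])

mse = float(np.mean((true_mos - pred_mos) ** 2))  # mean squared error
lcc, _ = pearsonr(true_mos, pred_mos)             # linear (Pearson) correlation
srcc, _ = spearmanr(true_mos, pred_mos)           # rank (Spearman) correlation

print(f"MSE={mse:.3f}  LCC={lcc:.3f}  SRCC={srcc:.3f}")
```

The same metrics can also be computed at the system level by first averaging ground-truth and predicted scores over each system's utterances, which better reflects how MOS is used to rank systems.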
Participation was open to all. The main track was required for all participants, and the out-of-domain track was optional. The challenge was accepted as a special session at Interspeech 2022, and participants were strongly encouraged to submit papers to the special session, whose focus was on understanding and comparing MOS prediction techniques using a standardized dataset.
Papers
The following papers were presented at the VoiceMOS Challenge special session at Interspeech 2022:
- “The VoiceMOS Challenge 2022.” Wen-Chin Huang (Nagoya University), Erica Cooper (National Institute of Informatics), Yu Tsao (Academia Sinica), Hsin-Min Wang (Academia Sinica), Tomoki Toda (Nagoya University) and Junichi Yamagishi (National Institute of Informatics)
- “The ZevoMOS entry to VoiceMOS Challenge 2022.” Adriana Stan (Communications Department, Technical University of Cluj-Napoca)
- “UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.” Takaaki Saeki (The University of Tokyo), Detai Xin (The University of Tokyo), Wataru Nakata (The University of Tokyo), Tomoki Koriyama (The University of Tokyo), Shinnosuke Takamichi (The University of Tokyo) and Hiroshi Saruwatari (The University of Tokyo)
- “Automatic Mean Opinion Score Estimation with Temporal Modulation Features on Gammatone Filterbank for Speech Assessment.” Huy Nguyen (Japan Advanced Institute of Science and Technology), Kai Li (Japan Advanced Institute of Science and Technology) and Masashi Unoki (Japan Advanced Institute of Science and Technology)
- “Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset.” Michael Chinen (Google), Jan Skoglund (Google), Chandan K. A. Reddy (Google), Alessandro Ragano (University College Dublin) and Andrew Hines (University College Dublin)
- “DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores.” Wei-Cheng Tseng (National Taiwan University), Wei-Tsung Kao (National Taiwan University) and Hung-yi Lee (National Taiwan University)
The following papers from participating teams were also presented at Interspeech 2022:
- “A Transfer and Multi-Task Learning based Approach for MOS Prediction.” Xiaohai Tian, Kaiqi Fu, Shaojun Gao, Yiwei Gu, Kai Wang, Wei Li and Zejun Ma
- “Fusion of Self-supervised Learned Models for MOS Prediction.” Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino and Yi Zhao
Participate
The challenge has ended, but you can still access the CodaLab challenge page here.
You can also find the BVCC dataset that was used in the challenge here.
Schedule
The schedule for the challenge was as follows:
- Release of main track and out-of-domain training data
- Release of evaluation data / start of test phase: February 21, 2022
- Test phase results submission deadline: February 28, 2022
- Results sent to participants: March 7, 2022
- Interspeech paper submission deadline: March 21, 2022
Organizers
- Wen-Chin Huang (Nagoya University, Japan)
- Erica Cooper (National Institute of Informatics, Japan)
- Yu Tsao (Academia Sinica, Taiwan)
- Hsin-Min Wang (Academia Sinica, Taiwan)
- Tomoki Toda (Nagoya University, Japan)
- Junichi Yamagishi (National Institute of Informatics, Japan)