At the iFLYTEK Shanghai Institute of Artificial Intelligence there is a team of young researchers exploring multilingual intelligent voice technologies. The aim of the project, which iFLYTEK launched in 2019, is to develop machines that can speak multiple foreign languages and dialects.
The team’s goal is to incorporate languages from around the world into its technology, making voice interaction and multilingual communication fit for commercial use. Speech synthesis varies in complexity from language to language, and each language needs individual review and attention from developers to ensure accuracy. Chinese, for example, already has the pinyin pronunciation system ready for use, but many languages lack an equivalent.
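The role pinyin plays can be sketched as a grapheme-to-phoneme lookup. The dictionary below is a toy illustration, not iFLYTEK's system; production pipelines use far larger lexicons and context models to resolve characters with multiple readings.

```python
# Minimal grapheme-to-phoneme sketch: map each Chinese character to a
# pinyin syllable with a tone number. The PINYIN table here is a tiny
# hypothetical sample for illustration only.
PINYIN = {"你": "ni3", "好": "hao3", "世": "shi4", "界": "jie4"}

def to_pinyin(text: str) -> list[str]:
    """Return the pinyin for each character; flag unknown characters."""
    return [PINYIN.get(ch, f"<unk:{ch}>") for ch in text]

print(to_pinyin("你好"))  # ['ni3', 'hao3']
```

For languages without such a ready-made pronunciation system, this mapping itself must first be built, which is part of the per-language effort the team describes.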
“To be honest, I felt immense pressure when I received this task, but I was still confident that I could do the job well,” said the team leader, Gao Li. “In developing language technology, we are trying to create an efficient integration of linguistics and engineering to build a concise set of linguistic symbols to represent as many implicit concepts and grammar systems as possible and to enable the AI models to learn efficiently,” she added.
Gao and her team have a decade’s worth of experience in speech synthesis. “After 10 years of accumulated experience, we have already developed a set of general methodologies in this regard. In the era of end-to-end services, we can quickly build a universal global phone-based text-to-speech conversion system and a multimodal text analysis system. This enables end-to-end models to be quickly applied to various industries,” Gao said.
Since the program began, the team’s speech recognition and synthesis technologies have reached commercial viability. The speech recognition technology covers more than 60 languages, and the speech synthesis technology covers more than 30. The synthesis system has a Mean Opinion Score (MOS) exceeding 4.0, the generally accepted threshold for commercial use. However, the team is still pursuing further technological progress while integrating the technologies into products that facilitate human-computer interaction and multilingual communication.
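MOS itself is a simple statistic: listeners rate synthesized audio samples on a 1-to-5 scale, and the score is the arithmetic mean of those ratings. The sketch below uses hypothetical listener scores; the 4.0 threshold is the figure cited in the article.

```python
# Mean Opinion Score (MOS) sketch: listeners rate each synthesized
# sample from 1 (bad) to 5 (excellent); MOS is the arithmetic mean.
def mean_opinion_score(ratings: list[int]) -> float:
    """Average a list of 1-5 listener ratings into a MOS value."""
    if not ratings or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be non-empty values on the 1-5 scale")
    return sum(ratings) / len(ratings)

# Hypothetical ratings from seven listeners for one voice.
ratings = [4, 5, 4, 4, 5, 3, 4]
print(round(mean_opinion_score(ratings), 2))  # 4.14 — above the 4.0 bar
```

In practice MOS is gathered under controlled listening-test conditions with many listeners and samples, so a single average like this is only the core of the calculation.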
The research team has supported iFLYTEK’s technological breakthroughs and commercialization in international markets. For example, the team worked with Haier to develop a multilingual speech recognition system for the company’s expansion in the Southeast Asian market. It has also conducted multilingual cooperation projects with Chinese carmakers including SAIC, Changan, and Chery, covering dozens of languages, among them English, Japanese, Thai, Spanish, and Italian.
“We must always remain open and introspective and always recruit the best talent to bring more possibilities to the team. Meanwhile, we must always conduct useful research with strong methodology, so that we can meet consumer needs in increasingly innovative ways,” Gao said. She hopes that in the future, her team will build an even better multimodal text analysis system and apply it to more languages, and that the system can serve machine translation, semantic understanding, and other purposes to better meet people’s needs through AI-empowered solutions.