Google Translate now speaks Sepedi and Tsonga – but new tech means ‘it’s not perfect’

Translation results may not be as good as for European languages, warns Google.

  • Google Translate has added Sepedi and Tsonga to the languages ​​it can handle.
  • The service already includes Afrikaans, Sesotho, isiXhosa and isiZulu.
  • The new set of languages, 24 in all, was added using a new machine learning technique, Google says.
  • This means that the results are not always as good as for the major European languages.
  • For more stories, go to

Google Translate can now translate to and from Sepedi and Tsonga, it announced this week, but not yet as well as it should.

Sepedi and Tsonga are among 24 new languages ​​added in “a technical step”, using zero-shot machine translation. This technique makes it possible to create a translation engine for languages ​​that do not have translation dictionaries or large amounts of translated text available. Instead, he can learn a language using relatively small amounts of monolingual text from what are euphemistically called “underfunded languages”.

This has a price.

“While this technology is impressive, it’s not perfect,” warned Isaac Caswell, senior software engineer at Google Translate. “And we’ll continue to improve these templates to deliver the same experience you’re used to with a Spanish or German translation, for example.”

In a more detailed article on the AI ​​approach to adding the new languages, Caswell and his colleagues were more direct.

“[W]We would like to point out that the quality of translations produced by these models still lags far behind that of the higher resource languages ​​supported by Google Translate. These models are certainly a useful first tool for understanding content in underfunded languages, but they will make mistakes and show their own biases.”

Google Translate already supports Afrikaans, Sesotho, isiXhosa, and isiZulu. It also includes a number of regional and continental languages, including Shona and Swahili. But many others are missing, and building monolingual models currently seems the best bet for their inclusion.

Although the continent is home to some 2,000 languages, it is barely represented in the field of Natural Language Process (NLP) AI, says the Masakhane Project, which is working to change that.

“The tragic past of colonialism has been devastating for African languages ​​in terms of support, preservation and integration. It has resulted in a technological space that does not understand our names, our cultures, our places, our history.”

Receive the best of our site by e-mail every day of the week.

Go to the Business Insider homepage for more stories.

Comments are closed.