
Language barrier big stumbling block to Google's dream

In pursuing its self-prescribed mission to "organise the world's information and make it universally accessible and useful", Google has encountered a stumbling block: language.

It can guide an English-speaker to web pages written in English, but what if the best or most appropriate site is written in Spanish, Arabic or Hindi?

The answer, the company believes, is to build the Internet's Tower of Babel.

This would break down the language barrier by providing a system that can translate between hundreds of different languages and dialects. Googlers such as Ben Gomes, a senior search executive, describe their translation work with missionary zeal, emphasising its possible benefits to humanity: "I grew up in India, where Hindi has tonnes of speakers but almost no content (on the web)," he said.

"If they could get access to the world's content, that would make a huge difference ... in ways that education makes people better."

Nevertheless, sceptics point out that the project also has sound business motivations. In the past decade Google has conquered the English-speaking web, but the rest of the world is more ambivalent.

English is the dominant tongue on the net, but billions of people cannot read it. Data from Nielsen, the market research company, and the International Telecommunications Union show that only a quarter of the Internet audience speaks English, yet only about 35 percent of Google's revenues come from outside the US and Britain.

Like much else at Google, the solution to the problem lies in number-crunching. Google is trying to turn language into a mathematical problem.

Its "machine translation" system analyses millions of different texts on the web, learning the laws and exceptions of a language and applying these rules to its translations.

While Google's search engine is based on an algorithm - a mathematical equation - that can find a website that a user is looking for, its translation engine is trying to create the perfect algorithm for converting languages.

"The way we are doing it is to learn from vast amounts of data," Franz Och, the research scientist who leads Google's translation team, said.

"Our system is learning to mimic what human translators do ... the quality of our translation is getting better."

Mr Och and his team have already built translation for 58 of the world's most popular languages, including Spanish, Japanese and Indonesian.

"The system allows people to convert phrases, passages - even whole websites - into their native tongue. The aim is to translate as many of the world's 6000 languages as possible."

No other organisation could attempt the task. It would need, as one Google executive put it, "infinite computing power ... We have, by anyone's standard, the most computing power in the world".

To crunch billions of search queries daily, Google has built huge "data centres" the size of football pitches filled with supercomputers and servers.

These are now also used to analyse millions of passages of text, as well as for many of the other ambitious projects Google undertakes. Google does not reveal figures, but analysts estimate that its data centres cost tens of millions of dollars to maintain.

There remains one key problem for Google's translation technology.

In the opinion of many, it is not great. Erick Schonfeld, from the TechCrunch blog, has said: "Google does a decent job translating web pages from other languages, but machine-based translation is still not good enough when you need a truly accurate translation."

Mr Och accepts that the system remains in its early stages: "Human language is just so complex. It has so many rules and those rules have exceptions."

But in time, Google is betting that its translation technology will make it a natural port of call for the 3,2 billion people in the world who do not speak English.

As well as perfecting the system, Google is building devices that can make it more useful.
