Translation as De/Coding

In 1949 Warren Weaver, one of the inventors of the mathematical theory of information and communication, wrote a report and distributed it to two hundred of his colleagues. The title of Weaver's report was "Translation." Its purpose was to explore the idea that one might design a computer program to translate texts from one language to another. Those familiar with Shannon and Weaver's mathematical theory of information and communication will not find the following too surprising, but anyone who has done the work of a translator is likely to find Weaver's understanding of translation fantastical:

When I look at an article in Russian, I say, 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode'.

Weaver wrote this shortly after World War II, when the computer was first applied -- with great success -- to the problem of breaking Germany's military communication codes. In short, it was clear to Weaver that computers were good at the task of decryption, and so if a problem could be reconceptualized to look like a decryption problem, then it was probably something a computer could do. Despite skepticism voiced by scientific luminaries of the day -- notably Jerome Wiesner, later president of MIT, John F. Kennedy's science advisor, and co-founder of the MIT Media Laboratory -- Weaver's "Translation" essay was enormously influential and, arguably, still informs computer scientists' approaches to translation. For example, the statistical approach to decoding that Weaver outlined in his essay constitutes the core of the most successful work in contemporary machine translation.
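To make the decoding metaphor concrete: the statistical formulation that grew out of Weaver's proposal treats a Russian sentence as a "noisy" encoding of an English one, and decoding means choosing the English sentence that a model of English finds plausible and that a channel model can account for as the source of the Russian. The following toy sketch illustrates the shape of that computation; the probability tables are made-up stand-ins, not any real system's models.

    # A toy sketch of the "noisy channel" decoding idea descended from
    # Weaver's proposal: treat the Russian text r as an encoded English
    # sentence e, and choose the e that maximizes P(e) * P(r | e).
    # The numbers below are illustrative stand-ins, not real estimates.

    language_model = {
        # P(e): how plausible each candidate English sentence is on its own.
        "boolean algebra was successfully employed": 0.004,
        "algebra boolean employed was successfully": 0.00001,
    }

    def channel_prob(russian, english):
        # P(r | e): how likely the Russian text is as an "encoding" of e.
        # A real system estimates this from aligned bilingual text; here
        # it is a placeholder returning a constant.
        return 0.01

    def decode(russian):
        # Pick the English candidate with the highest P(e) * P(r | e).
        return max(language_model,
                   key=lambda e: language_model[e] * channel_prob(russian, e))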

After half a century of sustained work on Weaver's translation-as-decoding problem, how much progress has been made? Anyone with a web browser has the means to check. At this URL (http://www.translate.ru/) one can input a text in English and receive a Russian translation (and vice versa). I typed in a sentence from a technical text that was used as a canonical example (e.g., Oettinger, 1959) in the early days of machine translation work:

"In recent times Boolean algebra has been successfully employed in the analysis of relay networks of the series-parallel type."

Taking the Russian translation the website returned, I then translated it back into English and received the following:

"In recent time when Boolean algebra it was successfully used in the analysis of networks of the relay of type parallel by a number(line)."

This result can be compared to the output of computer translation programs of forty years ago. Clearly today's machines are much better, even if they are still not very good (cf. Oettinger, 1959). Even so, one might object that the web-based system used above is not the best automatic translation system with which to demonstrate today's capabilities. But given any contemporary system, it is always easy to find a text on which it performs poorly.
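The round-trip check performed above by hand is easy to express as a small script. In the sketch below, translate() is a hypothetical stand-in for whichever translation service one cares to test (the example above used translate.ru interactively); no actual API is assumed.

    # Round-trip translation as an informal quality check: English ->
    # Russian -> English, then compare the result with the original.
    # translate() is a hypothetical placeholder, not a real service call.

    def translate(text, source, target):
        raise NotImplementedError("stand-in for an actual translation service")

    def round_trip(sentence):
        russian = translate(sentence, source="en", target="ru")
        return translate(russian, source="ru", target="en")

    original = ("In recent times Boolean algebra has been successfully employed "
                "in the analysis of relay networks of the series-parallel type.")
    # print(round_trip(original))  # compare against the original sentence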

Over the years many fixes have been proposed for natural language processing systems, and each "fix" has engendered an entire computer science research direction unto itself. For example, much success has been had simply by sticking with slightly more sophisticated statistical models of language and then relying on ever-increasing computer speeds and memory to increase the number of computations performed for every word processed (e.g., Manning and Schütze, 1999; Jurafsky and Martin, 2000). Others have noted that these systems often fail to take into account enough of the pragmatic context of a text or utterance. This observation has spawned several large artificial intelligence projects that attempt to encode all of pragmatic knowledge (e.g., "what goes up must come down," "anything that falls in the water will get wet," "humans need to breathe in order to live," etc., etc., etc.; e.g., Lenat and Guha, 1990). Other work has explicitly limited the pragmatic context so that the machine translator performs well in a given set of domains (e.g., airline reservations) but can do nothing with text or speech outside those domains (e.g., Seneff, Lau and Polifroni, 1999). Finally, many projects have been launched to design and implement a universal "interlingua" into which all other languages can be mapped (e.g., Gruber, 1993).
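To give a sense of what "slightly more sophisticated statistical models of language" means in practice, the sketch below builds the simplest such model, a bigram model of the kind surveyed in the textbooks cited above: it estimates the probability of a word given the word before it by counting word pairs in a corpus. It is a toy illustration, not a description of any particular system.

    # A minimal bigram language model: estimate P(current word | previous
    # word) from counted word pairs in a training corpus.
    from collections import defaultdict

    def train_bigrams(corpus):
        # Count how often each word follows each other word.
        counts = defaultdict(lambda: defaultdict(int))
        for sentence in corpus:
            words = ["<s>"] + sentence.split() + ["</s>"]
            for prev, curr in zip(words, words[1:]):
                counts[prev][curr] += 1
        return counts

    def bigram_prob(counts, prev, curr):
        # P(curr | prev) as a relative frequency; 0.0 for unseen pairs.
        total = sum(counts[prev].values())
        return counts[prev][curr] / total if total else 0.0

    corpus = ["boolean algebra was employed",
              "boolean algebra was successfully employed"]
    model = train_bigrams(corpus)
    print(bigram_prob(model, "algebra", "was"))  # 1.0 in this tiny corpus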

After half a century of sustained work on Weaver's translation-as-decoding problem, how much progress has been made? When measured against the enormous amount of money that has been spent on computer programs written to "decrypt" novels, newspapers, technical reports and other sorts of texts, has the small amount of progress achieved been worth the budgets -- indeed careers -- expended? Perhaps, fifty years later, it is finally time to admit Weaver's folly: translation is not a task of decryption. In fact, it may be time to critically examine many of the so-called "fixes" of computer science with the same sort of skepticism Ludwig Wittgenstein applied in his examination of the "problems" of philosophy (Wittgenstein, 1951). Many of the "problems" of natural language processing may stem from a badly chosen set of foundational propositions (e.g., translation-as-decryption) and might, therefore, be more properly understood as pseudo-problems or just silly games.
