Part of the richness of an increasingly pluralistic society is the growing number of languages spoken in ever-smaller geographical areas. In New York City alone, more than 8 languages are spoken by large segments of the population, including Yiddish, Italian, and Chinese. And that is just the city. Across the entire state of New York, the number rises to over 30 different languages. This gives us an idea of just how multilingual our country has become.
It should come as no surprise, therefore, that in an era of smartphones, tablets and apps, several translation applications have come to the fore with a linguistic sophistication that is simply awe-inspiring. One of these pearls of software, certainly the most popular today, is Google Translate. Last May, Google bought Word Lens technology and has now incorporated it into its most recent update. One of the key features of this app is that one can now point the camera at, say, a street sign, and it will automatically translate the text. What could be cooler than this, right?
These apps have been touted as one of the greatest achievements of 21st-century software engineering. And frankly, it’s hard to disagree. Newer translation applications are incredibly sophisticated, the algorithms behind them are superb, and their capacity for learning seems to know no bounds. Furthermore, owing to their astonishingly large network of users, apps like Translate are available to anyone with a modern hand-held device absolutely gratis. So, rather than exalting these already lavishly praised marvels of modernity, I thought it would be less banal and more interesting to focus on their limitations. To simplify the task, we will look specifically at Google Translate.
The easiest task for any developer of translation algorithms is single-word translation. No matter how arcane or inkhorn the term, by matching billions of websites to their respective translations, we can pair each word with its most frequently used translation and come up with a translated term that will be accurate 99% of the time. I was surprised at the result when I entered the following list of neural structures into the source box: corpus striatum, globus pallidus, and nigrostriatal pathway. The translation into Spanish was absolutely precise: no false cognates, goofy transliterations, or missing words. Enter grammar, however, and the little gizmo is dumbfounded. Consider the following very simple examples, each followed by its Spanish translation.
“Hey Martha. Would you be interested in going to the movies with me? I heard Birdman, Selma, and Sniper are pretty good!”
¿Hey Martha. Estaría usted interesado en ir al cine conmigo. Oí Birdman, Selma, y Sniper, son bastante buenos.
Right off the bat, the translation gets the tone wrong. The third person singular is often used in Latin-rooted languages as a sign of politeness towards an older person or someone with authority. If I am trying to ask a girl out, using estaría usted, instead of the more appropriate estarías, sounds stilted and overly formal.
The second problem is that context-blind algorithms cannot carry gender from one clause or sentence to the next. And so we get the masculine interesado even though Martha is clearly a female name. This is the same reason we get buenos at the end of the translation even though the implied plural noun películas is feminine.
Thirdly, the conjunction “that”, as we know, may be elided in casual speech, giving us I heard Birdman was good. In Spanish and many other languages this is not possible, and Google, of course, has no way of knowing whether Birdman is the direct object of the verb or the start of a longer subordinate clause. Instead of the correct Oí que Birdman, we get the broken, ungrammatical Oí Birdman.
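To make the gender problem concrete, here is a toy sketch in Python of the "most often used translation" lookup described earlier. It is my own simplification and certainly not how Google Translate is actually built. The word pairs and their counts are invented purely for illustration, and the names ALIGNED_PAIRS, build_table and translate_word are hypothetical.

```python
from collections import Counter

# Hypothetical English/Spanish word pairs, invented for illustration only.
# A real system would harvest these from enormous amounts of translated text.
ALIGNED_PAIRS = [
    ("interested", "interesado"), ("interested", "interesado"),
    ("interested", "interesada"),
    ("good", "buenos"), ("good", "buenos"), ("good", "buenas"),
    ("teacher", "maestro"), ("teacher", "maestro"), ("teacher", "maestra"),
]


def build_table(pairs):
    """Map each source word to the translation it was paired with most often."""
    counts = {}
    for source, target in pairs:
        counts.setdefault(source, Counter())[target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}


def translate_word(word, table):
    """Look up one word in isolation; unknown words pass through unchanged."""
    return table.get(word.lower(), word)


TABLE = build_table(ALIGNED_PAIRS)

# The lookup never sees "Martha" or "her", so it cannot pick a feminine form.
print(translate_word("interested", TABLE))
print(translate_word("teacher", TABLE))
```

Because the table keeps only the single most common rendering of each word, it hands back the same form no matter who is being addressed. Fixing that would require looking at the neighbouring words, which is exactly what a word-by-word lookup does not do.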
When the teacher comes, tell her what you’ve been feeling.
Cuando llegue el maestro, dile lo que se ha estado sintiendo.
Again, the algorithm gets the gender wrong: the teacher is clearly a woman, but Google can’t seem to connect the dots even after detecting a “her” in the following clause. Then there is the strange insertion of the impersonal pronoun “se” in the translation, likely due to the ambiguity implicit in “been feeling” (the verb “to feel” can be either transitive or intransitive).
And lastly, just for good measure, consider the following sentence:
When all these children don’t exercise, the question becomes, how many children do exercise?
Cuando todos estos niños no hacen ejercicio, la pregunta es, ¿Cuántos hijos ejercicio?
“Do” must be one of the most problematic words for any automated system of translation. It can function as an auxiliary, as in Do you know?; as a main verb, as in You do what I say; as a noun, as in a list of the dos and don’ts of management; and as part of myriad phrasal verbs and idioms (do in, do by, do for, do up, do without, make do). In this case, Google doesn’t realize that “do” is being employed solely for emphasis. Interestingly, it doesn’t even settle for a plausible guess, such as treating “do” as a main verb, since exercise is unambiguously something that people do. It simply omits the verb altogether and spits out an incoherent “¿Cuántos hijos ejercicio?”.
Needless to say, a human interpreter with a broader awareness of the textual and contextual cues would have no problem providing an accurate translation of each example.
At this point it feels almost arrogant and cheap to sneer at Google for using “hijos” instead of “niños”, the correct translation for “children” in this context. English doesn’t have a gender-neutral word for human progeny over the age of 12. All we have is “children” and “kids”, which always draws a smirk from anyone who hears someone say: “I have three 40-year-old kids.”