Communicating across borders
All you wanted to know but were afraid to ask about machine translation
OK, bit of a naff heading, but with the latest claims from Google of a breakthrough in neural machine translation, perhaps it’s time to write another GlobalDenmark blog with a little information about the current state of play.
What’s the difference between machine translation and computer-assisted translation?
First it’s important not to confuse machine translation (MT) with computer-assisted translation (CAT).
With MT, the computer does all the work: input a French text, push a button, and out comes an Italian text. The text then usually needs post-editing.
With CAT, the computer searches terminology databases and databases of previously translated sentences, matching the input (source) text against terminology and sentences (or partial sentences) in the output (target) language. The computer then suggests a best match, which the translator can use or discard.
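To make that concrete, here is a minimal sketch in Python of the kind of fuzzy matching a CAT tool does behind the scenes. The sentences, the translation memory and the 70% threshold are all invented for illustration; a real tool's matching is far more sophisticated.

```python
import difflib

# A toy translation memory: previously translated source/target pairs.
# All sentences here are invented examples (English -> Danish).
translation_memory = {
    "Press the green button to start the machine.":
        "Tryk på den grønne knap for at starte maskinen.",
    "Switch off the machine before cleaning.":
        "Sluk for maskinen før rengøring.",
}

def best_match(source_sentence, memory, threshold=0.7):
    """Return the closest stored (source, target) pair and its score,
    or (None, score) if nothing beats the fuzzy-match threshold."""
    best_score, best_pair = 0.0, None
    for stored_source, stored_target in memory.items():
        score = difflib.SequenceMatcher(
            None, source_sentence.lower(), stored_source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (stored_source, stored_target)
    return (best_pair, best_score) if best_score >= threshold else (None, best_score)

# A new sentence that is similar, but not identical, to a stored one:
pair, score = best_match("Press the red button to start the machine.",
                         translation_memory)
```

Here the tool would offer the stored Danish sentence as a partial match, and the translator would fix "grønne" (green) to "røde" (red) rather than translating from scratch.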
All professional translators have been using some form of CAT for many years. As I recall, at GlobalDenmark we purchased our first CAT tool in around 2000. This post is about MT, not CAT.
Are there different types of machine translation?
Traditionally there have been three types of MT: Rule-based MT (RbMT), Statistical MT (SMT) and hybrid systems combining the two. You can almost guess the differences.
With RbMT, masses of linguistic rules and dictionaries about the source and target languages are entered into the computer. The computer then applies these to do the translation. As you can imagine, putting all these rules and other input together is extremely demanding, and as languages develop the rules have to be changed too, so there’s a good deal of maintenance.
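As a toy illustration of the rule-based idea, here is a tiny dictionary plus a single word-order rule (English adjective-noun becomes French noun-adjective). The vocabulary and the rule are invented for this sketch; a real RbMT system encodes thousands of rules and a far richer grammar.

```python
# Toy rule-based MT: a tiny English->French dictionary plus one
# reordering rule. Everything here is a simplified illustration.
DICTIONARY = {"the": "le", "cat": "chat", "black": "noir", "sleeps": "dort"}
ADJECTIVES = {"black"}

def translate(sentence):
    words = sentence.lower().rstrip(".").split()
    # Rule: swap adjective-noun pairs to match French word order.
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(DICTIONARY.get(w, w) for w in words) + "."

print(translate("The black cat sleeps."))  # -> "le chat noir dort."
```

Every new word means a dictionary entry, and every grammatical difference between the languages means another rule, which is exactly why building and maintaining RbMT systems is so demanding.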
With SMT, the computer itself analyses monolingual and bilingual texts to build its own models. The computer learns from the input texts, so the larger the input, the better the model and the better the translation. The computer will also ‘learn’ and develop as the language does.
Of course, the input must be appropriate and of high quality for the computer to learn, and this can be a problem for SMT. However, if the input texts are all very similar, for example instructions and manuals, terms and conditions, etc., the computer can put together a very good model.
Finally, as I said, there are systems that combine the two approaches.
What’s Google doing?
There’s a newer, fourth type of MT, and this is what Google is getting excited about, as I mentioned in the introduction. It’s called neural machine translation (you guessed it – NMT).
To be honest, it’s pretty heavy stuff, and having read a lot of articles and Wikipedia I still don’t feel much the wiser. However, it seems we’re in the world of artificial intelligence, with an artificial neural network in the computer modelled on the biological neural networks in the human brain. One advantage of NMT is that it circumvents the need for the vast memory capacity required by SMT.
… and what’s Google’s breakthrough?
In a paper published in September 2016, Google claims that “in some cases human and Google NMT translations are nearly indistinguishable” and that the “quality of the resulting (neural) translation system gets closer to that of average human translators”.
Of course, others have been quick to question these claims, and have been especially critical of the fact that the translations in the study were based on a sample of “well-crafted simple sentences”.
What do you think?
I think it’s interesting that Google mentions average human translators in the quote above. We should be careful about comparing all ‘human’ translators with all MT. Similarly, you can’t say all translation jobs are the same. Of course, MT without the right rules/statistical data/’neurons’ will never be good, and humans need the right linguistic skills, insights into specialist areas, etc. too. There’s a need for MT and there’s a need for human translation. I believe they’re complementary rather than competitive.
At GlobalDenmark we’re looking at MT in all its forms, and I’m looking forward to getting involved: to benefit ourselves and our customers.