Neural machine translation

Neural machine translation (NMT) refers to the translation of a text or document by a computer using artificial neural networks. These networks are trained on vast amounts of data by means of deep learning. NMT systems run on powerful graphics processors and autonomously recognize patterns in the data to generate the best possible translations. The exact process that takes place in the background is often obscure (a black box), even for the developers of the system.

Before we zoom in on neural machine translation, we will present a brief summary of the evolution of MT. Prior to NMT, rule-based machine translation and statistical machine translation were very popular.

Rule-based machine translation (RBMT)

The rule-based approach was the conventional and, until the advent of the statistical approach in 1988, the only method available for MT. First, the languages were linguistically analyzed at the morphological, syntactic, and lexical levels. Next, an abstract representation was created from the source text and subsequently translated into the target language.

This method was not only imprecise, since language is marked by exceptions, regional differences, and mistakes, but also expensive, as substantial manpower was needed to create and maintain the rule sets.

Statistical machine translation (SMT)

The idea of statistical machine translation was brought up by Warren Weaver as early as 1949. However, the computers available at the time were not powerful enough, and the amount of machine-readable text, which is essential for statistical analyses, was very limited.

Almost 40 years passed until IBM scientist Peter Brown introduced the full-fledged statistical approach to machine translation at the Second TMI Conference at Carnegie Mellon University. Statistical machine translation created translations on the basis of probability calculations. The required information was extracted from aligned, parallel corpora, i.e. from texts available in two or more languages.

Statistical machine translation was less expensive than the rule-based approach, and the development of new systems was faster, since grammar rules did not need to be created and programmed manually. Most of the algorithms on which SMT was based were language-independent. Therefore, a system could quickly be trained and expanded with data for new languages.

If you remember the results that early translation systems like Google Translate used to deliver, you know that statistical machine translation was not exactly great. The translations were often faulty in terms of grammar and content and not good enough for regular professional use.

Neural machine translation

In 2014, the first scientific papers on neural machine translation were published, and in 2016, OpenNMT, an open-source NMT engine developed by Harvard in collaboration with SYSTRAN, was presented. From then on, MT started to get noticed. Suddenly, it was no longer treated as a poor relation, but as a serious alternative to pure human translation.

Artificial neural networks

The basis for neural machine translation is an artificial neural network (ANN). This network consists of many neurons, each of which is only capable of simple mathematical calculations. What makes neural networks so powerful and flexible is the linking of many of these individual neurons and the dependencies established between them. In an ANN, a neuron is described by a function, i.e. a mathematical relationship between an input set and an output set in which each element of the input is associated with exactly one element of the output.
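
As a minimal sketch of this idea, a single neuron can be written as a small function that computes a weighted sum of its inputs and passes it through an activation function. The weights, the bias, and the sigmoid activation below are illustrative choices, not a fixed standard:

    import math

    def neuron(inputs, weights, bias):
        """A single artificial neuron: a weighted sum of the inputs,
        passed through a non-linear activation function (here: sigmoid)."""
        weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-weighted_sum))

    # Three inputs, each associated with exactly one weight
    print(neuron([0.5, 0.1, 0.9], [0.4, -0.2, 0.7], bias=0.1))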

A neural network consists of multiple layers of such neurons. Two of these layers, the so-called input layer and the output layer, represent the communication interface between the user and the network. Between these two layers, there can be any number of further layers, the number and complexity of which can be freely determined depending on the task. The interconnection of all neurons of neighboring layers results in an operable artificial neural network.
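
Continuing the sketch above, such neurons can be stacked into fully connected layers and chained into a small feed-forward network. This is only an illustration of the layering principle; real NMT networks are vastly larger and use more elaborate architectures:

    def layer(inputs, weight_matrix, biases):
        """One fully connected layer: every neuron receives all outputs
        of the previous layer ('neuron' is the function sketched above)."""
        return [neuron(inputs, weights, bias)
                for weights, bias in zip(weight_matrix, biases)]

    def network(inputs, layers):
        """Feed the input through each layer in turn; the result of the
        last layer is the output of the network."""
        activations = inputs
        for weight_matrix, biases in layers:
            activations = layer(activations, weight_matrix, biases)
        return activations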

Learning phase of the artificial neural network

At the beginning of the learning phase of the ANN, the weights are first selected randomly. For every input, the network generates an output. For example, the input may be a word or sentence to be translated. Ideally, the output should be a correct translation. The output of the network is compared with a correct human translation. If the result is correct, the weights do not need to be changed. If the translation is wrong, the network autonomously changes the weights between the neurons, so that the result comes closer to what is desired. In other words, the weights are optimized. Since the artificial neural network can only learn by way of such comparisons, a large number of examples is needed. Moreover, the network can only learn to translate in accordance with the input and output pairs it is shown. This means that mistranslations, too, are something the artificial neural network has learned.
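
The following toy example illustrates this optimization idea with numbers standing in for words. It is only a sketch under strong simplifications: real NMT training uses backpropagation over millions of sentence pairs, and all values and function names here are invented for illustration:

    import random

    def predict(inputs, weights):
        """Stand-in for the network: a single weighted sum."""
        return [sum(x * w for x, w in zip(inputs, weights))]

    def error(weights, inputs, reference):
        """Compare the network's output with the reference translation."""
        prediction = predict(inputs, weights)
        return sum((p - r) ** 2 for p, r in zip(prediction, reference))

    def train_step(weights, inputs, reference, learning_rate=0.05, eps=1e-4):
        """Nudge each weight in the direction that reduces the error."""
        updated = list(weights)
        for i in range(len(weights)):
            shifted = list(weights)
            shifted[i] += eps
            gradient = (error(shifted, inputs, reference)
                        - error(weights, inputs, reference)) / eps
            updated[i] -= learning_rate * gradient
        return updated

    # The weights start out random and are optimized example by example
    weights = [random.uniform(-1.0, 1.0) for _ in range(3)]
    inputs, reference = [0.2, 0.7, 0.1], [1.0]
    for _ in range(100):
        weights = train_step(weights, inputs, reference)
    print(error(weights, inputs, reference))  # the error shrinks towards zero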

This figure shows a neural network consisting of an input layer, an output layer, and two additional intermediate layers. Each neuron of a layer is linked to every neuron of the neighboring layers. The links carry values that describe the weights of the connections, e.g. w1, w2, ..., w21.

Word embedding

When using an artificial neural network for language, the first obstacle is to figure out how to convert words into numbers, since neural networks can only compute with numbers. One solution often used to address this issue is so-called "word embedding", in which every word is mapped to a unique vector in a multidimensional vector space. The network optimizes the positions of the words so that semantically similar words also lie close to each other in the vector space. For example, words like tree and leaf should lie closer to each other than tree and bicycle.
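
A minimal sketch of this idea with made-up three-dimensional vectors (real embeddings have hundreds of dimensions and are learned during training rather than hand-picked):

    import math

    # Invented example vectors; real embeddings are learned by the network
    embeddings = {
        "tree":    [0.8, 0.6, 0.1],
        "leaf":    [0.7, 0.5, 0.2],
        "bicycle": [0.1, 0.2, 0.9],
    }

    def cosine_similarity(a, b):
        """How close two word vectors are (1.0 = pointing in the same direction)."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    print(cosine_similarity(embeddings["tree"], embeddings["leaf"]))     # high
    print(cosine_similarity(embeddings["tree"], embeddings["bicycle"]))  # much lower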

Encoder-decoder architecture

If this principle is extended to entire sentences, the input can be stored as a vector sequence. This process is also referred to as "encoding". By means of a decoder, this sequence can be transformed back into language. A separate encoder network is required for each input language, and a separate decoder network for each language to be translated into.
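
A deliberately simplified, word-by-word sketch of this encode/decode idea follows. The tiny vocabularies and two-dimensional vectors are invented for illustration; real encoders and decoders are trained neural networks that take the word order and context of whole sentences into account:

    # Toy source (German) and target (English) vocabularies with shared vector positions
    source_embeddings = {"der": [1.0, 0.0], "baum": [0.0, 1.0]}
    target_embeddings = {"the": [1.0, 0.0], "tree": [0.0, 1.0]}

    def encode(sentence):
        """Encoder: map each source word to its vector."""
        return [source_embeddings[word] for word in sentence.lower().split()]

    def decode(vector_sequence):
        """Decoder: for each vector, pick the closest word of the target vocabulary."""
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        words = [min(target_embeddings,
                     key=lambda w: distance(target_embeddings[w], vector))
                 for vector in vector_sequence]
        return " ".join(words)

    print(decode(encode("Der Baum")))  # -> "the tree"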

A major advantage of the encoder-decoder architecture is that it is not necessary to train a completely new network for each language combination. The encoder translates the input into a vector sequence only the network understands. These vectors can be used by all trained decoders, regardless of the source language. At the same time, the network can only find the most probable translation on the basis of the comparison data it has been trained with. Major translation errors can therefore occur when the network is confronted with input it has not been trained for, e.g. proverbs, metaphors, compounds, and neologisms.

Generic vs. customizable machine translation systems

Now that we know how neural networks function, let us return to the practical use of neural machine translation.

Prior to the deployment of neural machine translation, the organization needs to decide whether to use a generic system or a customizable system. Generic translation systems are those that usually come to our mind first: Google Translate, DeepL, Microsoft Translator, Amazon Translate, etc. These systems are trained with huge amounts of data (aligned, parallel corpora) from various subject areas (domains). The resulting translations are fluent and usually good, but they are of limited use for translating highly specialized (technical) texts.

In contrast, customizable translation systems are trained with customer-specific data. Therefore, they can take the individual corporate language and terminology into account and thus deliver more accurate translations. The resulting raw translations are better and require less post-editing. The amount of training data required for a customized engine depends on the provider. Usually, the customized engine is first trained with generic and domain-specific texts and then "enriched" with internal texts. In any case, proper maintenance of the translation memories and terminology databases is a key precondition for customization.

Leading providers of customizable systems include SYSTRAN, Textshuttle, SmartMATE, KantanMT, and Omniscien. DeepL also belongs to this category to a certain extent, as the online interface of the premium version allows the creation of custom glossaries. If this function is activated, the engine is forced to use the stored terminology. A correctly created glossary definitely contributes to better results.

Unfortunately, the Starter package only allows the creation of one glossary. Depending on the text type, the stored terminology may not necessarily be suitable. To create several glossaries, you need to order the Advanced package. What is more, the Advanced package enables the connection of DeepL to your translation management system via API. However, the glossary function is only available in the browser version.

Further information on generic and customizable translation systems, the selection of a suitable machine translation provider, MT quality, training data for MT systems, costs of MT, and the collaboration with translators is available in our detailed article "Machine translation for companies".

Advantages of generic translation systems

  • Relatively inexpensive
  • Quick implementation
  • Good translations of "normal" texts

Disadvantages of generic translation systems

  • Poor translation quality of specialized texts
  • More post-editing required

Advantages of customized translation systems

  • Good translation of specialized texts
  • Less post-editing required

Disadvantages of customized translation systems

  • More expensive implementation
  • Longer lead time
  • Sufficient data for customization may not be available

Neural machine translation with connection to the translation management system

If you quickly need to machine-translate an e-mail from English to German, you will most likely go to your favorite provider's website, paste in the text, and subsequently copy the translation back to your e-mail program.

The procedure in this case is perfectly acceptable (always keeping in mind the provider's privacy policy and your company's privacy policy). However, for time, security, and quality reasons, this approach is not recommended for professional mass use of MT.

To make full use of the potential of neural machine translation, you should connect the NMT system of your choice to your translation management system via API. In this way, you can benefit from the additional functions of the TMS: the translation memory, the terminology database, and the quality management. These components ensure that your translations are consistent and your corporate terminology is duly used.

Provided that your translation memory is large enough, it would not make sense to machine-translate the entire text, as you already have human translations for parts of it. No matter how good your translation system is, these human translations are usually better than machine translations. In the TMS, you can therefore set a threshold for the use of the NMT system. For example, you can specify that MT is used only for segments for which the translation memory contains no fuzzy match with a match rate of more than 70 percent. You can set this threshold individually.
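
The following sketch illustrates this routing logic. The dictionary-based memory, the placeholder machine_translate function, and the difflib-based match rate are simplified assumptions; real TM systems calculate fuzzy matches with their own algorithms and call the MT engine via API:

    from difflib import SequenceMatcher

    # Simplified stand-in for a translation memory (source -> human translation)
    translation_memory = {
        "Press the start button.": "Drücken Sie die Starttaste.",
    }

    def machine_translate(segment):
        """Placeholder for the call to the connected NMT engine."""
        return f"<machine translation of: {segment}>"

    def best_fuzzy_match(segment):
        """Return the closest TM entry and its match rate in percent."""
        best_source, best_score = None, 0.0
        for source in translation_memory:
            score = SequenceMatcher(None, segment, source).ratio() * 100
            if score > best_score:
                best_source, best_score = source, score
        return best_source, best_score

    def translate_segment(segment, threshold=70):
        """Reuse the human translation if the match rate exceeds the threshold,
        otherwise fall back to machine translation."""
        source, score = best_fuzzy_match(segment)
        if source is not None and score > threshold:
            return translation_memory[source]
        return machine_translate(segment)

    print(translate_segment("Press the start button."))    # taken from the TM
    print(translate_segment("A completely new sentence.")) # machine-translated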

Moreover, all new segments are stored in the translation memory. The next time the same sentences appear, machine translation will not be used. Instead, the system will automatically use the previously post-edited segment. This saves time and money.

It is also beneficial to do the post-editing in the TMS environment. The post-editors need the terminology database and the quality management module in order to work in a precise and efficient manner.

In our article "Post-editing—better quality for machine translation", you can find more information on the subject of post-editing. Additionally, you will learn about the typical errors of machine translation systems and the differences between light and full post-editing.