Logos Multilingual Portal

25 - Translation memories

«And yet, such a withdrawal of the watchers from the gates of the intellect, as Schiller puts it, such a translation into the condition of uncritical self-observation, is by no means difficult»1.

One of the technological developments that has most changed translators' work over the last two decades is the use of translation memories, even though, it must be said from the start, only a portion of translators use them. I'll start with the definition of "translation memory".

Translation memories originated from the recognition that word-by-word machine translation had failed. The basic assumption behind that approach was this: if it were possible to create databases containing perfect matches between words in the protolanguage and words in the metalanguage, that would be very handy, because the only work remaining would be to insert each word-translatant pair into the database once, and it could then be reused indefinitely in future translations.

Since that doesn't work, or at least doesn't work properly, we can aim at a less spectacular but more precise result: storing not single words but word combinations, phrases or short sentences. In this case populating the database is a much harder job. Instead of just inserting the word "home", for example, I must insert all the sentences that might contain the word "home". But the result is certainly more interesting.

The primary problems of automatic translation concern phrase structure (it is hard to give a computer syntagmatic ability, so it combines words in the wrong way) and the choice of the right translatant among n possible translatants (it is also hard to teach a computer paradigmatic ability). Both problems are avoided by giving the computer fragments (chunks) of syntactically coherent discourse that can, at least sometimes, be substituted for chunks of discourse in the prototext.

Building on this theoretical assumption, engineers have created a software program combining two main functions: segmentation and translation memory proper. To define "translation memory" I'll start by defining "segmentation".

Since, as I said, translation memories work with chunks of text, or "segments", someone has to decide how these segments are created, i.e. how a text is to be segmented. Criteria like the number of characters, lines or words would make no sense, naturally: counting characters would cut words in half, counting words would chop phrases at random, and counting lines would be equally arbitrary.

Consequently, to obtain chunks that have a certain autonomy and sense, and that can reasonably recur in other texts in the same or a very similar form, segmentation is usually based on punctuation marks. Choosing shorter or longer segments has advantages and disadvantages. Very short segments increase the chances that the same string will recur in future translations: the more words a string contains, the less often it occurs. Here is an example.

If I search Google for the string:

"Instructions for use" I get 197,000 results.

If I search Google for the string:

"Instructions for use of" I get 20,500 results.

If I add the article:

"Instructions for use of the" I get 7,800 results.

If the string is more specific still:

"Instructions for use of the software" I get only 55 results.

This experiment would push us in the direction of short segments. On the other hand, if this argument is taken to the extreme, segments end up coinciding with single words, and we know that this creates too many problems to be useful. Longer segments, for their part, have the advantage of increasing the odds that the proposed translatant is actually reusable.

Punctuation marks are usually a good indicator of where to break a text into segments; on a text-by-text basis, and according to personal preference, one can decide whether the comma should also count as a segment boundary.
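To make the idea concrete, here is a minimal sketch in Python of segmentation driven by punctuation marks. The function name segment and the break_on_comma switch are my own assumptions for illustration; they are not taken from any actual translation memory program, and real programs must also cope with abbreviations, numbers and the like, which this sketch ignores.

```python
import re

def segment(text, break_on_comma=False):
    """Split a text into segments at punctuation marks.

    The marks . ! ? ; : always close a segment; the comma closes one
    only when break_on_comma is True (a hypothetical user preference).
    """
    marks = ".!?;:" + ("," if break_on_comma else "")
    # A segment is a run of non-punctuation characters followed by the
    # punctuation mark(s) that close it, so each mark stays attached.
    pattern = r"[^{0}]+[{0}]*".format(re.escape(marks))
    pieces = (m.group().strip() for m in re.finditer(pattern, text))
    return [p for p in pieces if p]

text = "Open the box, then read the manual. Keep it dry!"
print(segment(text))
print(segment(text, break_on_comma=True))
```

With the comma excluded, the sample text yields two segments; with the comma included, the first sentence is split in two, giving shorter segments that are more likely to recur.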

The software managing translation memories first segments the prototext. Then the translation memory proper comes into play.

Once the prototext has been segmented, the user is presented with two windows, one above the other. The upper one shows the first segment of the prototext, while the lower one is empty. It is empty because we are assuming that this is the first sentence the translator translates with this translation memory, which therefore contains nothing yet.

The translator writes her translation in the lower window, then tells the program to move on to the next segment. The translator then sees the next segment of the prototext, again with the lower window empty. In the meantime, however, the program has automatically created a file in text format (the kind that ends with the .txt suffix) containing some data (which we'll see in the following units) concerning the translator, the language combination, sometimes the client etc., and has stored in that file the prototext segment and the corresponding metatext segment.

In this way, the next time the translator is presented with the same prototext segment, the program will automatically insert in the lower window (as a suggestion, which the translator can change) the metatext fragment as it was translated, and possibly modified, in a preceding session.
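The store-and-suggest cycle just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class TranslationMemory, its method names and the tab-separated text layout are hypothetical, and they do not reproduce the file format of any real program.

```python
class TranslationMemory:
    """A toy memory holding prototext segment -> metatext segment pairs."""

    def __init__(self):
        self.units = {}

    def store(self, source, target):
        # A later translation of the same segment replaces the earlier one.
        self.units[source] = target

    def suggest(self, source):
        # Exact match only: returns None if the segment was never stored.
        return self.units.get(source)

    def dump(self):
        # Hypothetical plain-text layout: one pair per line, tab-separated.
        return "".join(s + "\t" + t + "\n" for s, t in self.units.items())

    @classmethod
    def parse(cls, text):
        tm = cls()
        for line in text.splitlines():
            source, target = line.split("\t", 1)
            tm.store(source, target)
        return tm

tm = TranslationMemory()
print(tm.suggest("Instructions for use"))   # nothing stored yet
tm.store("Instructions for use", "Istruzioni per l'uso")
print(tm.suggest("Instructions for use"))   # proposed as a suggestion
```

The first lookup finds nothing, which corresponds to the empty lower window; after the translator's version has been stored, the same segment brings it back as a suggestion.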

Since, as I have said, the odds that exactly the same prototext chunk will recur are low, the programs managing memories let the user set the matching percentage above which one chunk is considered comparable to another. In the following units we'll see these principles concretely in action using the example of the Wordfast program.

I'd like to add just a note about the advantages and disadvantages of setting a lower or higher matching percentage. If you set this threshold at 100% (the program proposes pretranslated chunks only if the stored segment matches the segment in question exactly), the advantage is that you can be certain that a proposed text is "correct", and the disadvantage is that the odds that an identical segment will occur are very low.

If you set the percentage at, let's suppose, 50, the memory will give a lot of "suggestions", but in most cases they will be unacceptable.
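The effect of the threshold can be tried out with a small Python sketch. I am assuming here a character-based similarity score taken from the standard difflib module; this is my own stand-in, not the way Wordfast or any other program actually computes its matching percentage, and the function name best_match and the sample memory are hypothetical.

```python
from difflib import SequenceMatcher

def best_match(segment, memory, threshold):
    """Return (score, source, target) for the closest stored segment,
    or None if no stored segment reaches the threshold (0.0 to 1.0)."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two strings share fewer characters in sequence.
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

memory = {
    "Instructions for use of the software": "Istruzioni per l'uso del software",
}
new_segment = "Instructions for use of the hardware"
print(best_match(new_segment, memory, threshold=1.0))   # exact matches only
print(best_match(new_segment, memory, threshold=0.5))   # looser matching
```

With the threshold at 1.0 the slightly different segment gets no suggestion at all; with the threshold at 0.5 the stored translation is proposed, and it is up to the translator to correct it.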

Please note that translation memories solve the ancient problem afflicting translation studies: how to decide whether a translation is "correct".

"Correct translation" is, in my view, a translation made by me, reviewed by me and accepted by my client. Very pragmatic, really! If translation studies cannot offer a better definition of "good", it is because it is a descriptive, not a prescriptive, science. The client tells us something about our quality by offering us another job, or by striking us off his list.


Bibliographical references

FREUD SIGMUND, L'interpretazione dei sogni, in Opere, vol. 3, Torino, Boringhieri, a cura di C. L. Musatti, 1966.

FREUD SIGMUND, The Interpretation Of Dreams, translated by A. A. Brill, London, G. Allen & company, 1913.

CHAMPOLLION YVES, Wordfast, available on the World Wide Web at www.wordfast.org, consulted 23 May 2004.

1 Freud 1900: 101.