How to align files with LF_Aligner

In this post I will explain how to align two files with the open source software LF_Aligner. At the end of the alignment process, the software produces a TMX file that can be used in any Computer Assisted Translation (CAT) tool.

Downloading and installing the software

First of all, we have to download the most recent version of LF_Aligner. At the time of writing, the most recent version is 4.05. As with all open source software, it is advisable to check the software home page regularly, because updates often contain notable improvements. LF_Aligner is designed for Windows operating systems, but some older versions (3.11 and 3.12) are available for Mac and Linux as well. As these versions are older, they can be less precise.

The file to download is a zip archive containing the whole program. There is no installer. This means that you can use the software immediately (just unzip the file into a folder and double-click the LF_aligner_4.05.exe file), but you won't get any shortcut in the Start menu or on your desktop. If you want the software in the Start menu, unzip the program folder (you can use an open source archiver like 7-Zip), copy it into your default program directory (in Windows, usually C:\Program Files), then create a shortcut and copy it into the Start menu directory (usually something like C:\Users\\AppData\Roaming\Microsoft\Windows\Start Menu\Programs or C:\ProgramData\Microsoft\Windows\Start Menu\Programs).

Preparing the files to align

To proceed with the alignment we need a source language file and a target language file. The texts to align can be, for instance, old translations (made without any CAT tool) or translated texts sent by a customer. The layout of the source and target files must be as similar as possible. If the aligned segments are to be used as reference material, it is a good idea to copy the texts into plain text files (.txt), so as to strip any formatting that could have a negative impact on the alignment process. An important thing to consider is that LF_Aligner can align only one pair of files per process. If we need to align several files, we can choose one of two possible strategies:

  1. Create an alignment process for each pair of files
  2. Merge all the source language texts into one source file and all the target language texts into one target file (without altering the file order) and create a single alignment process.

The best approach depends on the overall complexity of the texts. You can try one strategy and, if it doesn't work, try the other.
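
The second strategy can be scripted. The following is only a minimal sketch, assuming that every text is a UTF-8 .txt file and that the file names sort in the same order in the source and target folders. The function name merge_texts and the folder and file names in the usage note are hypothetical, not part of LF_Aligner.

```python
from pathlib import Path


def merge_texts(folder, output_file):
    """Concatenate every .txt file in `folder` into `output_file`, sorted by name.

    Returns the list of merged file names so the order can be checked
    against the other language.
    """
    output_path = Path(output_file).resolve()
    # Skip the output file itself in case it lives in the same folder.
    files = sorted(
        p for p in Path(folder).glob("*.txt") if p.resolve() != output_path
    )
    with open(output_file, "w", encoding="utf-8") as out:
        for path in files:
            # "utf-8-sig" also accepts files that start with a byte order mark.
            text = path.read_text(encoding="utf-8-sig").strip()
            # A blank line keeps the last paragraph of one file
            # separate from the first paragraph of the next.
            out.write(text + "\n\n")
    return [p.name for p in files]
```

Calling merge_texts("source_it", "merged_it.txt") and then merge_texts("target_en", "merged_en.txt") would produce the two files to align. Comparing the two returned lists is a quick way to confirm that the file order is the same in both languages.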

File Types that can be aligned

LF_Aligner can align the following file types: .txt (with UTF-8 encoding), .rtf, .doc, .docx, .pdf (always be careful with PDFs) and .html. The software can also align two online web pages if you enter their web addresses. This option can be useful if you want to create a translation memory from online material.
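
Since .txt files must be UTF-8 encoded, a file saved in a legacy encoding can be converted before alignment. This is a minimal sketch, assuming you know the original encoding (cp1252 is used here only as an example default). The function name to_utf8 is hypothetical.

```python
def to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-save a text file as UTF-8.

    Returns True if the input was already valid UTF-8, False if it had
    to be decoded with `source_encoding` (an assumption by the caller).
    """
    with open(src_path, "rb") as f:
        raw = f.read()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a byte order mark.
        text = raw.decode("utf-8-sig")
        already_utf8 = True
    except UnicodeDecodeError:
        text = raw.decode(source_encoding)
        already_utf8 = False
    # newline="" writes the line endings exactly as they were read.
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return already_utf8
```

If the guessed encoding is wrong, accented characters will come out garbled, so it is worth opening the converted file and checking a few of them before aligning.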

In this post, we'll align two simple text files (.txt) with UTF-8 encoding.
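
Before launching the alignment, it can help to compare the two text files roughly. During the procedure described below, LF_Aligner reports the number of paragraphs and sentences it finds in each language. The following sketch approximates those counts so that large mismatches show up early. It is not LF_Aligner's own segmenter: it assumes that each non-empty line is a paragraph and that a sentence ends at a full stop, exclamation mark or question mark followed by whitespace. The function name rough_counts is hypothetical.

```python
import re


def rough_counts(text):
    """Return (paragraphs, sentences) using a naive split.

    Each non-empty line counts as one paragraph. Sentences are split
    after ., ! or ? followed by whitespace, so abbreviations such as
    "Mr." are counted as sentence ends (a known limitation).
    """
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    sentences = 0
    for paragraph in paragraphs:
        parts = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        sentences += len(parts)
    return len(paragraphs), sentences
```

Running rough_counts on the content of the source file and of the target file gives two pairs of numbers to compare. If they differ greatly, it is worth opening the files and looking for missing or extra text before starting the alignment.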

Alignment procedure

  1. Launch LF_Aligner
  2. The following window opens, allowing you to select the type of files you want to align.

    LF_Aligner: choose the type of file you want to align

  3. Select the first option and click Next.
  4. The Language selection window opens.

    LF_Aligner: Language selection window

    It's important to select the correct language direction, that is, the one you will use in the translation project where the translation memory created with LF_Aligner will be used. Select the languages using the drop-down arrows, then click Next.
  5. The File selection window opens.

    LF_Aligner: File selection window

    Select one file for each language, then click Next.
  6. The software segments the two files. At the end, the following window opens:

    LF_Aligner: Segmentation results window

    This window reports the results of the segmentation process and provides important information. LF_Aligner tries to segment the text into sentences. This means that, depending on the language, the software splits the text whenever it encounters a full stop (.). If this type of segmentation produces inconsistent results in the two files, you can decide to segment the text into paragraphs instead of sentences. Paragraph segmentation is not very useful when translating very repetitive texts, but it is a fast way to create a reference TM (paragraph-based alignment usually produces fewer alignment errors); such a TM can be used for concordance searches.
    For each language, this window shows the number of paragraphs followed by the number of sentences detected by the software, e.g. English: 33 > 67. These values are very useful. If the number of paragraphs or the number of sentences differs greatly between the two files, the two texts may not be translations of each other, or they may contain important differences. One of the texts could contain some extra sentences (that have not been translated in the other text), or the two texts could be completely unrelated. In this case it is advisable to open the files and see what the problem is. If one of the files contains a paragraph or sentence that is absent from the other file, you can just delete the untranslated text and restart the alignment process.
  7. Now you can select the segmentation option you prefer (the first to segment into sentences or the second to segment into paragraphs) and click Next.
  8. The software aligns the segments produced. At the end of this automatic process the following window appears.

    LF_Aligner: End of automatic alignment

    After this procedure, the software creates a tab-separated .txt file containing the aligned text in a temporary folder. At this point LF_Aligner asks whether you want to review the resulting alignment in its graphical editor, create an Excel file and correct the alignment manually in that file (this operation is not entirely straightforward), or skip the review altogether.
    I suggest using the graphical editor. Select the first option and click Next.
  9. The editor window opens, allowing you to correct any alignment errors. The following window shows a segmentation error.

    LF_Aligner: alignment editor

    In this case, one Italian sentence (source language) has been translated as two separate sentences in the English text (target language). To correct this kind of error, select the wrong segment (in this case the English segment on line 2) and click Merge (F1). The selected segment and the one following it will be merged.
  10. Another kind of error is caused by misinterpreted abbreviations containing a full stop (like e.g., Mr., etc.). LF_Aligner correctly interprets most of these abbreviations in the most widely used languages, but sometimes sentences are segmented in the wrong way, as in the following figure:

    LF_Aligner: alignment error

    In this case, the English segment on line 12 translates the Italian segment on line 13. To move it to line 13, select the English segment on line 12 and click Shift down (F4). Now you must join the Italian segments on lines 11 and 12: select the Italian segment on line 11, then click Merge (F1).
    At the end of this manual process, all the lines must be correct: for each segment in the source language, there must be a corresponding segment in the target language. If there are empty or incomplete lines (as with a text containing one more sentence in the target or source language), these lines will be skipped and won't be imported into the final translation memory.
  11. If the alignment is correct, click File > Save & Exit. The following window opens:

    LF_Aligner: saving file as tmx

    Click on Next.
  12. The TM definition window opens, allowing you to define the translation memory properties:

    LF_Aligner: TM properties window

    The first two fields in this window specify the source and target languages. LF_Aligner doesn't use language variants, but most CAT tools need them to work effectively. So it's advisable to modify the first two fields to include language variants (e.g. EN-US for American English, EN-GB for British English, IT-IT for Italian, etc.). Usually the language code consists of two characters for the language, a hyphen and two characters for the variant. To find out which language and variant codes to enter in these fields, check the CAT tool you use.
    The third field allows you to insert a note. If you use the default (Third column), the software inserts the names of the two aligned files. If you select Custom, you can enter a custom note.
    In the fourth field you can enter an ID for the creator of the aligned segments. The default value is LF_Aligner. As the creator field is usually displayed in CAT tools, I suggest using this field to state that the segment is the result of a semi-automatic alignment process.
    The last field contains a date.
    When you have finished filling these fields, click on Next.
  13. A window opens reporting the result of the TMX creation process. Click OK.
  14. The software closes. All files produced by LF_Aligner are stored in a subfolder you can find inside the directory of the source language file. This subfolder has a name similar to the following:
    In this folder you will find the TMX file resulting from the alignment and other files produced by LF_Aligner; one of these is a tab-separated .txt file that can be used by some translator tools. Now you can import the TMX file into your preferred CAT tool.
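
Before importing, the TMX file can be checked with a short script. TMX is an XML format in which each translation unit (tu element) contains one tuv element per language. The sketch below counts the translation units and the language codes found in the file, which is a quick way to confirm that the codes entered in step 12 are the ones your CAT tool expects. It assumes the language code is stored in the xml:lang attribute of each tuv (with a fallback to the older lang attribute). The function name summarize_tmx is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the predefined xml: prefix as this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def summarize_tmx(path):
    """Return (number of translation units, Counter of language codes)."""
    root = ET.parse(path).getroot()
    units = root.findall("./body/tu")
    languages = Counter()
    for tu in units:
        for tuv in tu.findall("tuv"):
            # Assumption: the code is in xml:lang, or in lang for older TMX files.
            code = tuv.get(XML_LANG) or tuv.get("lang") or "?"
            languages[code] += 1
    return len(units), languages
```

If the alignment is complete, each language code should appear as many times as there are translation units. A lower count for one language points to units with a missing segment.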


dasmi Thursday 12 February 2015 - 06:28 am | | Translator tools
