A Perl script to convert a text file to Multiterm XML

Added: June 12, 2008; last updated: July 17, 2008

On forums and mailing lists I often read about people who have problems importing a glossary into Multiterm. Most of the time all you need is a simple bilingual glossary from which you can insert terms into Word, TagEditor or SDL Edit.

A few years ago I had the same problem, so I wrote a Perl script that converts a tab-delimited text file into an XML file compatible with Multiterm's bilingual glossary template. Unfortunately, that script ran only inside Notetab Pro, which can execute external scripts and capture their output.

I have since learned how to handle encoding conversions in Perl and made a standalone version of the script, which can be downloaded here. There are actually two scripts: one for English-Polish glossaries and one for Polish-English glossaries.
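To give an idea of what such a conversion involves, here is a minimal sketch. It is not the downloadable script. Apart from the `<language>` tag quoted further down this page, the element names (`mtf`, `conceptGrp`, `concept`, `languageGrp`, `termGrp`, `term`) are my assumption about the Multiterm XML layout, the sub names are made up for illustration, and the sample terms are ASCII only so that no encoding handling is needed. It also assumes every input line already contains exactly one tab.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build one language group for a term (element names are assumed).
sub language_group {
    my ($type, $code, $term) = @_;
    return qq{<languageGrp><language type="$type" lang="$code"/>}
         . '<termGrp><term>' . xml_escape($term) . '</term></termGrp></languageGrp>';
}

# Turn one "source<TAB>target" line into a numbered concept entry.
sub line_to_concept {
    my ($line, $number) = @_;
    $line =~ s/\r?\n\z//;                            # strip the line ending
    my ($source, $target) = split /\t/, $line, 2;    # source term, target term
    return "<conceptGrp><concept>$number</concept>"
         . language_group('English (United States)', 'EN-US', $source)
         . language_group('Polish', 'PL', $target)   # placeholder target language
         . '</conceptGrp>';
}

# Example: convert two glossary lines and print the result.
my @lines = ("file\tplik\n", "save & close\tzapisz i zamknij\n");
my $n = 0;
my $xml = '<mtf>' . join('', map { line_to_concept($_, ++$n) } @lines) . '</mtf>';
print "$xml\n";
```

The sketch prints to the screen and leaves out the encoding conversion and the file handling that a real glossary with accented characters would need.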

Before you use the script, you will have to edit it once to adapt it to your language pair. The script may be made smarter in the future, but for now it does its job well, once you edit the language settings.

Open the script in a plain text editor (one that can edit and save UTF-8 encoded files), disable wrapping of long lines, and go to line 31, which looks like this:

<language type="English (United States)" lang="EN-US"/>

Change English (United States) and EN-US to the correct settings for your source language. Go to line 45 of the script and repeat this step for the target language.

You can check the correct language names and codes in step 3 of 5 of Multiterm's Termbase Creation Wizard.

Make sure that you do not delete the quotation marks around the language name and language code, nor the forward slash at the end of the tag.
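If you want to double-check your edit, a small test like the following can tell you whether a line still has both pairs of quotation marks and the closing slash. This is only a sketch: the sub name `language_line_ok` is made up, and the German and Polish values are illustrative, so take the real names and codes from the wizard.

```perl
use strict;
use warnings;

# Return 1 if the line looks like an intact language tag:
# both attribute values in quotation marks and the tag closed with "/>".
sub language_line_ok {
    my ($line) = @_;
    return $line =~ m{^\s*<language\s+type="[^"]+"\s+lang="[^"]+"\s*/>\s*$} ? 1 : 0;
}

for my $line (
    '<language type="English (United States)" lang="EN-US"/>',
    '<language type="German (Germany) lang="DE-DE"/>',    # closing quote deleted
    '<language type="Polish" lang="PL">',                  # forward slash deleted
) {
    printf "%-4s %s\n", (language_line_ok($line) ? 'ok' : 'BAD'), $line;
}
```

Only the first of the three sample lines passes the check; the other two show the mistakes described above.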

Save the script. You can use the Save As command to rename it from MTENUSPL.pl to something that represents your source and target languages. Keep the .pl extension; it stands for “Perl”, not “Polish”. ;-)

To run the script, you need:

To run the script, copy it to the folder that contains your tab-delimited file.

Open the command prompt (Windows key+R, type cmd, press Enter) and change to the directory that contains the script and the tab-delimited file.

Type the following command:

perl MTENUSPL.pl sourcefile.txt

and press Enter. If the source file is “well formed”, the script will process it and create a sourcefile.txt.xml file. This file can be imported into a Multiterm termbase based on the bilingual glossary template.
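If you are not sure whether your file is well formed, you can test it first. The sketch below assumes that “well formed” means one source term and one target term per line, separated by exactly one tab, and that blank lines can be ignored. That definition is my working assumption for the example, and the sub name `check_lines` is made up.

```perl
use strict;
use warnings;

# Return a list of problems found in the glossary lines.
# Assumed rule: exactly one tab per line, with text on both sides.
sub check_lines {
    my (@lines) = @_;
    my @problems;
    my $number = 0;
    for my $line (@lines) {
        $number++;
        (my $text = $line) =~ s/\r?\n\z//;     # work on a copy without the line ending
        next if $text eq '';                   # ignore blank lines
        my @fields = split /\t/, $text, -1;    # -1 keeps trailing empty fields
        if (@fields != 2) {
            push @problems, "line $number: expected 2 columns, found " . scalar(@fields);
        }
        elsif (grep { !/\S/ } @fields) {
            push @problems, "line $number: empty source or target term";
        }
    }
    return @problems;
}

# Example: the second line has no tab, the third has no target term.
my @sample = ("file\tplik\n", "no tab here\n", "save\t\n");
print "$_\n" for check_lines(@sample);
```

To check a real file, read its lines into an array and pass them to the sub; an empty result means no problems were found under the assumed rule.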

You can download the two scripts in a zipped file here. For feedback about the scripts, please use this form or contact me through the cat_conv Yahoo group.

Once you have successfully created your XML glossary file, you can import it into Multiterm. This short tutorial explains how to do it.

Revision history:

December 24, 2008:

July 17, 2008: