Resources

Tools

Terminology extraction suite

A set of programs for automatic terminology extraction using a statistical approach. The programs are released under the GPL license and can be freely downloaded and used.

You can download the Windows distribution from: http://lpg.uoc.edu/TES/TES-09.03-win.zip

The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/TES/TES-09.03.zip
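This page does not describe the suite's own statistical method. As a rough illustration of frequency-based term candidate extraction, here is a minimal Python sketch that counts word n-grams and discards any that begin or end with a stop word. The stop-word list and the `candidate_terms` function, with its `max_n` and `min_freq` parameters, are hypothetical and are not part of the Terminology extraction suite.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stop-word list (an assumption for illustration).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def candidate_terms(text, max_n=3, min_freq=2):
    """Rank word n-grams (1..max_n tokens) by raw frequency.

    Hypothetical helper: n-grams that start or end with a stop word are skipped,
    and only candidates seen at least min_freq times are returned.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    ranked = [(gram, c) for gram, c in counts.items() if c >= min_freq]
    # Highest frequency first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


sample = ("Machine translation systems use translation memories. "
          "A translation memory stores segments. "
          "Translation memories help machine translation.")
print(candidate_terms(sample))
# → [('translation', 5), ('machine', 2), ('machine translation', 2), ('memories', 2), ('translation memories', 2)]
```

Raw frequency ranks a common single word such as "translation" above multiword terms, which is one reason statistical extractors usually combine counts with association measures or further filtering rather than using counts alone.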

Wikipedia2TBX

A program to create terminological glossaries from Wikipedia.

The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-0.2.tar.gz

You can download the Windows distribution from: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-0.2-win.zip

The user manual is available in the following languages:

TO2TBX

A program to convert TermCat’s Terminologia Oberta terminology files to TBX and to tab-delimited text files. The terminology files can be downloaded from: http://www.termcat.cat/productes/toberta.htm

The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/TO2TBX/TO2TBX-0.2.tar.gz

You can download the Windows distribution from: http://lpg.uoc.edu/TO2TBX/TO2TBX-0.2-win.zip
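TBX is an XML format for exchanging terminology. To give a rough idea of what a TBX file contains, here is a minimal Python sketch, not TO2TBX's own code, that builds one from a list of concepts. It follows the martif / termEntry / langSet / tig / term layout of the original (2008) TBX format. The input structure (one hypothetical `{language: term}` dictionary per concept), the `entries_to_tbx` name and the generated `id` values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises attributes in this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def entries_to_tbx(entries, source_desc="Converted terminology"):
    """Build a minimal TBX (martif) document from a list of {lang: term} dicts.

    Hypothetical helper: each dict becomes one termEntry (one concept) with a
    langSet per language and a single term in each.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    src = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(src, "p").text = source_desc
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, entry in enumerate(entries, start=1):
        term_entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for lang, term in entry.items():
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


doc = entries_to_tbx([{"ca": "ordinador", "es": "ordenador", "en": "computer"}])
print(doc)
```

Grouping all language variants of a concept under one termEntry is what distinguishes TBX from a flat tab-delimited list, where each line typically holds one term pair.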

The user manual is available in the following languages:

Parallel corpora and translation memories

The DGT Multilingual Translation Memory of the Acquis Communautaire: DGT-TM

The Acquis Communautaire is the entire body of European legislation, comprising all the treaties, regulations and directives adopted by the European Union (EU). Since each new country joining the EU is required to accept the whole Acquis Communautaire, this body of legislation has been translated into 22 official languages. As a result, the Acquis now exists as parallel texts in the following 22 languages: Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish. For the 23rd official EU language, Irish, the Acquis is not translated on a regular basis, which is why DGT-TM does not include data in Irish. Read more at: http://langtech.jrc.ec.europa.eu/DGT-TM.html

We have extracted all language pairs involving English and Spanish and published them as TMX and tab-separated text files at: http://lpg.uoc.edu/corpus/DGT
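For readers who want to derive their own tab-separated pairs from a TMX file, here is a minimal Python sketch using only the standard library. It is not the script used to produce the published files. It assumes the standard TMX 1.4 layout (tu elements containing tuv elements with an xml:lang attribute and a seg child) and reduces language codes to their primary subtag, so that region-qualified codes such as EN-GB match "en". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Inline TMX elements whose content is native formatting code rather than text.
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}


def seg_text(elem):
    """Return the text of a seg, skipping the content of inline code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(seg_text(child))
        parts.append(child.tail or "")  # text following the inline element
    return "".join(parts)


def tuv_lang(tuv):
    # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang" attribute.
    lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
    # Keep only the primary subtag, so "EN-GB" and "en" both become "en".
    return lang.split("-")[0].lower()


def tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source, target) segment pairs from a TMX path or binary file object."""
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # Collapse tabs and newlines so each pair fits on one TSV line.
                segs[tuv_lang(tuv)] = " ".join(seg_text(seg).split())
        # Skip translation units that lack either language.
        if segs.get(src_lang) and segs.get(tgt_lang):
            yield segs[src_lang], segs[tgt_lang]


def tmx_to_tsv(source, out, src_lang, tgt_lang):
    """Write one tab-separated source/target pair per line to a text stream."""
    for src, tgt in tmx_pairs(source, src_lang, tgt_lang):
        out.write(f"{src}\t{tgt}\n")
```

For example, `tmx_to_tsv("input.tmx", sys.stdout, "en", "es")` would print English/Spanish pairs (the file name is a placeholder). Because `ET.parse` loads the whole document into memory, `ET.iterparse` would be a better fit for very large TMX files.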