Tools
Terminology extraction suite
A set of programs for automatic terminology extraction using a statistical approach. The programs are released under the GPL license and can be freely downloaded and used.
You can download the Windows distribution from: http://lpg.uoc.edu/TES/TES-09.03-win.zip
The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/TES/TES-09.03.zip
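As a rough illustration of what a statistical approach to terminology extraction can look like, the following Python sketch counts word n-grams and keeps the frequent ones that neither start nor end with a stopword. This is not the suite's actual algorithm, which is not described on this page; the function name `candidate_terms`, its parameters and the tiny stopword list are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real list would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "are",
             "to", "for", "by", "on", "with"}


def candidate_terms(text, max_n=3, min_freq=2, stopwords=STOPWORDS):
    """Return (term, frequency) pairs for n-grams seen at least min_freq times."""
    counts = Counter()
    # Split on punctuation first so that n-grams never span a sentence break.
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        tokens = re.findall(r"\w+(?:-\w+)*", chunk)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Discard n-grams that start or end with a stopword.
                if gram[0] in stopwords or gram[-1] in stopwords:
                    continue
                counts[" ".join(gram)] += 1
    ranked = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Highest frequency first; ties are broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Raw frequency is only the simplest possible ranking; a practical extractor would normally combine it with further statistical filtering and a manual review of the candidates.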
Wikipedia2TBX
A program to create terminological glossaries from Wikipedia.
The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-0.2.tar.gz
You can download the Windows distribution from: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-0.2-win.zip
The user manual is available in the following languages:
- English: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-user_manual-eng.pdf
- Catalan: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-Manual_d_usuari-cat.pdf
- Spanish: http://lpg.uoc.edu/Wikipedia2TBX/Wikipedia2TBX-manual_de_usuario-spa.pdf
TO2TBX
A program to convert TermCat’s Terminologia Oberta terminology files to TBX and tab-delimited text files. The terminology files can be downloaded from: http://www.termcat.cat/productes/toberta.htm
The source distribution suitable for Linux, Mac and Windows (some additional software and packages are required) can be downloaded from: http://lpg.uoc.edu/TO2TBX/TO2TBX-0.2.tar.gz
You can download the Windows distribution from: http://lpg.uoc.edu/TO2TBX/TO2TBX-0.2-win.zip
The user manual is available in the following languages:
- Catalan: http://lpg.uoc.edu/TO2TBX/TO2TBX-Manual_d_usuari-cat.pdf
- Spanish: http://lpg.uoc.edu/TO2TBX/TO2TBX-Manual_de_usuario-spa.pdf
Parallel corpora and translation memories
The DGT Multilingual Translation Memory of the Acquis Communautaire: DGT-TM
The Acquis Communautaire is the entire body of European legislation, comprising all the treaties, regulations and directives adopted by the European Union (EU). Since each new country joining the EU is required to accept the whole Acquis Communautaire, this body of legislation has been translated into 22 official languages. As a result, the Acquis now exists as parallel texts in the following 22 languages: Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish. The Acquis is not translated on a regular basis into Irish, the 23rd official EU language, which is why DGT-TM does not include data in Irish. Read more at: http://langtech.jrc.ec.europa.eu/DGT-TM.html
We have extracted all language pairs involving English and Spanish and published them as TMX and tab-separated text at: http://lpg.uoc.edu/corpus/DGT
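For readers who want to move between the two published formats themselves, the following Python sketch reads a TMX file and writes one tab-separated source/target pair per line. It is a minimal example and not the procedure used to build the files above: the function names are hypothetical, and it assumes each translation unit is a `tu` element containing `tuv` elements with a `seg` child, with the language given in either an `xml:lang` or a `lang` attribute.

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_lang(tuv):
    """Return the lowercase language code of a <tuv>, or '' if it has none."""
    # Assumption: the language is in xml:lang or, failing that, in lang.
    return (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()


def tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) for each <tu> that has both languages."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # Compare on the primary subtag so that "EN-GB" matches "en".
                primary = tuv_lang(tuv).split("-")[0]
                segs[primary] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]
        elem.clear()  # release the unit; translation memories can be large


def tmx_to_tsv(source, out, src_lang="en", tgt_lang="es"):
    """Write the aligned pairs to out as tab-separated text; return the count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for pair in tmx_pairs(source, src_lang, tgt_lang):
        writer.writerow(pair)
        count += 1
    return count
```

`source` can be a file name or a binary file object, and `out` any text file opened for writing. Because the output goes through the `csv` module, a segment that itself contains a tab, a quote or a line break is written as a quoted field rather than corrupting the columns.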