Introduction
This post explains how to convert a translation memory stored as tab-delimited text into TMX using Tikal, from Okapi Tools (http://okapi.opentag.com/). The Okapi tools handle a wide range of translation and localization tasks, including the one at hand.
Recall that a translation memory in tab-delimited text format holds one segment pair per line, like this:
segment_language_A tab segment_language_B
If language A is English and language B is Spanish, the same pair in TMX becomes a translation unit (a tu element) containing one tuv element per language, each holding its text in a seg element.
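To make the mapping concrete, here is a minimal Python sketch that turns one tab-delimited line into a TMX translation unit. It only illustrates the target structure and is not what Tikal does internally. The names line_to_tu and tu_to_string and the sample sentence pair are made up for this example, and a complete TMX file also needs the surrounding tmx, header and body elements.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree serializes it as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def line_to_tu(line, src_lang="en", trg_lang="es"):
    """Turn one 'source<TAB>target' line into a TMX <tu> element."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        raise ValueError(f"expected 2 tab-separated fields, got {len(fields)}")
    tu = ET.Element("tu")
    for lang, text in zip((src_lang, trg_lang), fields):
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        seg = ET.SubElement(tuv, "seg")
        seg.text = text
    return tu


def tu_to_string(tu):
    """Serialize the element as text (non-ASCII characters are kept as-is)."""
    return ET.tostring(tu, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sentence pair, used only for illustration.
    print(tu_to_string(line_to_tu("Good morning.\tBuenos días.")))
```

The script prints a single tu element with an English tuv followed by a Spanish tuv. Lines that do not contain exactly one tab are rejected, which is a quick way to spot malformed entries before running the real conversion.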

This conversion lets us use translation memories in computer-assisted translation tools.
Conversion with Tikal
Running tikal without any additional parameters prints its usage instructions.

Among the options listed there is -2tmx, which performs the conversion to this format. Its help entry also lists the parameters the conversion accepts.

Besides the source language (-sl) and the target language (-tl), it is essential to give the right value to -fc, the parameter that controls the format of the input file. Its value must be one of the available filter configurations, which can be listed with:
tikal -listconf
This returns the list of configurations:
D:\okapi>tikal -listconf
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.23
-------------------------------------------------------------------------------
List of all filter configurations available:
- okf_txml = Wordfast Pro TXML documents
- okf_txml-fillEmptyTargets = Wordfast Pro TXML documents with empty targets fi
lled on output.
- okf_itshtml5 = Configuration for standard HTML5 documents.
- okf_doxygen = Doxygen-commented Text Documents
- okf_wiki = Text with wiki-style markup
- okf_mosestext = Default Moses Text configuration.
- okf_tradosrtf = Configuration for Trados-tagged RTF files - READING ONLY.
- okf_rainbowkit = Configuration for Rainbow translation kit.
- okf_rainbowkit-package = Configuration for Rainbow translation kit package.
- okf_rainbowkit-noprompt = Configuration for Rainbow translation kit (without
prompt).
- okf_mif = Adobe FrameMaker MIF documents
- okf_archive = Configuration for archive files
- okf_transifex = Transifex project with prompt when starting
- okf_transifex-noPrompt = Transifex project without prompt when starting
- okf_xini = Configuration for XINI documents from ONTRAM
- okf_xini-noOutputSegmentation = Configuration for XINI documents from ONTRAM
(fields in the output are not segmented)
- okf_xliff = Configuration for XML Localisation Interchange File Format (XLIFF
) documents.
- okf_openxml = Microsoft Office documents (DOCX, XLSX, PPTX).
- okf_openoffice = OpenOffice.org ODT, ODS, ODP, ODG, OTT, OTS, OTP, OTG docume
nts
- okf_simplification = Configuration for extracting resources from an XML file.
Resources and then codes are simplified.
- okf_simplification-xmlResources = Configuration for extracting resources from
an XML file. Resources are simplified.
- okf_simplification-xmlCodes = Configuration for extracting resources from an
XML file. Codes are simplified.
- okf_properties = Java properties files (Output used \uHHHH escapes)
- okf_properties-outputNotEscaped = Java properties files (Characters in the ou
tput encoding are not escaped)
- okf_properties-skypeLang = Skype language properties files (including support
for HTML codes)
- okf_properties-html-subfilter = Java Property content processed by an HTML su
bfilter
- okf_dtd = Configuration for XML DTD documents (entities content)
- okf_html = HTML or XHTML documents
- okf_html-wellFormed = XHTML and well-formed HTML documents
- okf_po = Standard bilingual PO files
- okf_po-monolingual = Monolingual PO files (msgid is a real ID, not the source
text).
- okf_regex = Default Regex configuration.
- okf_regex-srt = Configuration for SRT (Sub-Rip Text) sub-titles files.
- okf_regex-textLine = Configuration for text files where each line is a text u
nit
- okf_regex-textBlock = Configuration for text files where text units are separ
ated by 2 or more line-breaks.
- okf_regex-macStrings = Configuration for Macintosh .strings files.
- okf_ts = Configuration for Qt TS files.
- okf_tmx = Configuration for Translation Memory eXchange (TMX) documents.
- okf_xml = Configuration for generic XML documents (default ITS rules).
- okf_xml-resx = Configuration for Microsoft RESX documents (without binary dat
a).
- okf_xml-MozillaRDF = Configuration for Mozilla RDF documents.
- okf_xml-JavaProperties = Configuration for Java Properties files in XML.
- okf_xml-AndroidStrings = Configuration for Android Strings XML documents.
- okf_xml-WixLocalization = Configuration for WiX (Windows Installer XML) Local
ization files.
- okf_idml = Adobe InDesign IDML documents
- okf_json = Configuration for JSON files
- okf_phpcontent = Default PHP Content configuration.
- okf_ttx = Configuration for Trados TTX documents.
- okf_pensieve = Configuration for Pensieve translation memories.
- okf_vignette = Default Vignette Export/Import Content configuration.
- okf_vignette-nocdata = Vignette files without CDATA sections.
- okf_railsyaml = Ruby on Rails YAML files
- okf_xmlstream = Large XML Documents
- okf_xmlstream-dita = DITA XML
- okf_xmlstream-JavaPropertiesHTML = Java Properties XML with Embedded HTML
- okf_versifiedtxt = Versified Text Documents
- okf_table = Table-like files such as tab-delimited, CSV, fixed-width columns,
etc.
- okf_table_csv = Comma-separated values, optional header with field names.
- okf_table_catkeys = Haiku CatKeys resource files
- okf_table_src-tab-trg = 2-column (source + target), tab separated files.
- okf_table_fwc = Fixed-width columns table padded with white-spaces.
- okf_table_tsv = Columns, separated by one or more tabs.
- okf_plaintext = Plain text files.
- okf_plaintext_trim_trail = Text files; trailing spaces and tabs removed from
extracted lines.
- okf_plaintext_trim_all = Text files; leading and trailing spaces and tabs rem
oved from extracted lines.
- okf_plaintext_paragraphs = Text files extracted by paragraphs (separated by 1
or more empty lines).
- okf_plaintext_spliced_backslash = Spliced lines filter with the backslash cha
racter (\) used as the splicer.
- okf_plaintext_spliced_underscore = Spliced lines filter with the underscore c
haracter (_) used as the splicer.
- okf_plaintext_spliced_custom = Spliced lines filter with a user-defined splic
er.
- okf_plaintext_regex_lines = Plain Text Filter using regex-based linebreak sea
rch. Extracts by lines.
- okf_plaintext_regex_paragraphs = Plain Text Filter using regex-based linebrea
k search. Extracts by paragraphs.
- okf_odf = XML OpenDocument files (e.g. use inside OpenOffice.org documents).
The one we need is:
- okf_table_src-tab-trg = 2-column (source + target), tab separated files.
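Because the listing is long, it can help to narrow it down with a small script. The following Python sketch parses text in the format shown above and keeps only the configurations whose identifier starts with a given prefix, such as okf_table. The function names parse_listconf and find_configs are made up for this example, and the parser assumes each entry starts with "- ", contains " = ", and may continue on following lines because of console wrapping.

```python
def parse_listconf(output):
    """Return {config_id: description} from text shaped like `tikal -listconf` output."""
    configs = {}
    current = None
    for raw in output.splitlines():
        line = raw.rstrip()
        if line.startswith("- ") and " = " in line:
            config_id, description = line[2:].split(" = ", 1)
            configs[config_id] = description
            current = config_id
        elif current is not None and line and not line.startswith("-"):
            # Continuation of a description wrapped by the console.
            # A space that fell exactly at the wrap point cannot be recovered.
            configs[current] += line
    return configs


def find_configs(output, prefix):
    """Keep only the configurations whose identifier starts with `prefix`."""
    return {k: v for k, v in parse_listconf(output).items() if k.startswith(prefix)}


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = (
        "- okf_table = Table-like files such as tab-delimited, CSV, fixed-width columns,\n"
        "etc.\n"
        "- okf_table_src-tab-trg = 2-column (source + target), tab separated files.\n"
        "- okf_plaintext = Plain text files.\n"
    )
    for config_id, description in find_configs(sample, "okf_table").items():
        print(config_id, "=", description)
```

In practice the text would come from saving the output of tikal -listconf to a file and reading it back; the sample string here only stands in for that.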
To perform the conversion, assuming the file to convert is called corpus-ONU-eng-spa.txt, we write:
tikal -2tmx corpus-ONU-eng-spa.txt -sl en -tl es -fc okf_table_src-tab-trg
Tikal then performs the conversion and prints:
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.23
-------------------------------------------------------------------------------
Conversion to TMX
Source language: en
Target language: es
Default input encoding: windows-1252
Filter configuration: okf_table_src-tab-trg
Output: corpus-ONU-eng-spa.txt.tmx
Input: /D:/okapi/corpus-ONU-eng-spa.txt
Done in 3.568s
The converted file is named corpus-ONU-eng-spa.txt.tmx.
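When several files need converting, the same command can be scripted. The Python sketch below assembles the argument list used above and derives the output name by appending .tmx to the input name, as the log shows. The function names are made up for this example. It also assumes that a tikal launcher can be found on the PATH and that the output is written next to the input file, neither of which is confirmed here.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_file, src_lang, trg_lang,
                  config_id="okf_table_src-tab-trg", tikal="tikal"):
    """Assemble the same arguments as the command shown in this post."""
    return [tikal, "-2tmx", str(input_file),
            "-sl", src_lang, "-tl", trg_lang, "-fc", config_id]


def expected_output(input_file):
    """The log shows the output named as the input file name plus '.tmx'."""
    path = Path(input_file)
    return path.with_name(path.name + ".tmx")


def convert(input_file, src_lang, trg_lang):
    """Run Tikal and return the path where the TMX file is expected."""
    # Assumption: the launcher is on the PATH; shutil.which also resolves
    # script launchers such as .bat files on Windows.
    launcher = shutil.which("tikal") or "tikal"
    command = build_command(input_file, src_lang, trg_lang, tikal=launcher)
    subprocess.run(command, check=True)
    return expected_output(input_file)


if __name__ == "__main__":
    # Only print the command; call convert() to actually run Tikal.
    print(" ".join(build_command("corpus-ONU-eng-spa.txt", "en", "es")))
```

Looping over a list of files and calling convert for each one is enough for a simple batch job; check=True makes the script stop if Tikal reports an error.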