Extract the original document from sdlxliff


A Python script to extract the original source file from an SDLXLIFF file

Sometimes clients forget to send you the source document for reference purposes.
The Python script presented below extracts the original source document from the sdlxliff file. It is inspired by the script published by Thomas van Nellen, which was brought to my attention by Hans Lenting.
The improvement consists in extracting the original source file under its actual name, which is also embedded in the sdlxliff file, rather than under a random name, as the original script does. The script also unpacks the source file from the zip file, which it first extracts from the sdlxliff file.
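
For readers who want to see the idea in code, here is a minimal sketch of the two steps described above. It is not the downloadable script: it uses the standard library's xml.etree.ElementTree instead of beautifulsoup, and it assumes that the zip is stored as base64 text in an internal-file element and that the original name is the original attribute of the file element. The function names are made up for illustration.

```python
import base64
import io
import xml.etree.ElementTree as ET
import zipfile
from pathlib import PureWindowsPath


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_original(sdlxliff_bytes):
    """Return (original_name, zip_bytes, file_bytes) for one sdlxliff document.

    Assumptions (not confirmed by the page): the embedded zip is base64 text
    inside <internal-file>, and <file original="..."> holds the original path.
    """
    root = ET.fromstring(sdlxliff_bytes)
    file_elem = next(e for e in root.iter() if local_name(e.tag) == "file")
    internal = next(e for e in root.iter() if local_name(e.tag) == "internal-file")

    # The attribute may hold a full Windows path; keep only the file name.
    original_name = PureWindowsPath(file_elem.get("original")).name

    # Step 1: decode the embedded text back into the zip file.
    zip_bytes = base64.b64decode(internal.text)

    # Step 2: read the first member of the zip, whatever name it carries.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        file_bytes = archive.read(archive.namelist()[0])

    return original_name, zip_bytes, file_bytes
```

The member inside the zip is read by position, not by name, so the name it was stored under does not matter; the output name comes from the sdlxliff file itself.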

Prerequisites:

To run this script you will need Python 3. If it is not installed on your system, install it and add it to the PATH environment variable (the installer should offer an option that does this). For the script to work, you will also need to add the beautifulsoup and lxml modules to Python. To do so, open the command prompt or a terminal window on your system, type the following, and then press Enter:

pip install beautifulsoup4 lxml
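
If you want to confirm that Python can see both modules, a small check like the one below may help. It is only a sketch; missing_modules is a name invented for this example. Note that the beautifulsoup4 package is imported under the name bs4.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if find_spec(name) is None]


# The beautifulsoup4 package is imported under the name "bs4".
missing = missing_modules(["bs4", "lxml"])
print("All set." if not missing else "Still missing: " + ", ".join(missing))
```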

If pip (the installer that adds extra modules to Python) is not installed on your system, you will need to find out how to install it before you can add the required modules. I keep this page short on purpose, and you will not regret learning something new in the process.

Once the required modules are installed, you can run the script. Put the script file in the same folder as the sdlxliff file. Open a terminal window in the folder where both files are, type the following, and then press Enter:

python decodebs.py file.sdlxliff

Use the actual file name instead of 'file.sdlxliff'. You can add more files as parameters, separated by spaces; the script will process all of them. After the script runs successfully, you will find the original source file and a zip file in the same folder. The file is stored internally as a zip in the sdlxliff file; don't ask me why.
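
The handling of several files can be pictured roughly as the loop below. This is a sketch under assumptions, not the script's actual code: process_files is an invented name, extract stands for whatever function decodes one sdlxliff file and returns the original name, the zip bytes and the file bytes, and naming the zip after the sdlxliff file is my own choice for the example.

```python
from pathlib import Path


def process_files(paths, extract):
    """Write the zip and the original file next to each sdlxliff in `paths`.

    `extract` is a hypothetical callable: it takes the sdlxliff bytes and
    returns (original_name, zip_bytes, file_bytes).
    """
    written = []
    for path in map(Path, paths):
        original_name, zip_bytes, file_bytes = extract(path.read_bytes())

        # Both outputs go to the folder that holds the sdlxliff file.
        zip_path = path.with_suffix(".zip")  # assumed naming for the zip
        source_path = path.with_name(original_name)

        zip_path.write_bytes(zip_bytes)
        source_path.write_bytes(file_bytes)
        written.append((zip_path, source_path))
    return written
```

Called with the list of file names from the command line (sys.argv[1:] in Python), a loop like this handles one file or many in the same way.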
Download the Python script here. It is provided in a zip file, because some browsers will treat the script as dangerous content and will try to block the download.
If you're curious how the script works, open it in a text editor, for example Notepad on Windows, and look at the lines that begin with the # character. These are comments about the Python code.