benjamin.computer

UTOPIA Bio-informatics Toolkit

21-12-2014

UTOPIA Bio-informatics Toolkit

21st of December, 2014

My first ever job was to work on the UTOPIA code-base at the same university I did my undergraduate degree at - The University of Manchester. I worked on converting lovely Linux based C++ into something Windows could run. 10 years later, I'm back on the Project, making it work for the three main operating systems. The code has been sitting idle for quite a while and there is a need, more than ever apparently, for sequence alignment tools and protein folding visualisations. I'm certainly not a geneticist or biologist but the general principles are quite interesting.

UTOPIA is a collection of two programs and set of plugins written in Python. Cinema is a sequence alignment tool and Ambrosia is a 3D visualiation program. The two are designed to work together; one can visualise a particular section of a protein and analyse the 3D structure of certain elements. Both programs can load python plugins to import different sequences and alignments, allowing researchers to work with different on-line databases (of which there are many I've found).

utopia

The code-base is largely C++ with some Python. The Python library and it's C bindings are used together so the entire C++ back-end can be accessed via Python. Using a python allows quick changes for when on-line services such as RCSB and similar. The user interface components are written with Qt as oppose to a native back-end, which has pros and cons. Qt and Python have both moved on since UTOPIA was first made, so it was high-time it was looked after.

In order to tackle this code-base, dividing up into each logical component seemed like the best move. The UTOPIA Library itself, the Python back-end, the two major programs, their Internet components etc. To manage the build system, we decided that CMake was the best option. I'm a huge fan of CMake as it can generate Visual Studio project files as well as Ninja build files and OSX Make files. It can also do quite a good job of packaging things up for the different operating systems.

The CMake files for UTOPIA are quite involved; almost as much as some of the code bases. Packaging for OSX proved somewhat difficult; the use of '''install-name-tool''' proved somewhat tricky. Packaging for OSX can be a little fraught because of all the paths and library particularities but all these can be overcome with CMake and a little bash. Rather than use an installer, the Windows and OSX versions are packaged as stand-alone applications. Windows simply has all the libraries included in the same directory whereas OSX contains all the libs inside its '''.app''' folder.

alpha blending fail

Packaging for Linux proves a little more intensive. We decided to support '''rpm''' and '''deb''' as the main two packages. To make this work I included a lot of Bash scripts that will build the installers for you. In order to test them, I made use of a few virtual machines using the awesome VirtualBox. We decided to support Fedora 20 and Ubuntu 14.04 upwards, and use the 64 bit versions of all the libraries. So if you are on a 32 Bit system, you may have to wait a while :P

Rather than use Python3, we decided to upgrade to Python2.7 for now. Some of the libraries we depend on are still not available for Python3 but eventually, they will probably be upgraded. Packaging python is something I decided against doing; Linux has its package dependencies and OSX has Python installed by default. Only Windows causes problems, not least because of it's lack of a Python Debug library. Creating a debug build under Windows is somewhat fraught, unless you build the Debug library yourself.

sequences

UTOPIA deals with the concept of analyzing DNA sequences for various proteins. Sections of these sequences have different actions and correspond to physical structures that cause the protein to 'fold' in a certain way. Understanding why proteins fold and what similar proteins might do is a massive part of current research. If you know the action of one protein and you see another that looks similar and folds in the same way under a certain circumstance, maybe that circumstance will work for the first protein? Working this out computationally would lead to a massive combinatorial explosion, but creating the right tools for researchers is a valid and useful alternative.

Projects like this tend to be hard-work; you've got quite a lot of code to work through and it's somewhat tricky getting your head around someone else's code. You can begin to see the different styles of the different programmers who worked on the project. It can be quite interesting at times. There is a very nice 'feel good' feeling that you are writing code for a useful purpose, for researchers who are trying to help others. You can download the utopia toolkit from http://utopia.section9.co.uk


benjamin.computer