

Bioinformatics & Mass Spectrometry


Research Projects

Our projects cover the various stages in a bioinformatics and metabolomics pipeline.

For software development we use a range of tools, such as the statistics environment R and various Bioconductor packages. Where applicable, we create such packages ourselves and make them available to the wider scientific community. Other projects use Java, which also makes it possible to add user-friendly, web-based interfaces. Finally, workflow systems such as Taverna allow heterogeneous modules to be integrated into a comprehensive pipeline.

Compute-intensive calculations run on the IPB cluster, which provides a number of local compute nodes but can also offload tasks to a public cloud where necessary.

 

The first step in a metabolomics data-processing pipeline is signal processing: complex chromatographic data are reduced to peak lists, and the peak lists from different samples are aligned into a single data matrix. We co-maintain the widely used Bioconductor package XCMS, as well as several other packages.
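To make the alignment step concrete, here is a deliberately simplified sketch. It is not how XCMS works internally (XCMS provides its own, more robust peak grouping and retention-time correction methods); it greedily assigns each peak to the first existing feature within assumed m/z and retention-time tolerances (`mz_tol`, `rt_tol`) and writes the intensities into a feature-by-sample matrix. The `Peak` class, the tolerances and the peak values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # mass-to-charge ratio
    rt: float         # retention time (s)
    intensity: float  # peak area or height


def align_peaks(samples, mz_tol=0.01, rt_tol=10.0):
    """Group peaks from several samples into features (hypothetical greedy sketch).

    samples: one peak list per sample.
    Returns (features, matrix): features[i] is the (mz, rt) of the peak that
    founded group i; matrix[i][j] is its intensity in sample j (0.0 if missing).
    """
    groups = []
    for j, peaks in enumerate(samples):
        # Assign the most intense peaks first so they claim groups first.
        for p in sorted(peaks, key=lambda q: q.intensity, reverse=True):
            for g in groups:
                if (j not in g["values"]
                        and abs(g["mz"] - p.mz) <= mz_tol
                        and abs(g["rt"] - p.rt) <= rt_tol):
                    g["values"][j] = p.intensity
                    break
            else:
                # No compatible group: this peak starts a new feature.
                groups.append({"mz": p.mz, "rt": p.rt, "values": {j: p.intensity}})
    groups.sort(key=lambda g: (g["mz"], g["rt"]))
    features = [(g["mz"], g["rt"]) for g in groups]
    matrix = [[g["values"].get(j, 0.0) for j in range(len(samples))] for g in groups]
    return features, matrix


sample_a = [Peak(181.071, 300.0, 1000.0), Peak(203.053, 301.0, 400.0)]
sample_b = [Peak(181.072, 305.0, 900.0)]
features, matrix = align_peaks([sample_a, sample_b])
print(matrix)  # [[1000.0, 900.0], [400.0, 0.0]]
```

A real implementation would also update group centres as peaks are added and correct for retention-time drift between samples before grouping.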

 

These peak lists can be annotated with our CAMERA package. CAMERA uses several hints (intensity correlation within and across samples, reliable mass differences, etc.) to group mass signals that originate from the same metabolite. Combinations of known mass rules then annotate the ion species, such as adducts, fragments and ion clusters. This step is a prerequisite for obtaining the actual mass or elemental composition of the neutral molecule.
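As an illustration of a single mass rule, the hypothetical sketch below pairs co-eluting peaks whose m/z values are consistent with the [M+H]+ and [M+Na]+ ions of the same neutral molecule in positive ion mode. CAMERA itself combines many more rules and also uses intensity correlation; the function name, tolerances and example peaks here are assumptions for illustration only.

```python
PROTON = 1.007276       # mass of H+ (u)
SODIUM_ION = 22.989218  # mass of Na+ (u)


def find_h_na_pairs(peaks, rt_tol=5.0, mass_tol=0.005):
    """Find co-eluting [M+H]+ / [M+Na]+ pairs (hypothetical sketch).

    peaks: list of (mz, rt). Returns (i, j, neutral_mass) tuples where peak i
    is read as [M+H]+ and peak j as [M+Na]+ of the same neutral molecule M.
    """
    pairs = []
    for i, (mz_h, rt_h) in enumerate(peaks):
        for j, (mz_na, rt_na) in enumerate(peaks):
            if i == j or abs(rt_h - rt_na) > rt_tol:
                continue  # same peak, or not co-eluting
            m_from_h = mz_h - PROTON         # neutral mass if peak i is [M+H]+
            m_from_na = mz_na - SODIUM_ION   # neutral mass if peak j is [M+Na]+
            if abs(m_from_h - m_from_na) <= mass_tol:
                pairs.append((i, j, round((m_from_h + m_from_na) / 2, 4)))
    return pairs


peaks = [(181.0707, 300.0), (203.0526, 301.0), (150.0, 300.0)]
print(find_h_na_pairs(peaks))  # [(0, 1, 180.0634)]
```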

The next step in identifying the metabolites is determining their elemental composition. For this purpose, we created the Rdisop package in collaboration with the group of Prof. Böcker (Friedrich Schiller University, Jena).
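The basic idea of elemental composition determination can be shown with a naive enumeration. The hypothetical function below lists CHNO formulas whose monoisotopic mass lies within a ppm tolerance of a given neutral mass and discards formulas with negative or non-integer ring/double-bond equivalents. This brute-force sketch is not the algorithm used in Rdisop, which also evaluates isotope patterns; the element set, count limits and tolerance are assumptions.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def hill_formula(c, h, n, o):
    """Format element counts as a Hill-style formula string, e.g. C6H12O6."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(mass, ppm=5.0, max_c=30, max_n=5, max_o=15):
    """Naively enumerate CHNO formulas within `ppm` of a neutral monoisotopic mass."""
    tol = mass * ppm * 1e-6
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MONO["H"])  # best-fitting hydrogen count
                if h < 0:
                    continue
                m = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                if abs(m - mass) > tol:
                    continue
                # Twice the ring/double-bond equivalents: must be even and >= 0
                # for a neutral, even-electron molecule.
                dbe2 = 2 * c + 2 + n - h
                if dbe2 < 0 or dbe2 % 2:
                    continue
                hits.append((hill_formula(c, h, n, o), m))
    # Best mass agreement first.
    return sorted(hits, key=lambda hit: abs(hit[1] - mass))


glucose = 6 * MONO["C"] + 12 * MONO["H"] + 6 * MONO["O"]  # about 180.0634
print(candidate_formulas(glucose)[0][0])  # C6H12O6
```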

These Bioconductor packages can be combined for both general and specific analysis tasks, and they integrate well with the other metabolomics-related tools and algorithms in Bioconductor, such as the interfaces to KEGG and PubChem.

 

 

The statistical analysis of metabolomics experiments typically reveals a number of "interesting" metabolites. Any further biological interpretation requires that their structures be identified.
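A common first pass for such an analysis, assuming two groups of replicate measurements, is a per-feature Welch t statistic combined with a log2 fold change. The sketch below shows only one possible choice and not necessarily the procedure used here; it omits p-values and multiple-testing correction, and the feature names and intensities are invented.

```python
import math
from statistics import mean, variance


def rank_features(control, treatment):
    """Rank features by |Welch t| between two groups (hypothetical sketch).

    control, treatment: dict feature -> list of intensities (>= 2 replicates).
    Returns (feature, log2_fold_change, t) tuples, most "interesting" first.
    """
    results = []
    for feature, a in control.items():
        b = treatment[feature]
        diff = mean(b) - mean(a)
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se > 0:
            t = diff / se
        else:
            t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
        log2_fc = math.log2(mean(b) / mean(a))  # assumes positive intensities
        results.append((feature, log2_fc, t))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


control = {"f1": [100, 110, 90], "f2": [50, 52, 48]}
treatment = {"f1": [400, 420, 380], "f2": [51, 49, 50]}
ranked = rank_features(control, treatment)
print(ranked[0][0])  # f1
```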

Mass spectrometry is a key technology for the identification of small molecules. Today, the identification of metabolites from mass spectra relies on the comparison with authentic compounds or reference spectra.
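Comparison with reference spectra is often implemented as a similarity score between peak lists. The sketch below uses a plain cosine similarity with greedy peak matching within an assumed m/z tolerance; it is an illustration, not the scoring function of any particular database, and the spectra and compound names are hypothetical.

```python
import math


def cosine_score(query, reference, mz_tol=0.01):
    """Cosine similarity of two centroided spectra given as (mz, intensity) lists.

    Each query peak is matched to the closest still-unused reference peak
    within mz_tol; unmatched peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best, best_diff = None, mz_tol
        for k, (mz_r, _) in enumerate(reference):
            diff = abs(mz_q - mz_r)
            if k not in used and diff <= best_diff:
                best, best_diff = k, diff
        if best is not None:
            used.add(best)
            dot += int_q * reference[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search_library(query, library, top=3):
    """Return the `top` (name, score) hits from a {name: spectrum} library."""
    scored = [(name, cosine_score(query, spectrum)) for name, spectrum in library.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)[:top]


library = {
    "compound_A": [(85.03, 30.0), (127.04, 100.0), (145.05, 60.0)],
    "compound_B": [(91.05, 100.0), (119.05, 40.0)],
}
query = [(85.031, 25.0), (127.039, 100.0), (145.052, 55.0)]
print(search_library(query, library)[0][0])  # compound_A
```

Many search engines additionally weight intensities and m/z values before computing the score; the unweighted form is used here only for clarity.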

IPB Halle is a member of the consortium behind MassBank, the first open database of reference spectra. We host the European MassBank node at http://msbi.ipb-halle.de/MassBank/ and develop an ecosystem of tools and workflows around this reference library.
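Tools built around the reference library first need to read its records. The hypothetical parser below assumes a simplified view of the MassBank record text layout: `TAG: value` lines, a `PK$PEAK:` header line followed by indented `m/z intensity relative-intensity` lines, and `//` terminating the record. It ignores all other structure, and the example record is invented.

```python
def parse_massbank_record(text):
    """Return (fields, peaks) from one MassBank-style record (simplified sketch).

    fields maps each tag to the list of its values (tags may repeat);
    peaks is a list of (mz, intensity, relative_intensity) tuples.
    """
    fields = {}
    peaks = []
    in_peaks = False
    for line in text.splitlines():
        if line.strip() == "//":
            break  # end of record
        if in_peaks and line.startswith("  "):
            mz, intensity, rel = line.split()
            peaks.append((float(mz), float(intensity), int(rel)))
            continue
        in_peaks = False
        if ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "PK$PEAK":
                in_peaks = True  # value is only the column header
            else:
                fields.setdefault(tag, []).append(value)
    return fields, peaks


record = """ACCESSION: XX000001
CH$NAME: Hypothetical example compound
PK$NUM_PEAK: 2
PK$PEAK: m/z int. rel.int.
  85.0284 1200 120
  127.039 9990 999
//
"""
fields, peaks = parse_massbank_record(record)
print(fields["ACCESSION"], peaks)
```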

 

 

 

Because reference spectra are often expensive to obtain (in consumables and chemicals, but also in manpower), reference libraries will never cover as many compounds as can be found in, e.g., PubChem. We are therefore developing in-silico methods such as MetFrag (http://msbi.ipb-halle.de/MetFrag/) to identify compounds from tandem mass spectra among candidate structures retrieved from general-purpose compound libraries. Because a mechanistic simulation of the fragmentation process is computationally infeasible, we develop simplified in-silico fragmentation methods and statistical models, and apply machine learning to a large set of training spectra.
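The core of candidate ranking can be illustrated without a chemistry toolkit if the fragment masses of each candidate are taken as given. In the hypothetical sketch below, each candidate structure is represented only by a list of precomputed fragment masses (in MetFrag these would come from in-silico fragmentation of the candidate structure), and candidates are ranked by the fraction of spectrum intensity their fragments explain. This simple score is an assumption for illustration and is not MetFrag's actual scoring function.

```python
def explained_fraction(spectrum, fragment_masses, mz_tol=0.005):
    """Fraction of total spectrum intensity matched by any fragment mass."""
    total = sum(intensity for _, intensity in spectrum)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in spectrum
        if any(abs(mz - frag) <= mz_tol for frag in fragment_masses)
    )
    return explained / total


def rank_candidates(spectrum, candidates, mz_tol=0.005):
    """Rank {name: fragment_masses} candidates by explained intensity, best first."""
    scores = [(name, explained_fraction(spectrum, frags, mz_tol))
              for name, frags in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


spectrum = [(85.03, 30.0), (127.04, 100.0), (145.05, 70.0)]
candidates = {  # hypothetical precomputed fragment masses per candidate
    "candidate_A": [127.04, 145.05],
    "candidate_B": [85.03],
}
print(rank_candidates(spectrum, candidates))
# [('candidate_A', 0.85), ('candidate_B', 0.15)]
```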

 

Mass spectrometry and metabolomics data cannot be stored and processed adequately with simple text formats or spreadsheets. The complexity of the underlying data, together with the requirements of data exchange and future-proof archiving, calls for a thorough data model.

The exchange format mzData was developed by the HUPO Proteomics Standards Initiative (PSI), and several conversion tools exist to create mzData files from mass spectrometry instrument data and from other file formats such as mzXML. Since then, the developer communities of mzData and mzXML have collaborated on a joint successor, mzML.
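In mzML, peak lists are stored as base64-encoded binary arrays of little-endian floats, optionally zlib-compressed; controlled-vocabulary parameters such as MS:1000523 (64-bit float), MS:1000521 (32-bit float) and MS:1000574 (zlib compression) tell a reader how to decode them. The sketch below uses only the Python standard library to decode and encode such an array. It leaves out XML parsing and other compression schemes, and the function names are illustrative.

```python
import base64
import struct
import zlib


def decode_binary_array(b64_text, is_64bit=True, zlib_compressed=False):
    """Decode the text of an mzML <binary> element into a list of floats."""
    raw = base64.b64decode(b64_text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    code, size = ("d", 8) if is_64bit else ("f", 4)
    # mzML binary arrays are little-endian, hence the "<" prefix.
    return list(struct.unpack(f"<{len(raw) // size}{code}", raw))


def encode_binary_array(values, is_64bit=True, zlib_compressed=False):
    """Inverse of decode_binary_array, e.g. for writing test data."""
    code = "d" if is_64bit else "f"
    raw = struct.pack(f"<{len(values)}{code}", *values)
    if zlib_compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


mz_values = [100.5, 200.25, 300.125]
encoded = encode_binary_array(mz_values, is_64bit=True, zlib_compressed=True)
print(decode_binary_array(encoded, is_64bit=True, zlib_compressed=True))
# [100.5, 200.25, 300.125]
```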

Currently, a data format for the description of multiple reaction monitoring (MRM) and tandem mass spectrometry (MS/MS) is under development in the TraML community.

 

 

 

Once the data are in a vendor-independent, machine-readable format, the next step is to publish the various -omics data in a well-annotated format that follows community-accepted (at least minimal) information standards for describing the experiment. The ISA tools (Investigation, Study, Assay) and the ISA format are designed for this task. We deploy these tools and are integrating them into our lab routine, so that they can also serve as a (albeit very simple) LIMS (Laboratory Information Management System).
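ISA-Tab, the tabular serialization of the ISA format, splits metadata into an investigation file and tab-separated study and assay tables. The sketch below writes a minimal study table using a few column headers of the kind ISA-Tab uses (`Source Name`, `Characteristics[...]`, `Protocol REF`, `Sample Name`). A valid ISA-Tab submission needs more columns and the accompanying files; the protocol name and samples are hypothetical.

```python
import csv
import io

STUDY_COLUMNS = ["Source Name", "Characteristics[Organism]", "Protocol REF", "Sample Name"]


def study_table(samples, protocol="sample collection"):
    """Return a minimal tab-separated study table (hypothetical sketch).

    samples: list of (source_name, organism, sample_name) tuples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(STUDY_COLUMNS)
    for source, organism, sample in samples:
        writer.writerow([source, organism, protocol, sample])
    return buf.getvalue()


print(study_table([("plant_1", "Arabidopsis thaliana", "plant_1_leaf")]))
```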

The dataset E200 for the metabolomics experiment described in this paper is available for download.

