Our 10th Leibniz Plant Biochemistry Symposium on 7 and 8 May was a great success. This year's theme was new methods and research approaches in natural product chemistry. The excellent talks on active compounds…
Omani medicinal plant in the focus of phytochemistry. IPB scientists and partners from Dhofar recently put the Omani medicinal plant Terminalia dhofarica under the phytochemical magnifying glass. The plant is rich in…
Taste is predictable: with FlavorMiner. FlavorMiner is the name of the tool that IPB chemists and partners from Colombia recently developed. Based on machine learning (AI), the program can, using the…
2D nuclear magnetic resonance (NMR) spectra are used in the (structural) analysis of small molecules. In contrast to 1D-NMR spectra, 2D-NMR spectra correlate the chemical shifts of 1H and 13C at the same time. A spectrum consists of several peaks in a two-dimensional space. The most important information about a peak is the location of its center, which captures the bonding relationships of hydrogen and carbon atoms. A spectrum contains much information about the chemical structure of a product, but in most cases the structure cannot be read off in a simple and straightforward manner. Structure elucidation involves a considerable amount of (manual) effort. Using high-field NMR spectrometers, many 2D-NMR spectra can be recorded in a short time. So the common situation is that a lab or company has a repository of 2D-NMR spectra, partially annotated with structural information. For the remaining spectra the structure is unknown. When two research labs collaborate, their repositories are merged and the annotations shared. We reduce that problem to the task of finding duplicates in a given set of 2D-NMR spectra. To this end, we propose a simple but robust definition of 2D-NMR duplicates, which allows for small measurement errors. We give a quadratic algorithm for the problem, which can be implemented in SQL. Further, we analyze a more abstract class of heuristics, which are based on selecting particular peaks. Such a heuristic works as a filter step on the pairs of possible duplicates and allows false positives. We compare all methods with respect to their run time. Finally, we discuss the effectiveness of the duplicate definition on real data.
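The abstract does not spell out the duplicate definition, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that two spectra count as duplicates when every peak center in one has a counterpart in the other within a small tolerance per axis. The tolerances `tol_h` and `tol_c`, the function names, and the bounding-box filter are hypothetical choices made for illustration, not the authors' definition or their SQL implementation. The filter is a simplified stand-in for a peak-selection heuristic: it may let false positives through, which the exact check then rejects.

```python
from itertools import combinations

# A peak center: (1H shift in ppm, 13C shift in ppm).
Peak = tuple[float, float]


def peaks_match(p: Peak, q: Peak, tol_h: float, tol_c: float) -> bool:
    """True if two peak centers agree within the per-axis tolerances."""
    return abs(p[0] - q[0]) <= tol_h and abs(p[1] - q[1]) <= tol_c


def is_duplicate(a: list[Peak], b: list[Peak], tol_h: float, tol_c: float) -> bool:
    """ASSUMED definition: every peak of a has a match in b, and vice versa."""
    a_covered = all(any(peaks_match(p, q, tol_h, tol_c) for q in b) for p in a)
    b_covered = all(any(peaks_match(q, p, tol_h, tol_c) for p in a) for q in b)
    return a_covered and b_covered


def may_be_duplicate(a: list[Peak], b: list[Peak], tol_h: float, tol_c: float) -> bool:
    """Cheap filter (hypothetical): compare the extreme shifts on each axis.

    Under the assumed definition, duplicates always pass this test,
    but non-duplicates may pass too (false positives).
    """
    if not a or not b:
        return not a and not b
    for axis, tol in ((0, tol_h), (1, tol_c)):
        a_vals = [p[axis] for p in a]
        b_vals = [q[axis] for q in b]
        if abs(min(a_vals) - min(b_vals)) > tol:
            return False
        if abs(max(a_vals) - max(b_vals)) > tol:
            return False
    return True


def find_duplicates(spectra: dict[str, list[Peak]],
                    tol_h: float = 0.05, tol_c: float = 0.5) -> list[tuple[str, str]]:
    """Check all pairs of spectra (quadratic in the number of spectra)."""
    result = []
    for id_a, id_b in combinations(sorted(spectra), 2):
        a, b = spectra[id_a], spectra[id_b]
        # Filter first, then run the exact check only on surviving pairs.
        if may_be_duplicate(a, b, tol_h, tol_c) and is_duplicate(a, b, tol_h, tol_c):
            result.append((id_a, id_b))
    return result


# Made-up example data, not real measurements.
example = {
    "s1": [(1.20, 20.0), (3.50, 55.0)],
    "s2": [(1.22, 20.3), (3.48, 54.8)],   # s1 with small measurement errors
    "s3": [(1.20, 20.0), (7.10, 128.0)],  # a different compound
}
print(find_duplicates(example))  # [('s1', 's2')]
```

The exact check is only reached for pairs that survive the filter, which is where a run-time saving over the plain quadratic comparison would come from.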
Publication
Gaida, A.; Neumann, S.; MetHouse: Raw and Preprocessed Mass Spectrometry Data. J. Integr. Bioinformatics 4, 107-114 (2007). DOI: 10.1515/jib-2007-56
We are developing a vendor-independent archive and, on top of that, a data warehouse for mass spectrometry metabolomics data. The archive schema resembles the community-developed object model; the Java implementation of the model classes and an editor (for both mzData XML files and the database) have been generated using the Eclipse Modeling Framework. Persistence is handled by the JDO2-compliant framework JPOX. The main content of the data warehouse is the results of the signal processing and peak-picking tasks, carried out using the XCMS package from Bioconductor; putative identification and mass decomposition are added to the warehouse afterwards. We present the system architecture, current content, and performance observations, and describe the analysis tools on top of the warehouse. Availability: http://msbi.ipb-halle.de/