
Publications - Bioorganic Chemistry

Publications

Hinneburg, A.; Egert, B.; Porzel, A.; Duplicate detection of 2D-NMR Spectra J. Integr. Bioinformatics 4, 64-80, (2007) DOI: 10.1515/jib-2007-53

2D-Nuclear magnetic resonance (NMR) spectra are used in the (structural) analysis of small molecules. In contrast to 1D-NMR spectra, 2D-NMR spectra correlate the chemical shifts of 1H and 13C at the same time. A spectrum consists of several peaks in a two-dimensional space. The most important information of a peak is the location of its center, which captures the bonding relationships of hydrogen and carbon atoms. A spectrum contains much information about the chemical structure of a product, but in most cases the structure cannot be read off in a simple and straightforward manner. Structure elucidation involves a considerable amount of (manual) effort. Using high-field NMR spectrometers, many 2D-NMR spectra can be recorded in a short time. So the common situation is that a lab or company has a repository of 2D-NMR spectra, partially annotated with the structural information. For the remaining spectra the structure is unknown. In case two research labs are collaborating, the repositories will be merged and annotations shared. We reduce that problem to the task of finding duplicates in a given set of 2D-NMR spectra. Therefore, we propose a simple but robust definition of 2D-NMR duplicates, which allows for small measurement errors. We give a quadratic algorithm for the problem, which can be implemented in SQL. Further, we analyze a more abstract class of heuristics, which are based on selecting particular peaks. Such a heuristic works as a filter step on the pairs of possible duplicates and allows false positives. We compare all methods with respect to their run time. Finally, we discuss the effectiveness of the duplicate definition on real data.
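The abstract does not spell out the duplicate definition, so the following is only a minimal sketch of what a tolerance-based pairwise check could look like. It assumes that two spectra count as duplicates when every peak of one has a partner in the other within a small tolerance on each axis. The tolerance values and all function names are hypothetical and not taken from the paper.

```python
from itertools import combinations

# A peak is a (1H shift, 13C shift) pair in ppm; a spectrum is a list of peaks.
# The tolerances are illustrative assumptions, not values from the paper.
TOL_H = 0.05
TOL_C = 0.5


def peaks_match(p, q, tol_h=TOL_H, tol_c=TOL_C):
    """True if two peak centers agree within the tolerance on both axes."""
    return abs(p[0] - q[0]) <= tol_h and abs(p[1] - q[1]) <= tol_c


def covered(a, b, tol_h=TOL_H, tol_c=TOL_C):
    """True if every peak of spectrum a has some matching peak in spectrum b."""
    return all(any(peaks_match(p, q, tol_h, tol_c) for q in b) for p in a)


def is_duplicate(a, b, tol_h=TOL_H, tol_c=TOL_C):
    """Assumed duplicate test: the two spectra cover each other's peaks."""
    return covered(a, b, tol_h, tol_c) and covered(b, a, tol_h, tol_c)


def find_duplicates(spectra):
    """Compare all pairs of spectra (quadratic in the number of spectra)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(spectra), 2)
        if is_duplicate(a, b)
    ]
```

A peak-selection heuristic of the kind mentioned in the abstract would sit in front of `is_duplicate` as a cheap filter that may pass false positives, with the full check applied only to the surviving pairs.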
Books and chapters

Hinneburg, A.; Gabriel, H.-H.; Gohr, A.; Bayesian Folding-In with Dirichlet Kernels for PLSI (2007) DOI: 10.1109/ICDM.2007.15

Probabilistic latent semantic indexing (PLSI) represents documents of a collection as mixture proportions of latent topics, which are learned from the collection by an expectation maximization (EM) algorithm. New documents or queries need to be folded into the latent topic space by a simplified version of the EM algorithm. During PLSI folding-in of a new document, the topic mixtures of the known documents are ignored. This may lead to a suboptimal model of the extended collection. Our new approach incorporates the topic mixtures of the known documents in a Bayesian way during folding-in. That knowledge is modeled as a prior distribution over the topic simplex using a kernel density estimate of Dirichlet kernels. We demonstrate the advantages of the new Bayesian folding-in using real text data.
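For orientation, conventional PLSI folding-in (the baseline the abstract contrasts with, not the Bayesian variant with Dirichlet kernels) can be sketched as below. The word distributions p(w|z) stay fixed and only the topic proportions p(z|d) of the new document are re-estimated by EM. The function name, the data layout, and the iteration count are assumptions made for illustration.

```python
def fold_in(word_counts, p_w_given_z, n_iter=50):
    """Estimate topic proportions p(z|d) for one new document.

    word_counts: dict mapping word index -> count n(d, w)
    p_w_given_z: list over topics of lists over words, p(w|z), kept fixed
    """
    n_topics = len(p_w_given_z)
    p_z = [1.0 / n_topics] * n_topics  # uniform starting point
    for _ in range(n_iter):
        new_p_z = [0.0] * n_topics
        for w, n in word_counts.items():
            # E-step: posterior p(z | d, w) is proportional to p(w|z) * p(z|d)
            joint = [p_w_given_z[z][w] * p_z[z] for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for z in range(n_topics):
                new_p_z[z] += n * joint[z] / norm
        total = sum(new_p_z)
        if total == 0.0:
            return p_z  # nothing to learn from this document
        # M-step: update p(z|d) only; p(w|z) is not re-estimated
        p_z = [v / total for v in new_p_z]
    return p_z
```

Because the update looks only at the new document, the topic mixtures of the known documents play no role here, which is the limitation the abstract points out.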
Books and chapters

Hinneburg, A.; Porzel, A.; Wolfram, K.; An Evaluation of Text Retrieval Methods for Similarity Search of Multi-dimensional NMR-Spectra Lecture Notes in Computer Science 4414, 424-438, (2007) ISBN: 978-3-540-71233-6 DOI: 10.1007/978-3-540-71233-6_33

Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring substances is an important task in the investigation of new, potentially useful chemical compounds. Multi-dimensional NMR spectra are relational objects like documents, but consist of continuous multi-dimensional points called peaks instead of words. We develop several mappings from continuous NMR spectra to discrete text-like data. With the help of those mappings, any text retrieval method can be applied. We evaluate the performance of two retrieval methods, namely the standard vector space model and probabilistic latent semantic indexing (PLSI). PLSI learns hidden topics in the data, which is, in the case of 2D-NMR data, interesting in its own right. Additionally, we develop and evaluate a simple direct similarity function, which can detect duplicates of NMR spectra. Our experiments show that the vector space model as well as PLSI, which are both designed for text data created by humans, can effectively handle the mapped NMR data originating from natural products. Additionally, PLSI is able to find meaningful "topics" in the NMR data.
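The abstract does not describe the mappings themselves. As a rough illustration of the general idea, the sketch below assumes a plain grid mapping: each peak is replaced by a token naming the grid cell it falls into, and spectra are then compared with cosine similarity as in a basic vector space model. The grid step sizes, token format, and function names are hypothetical.

```python
import math
from collections import Counter


def peak_to_word(peak, step_h=0.1, step_c=1.0):
    """Map a (1H, 13C) peak in ppm to a token naming its grid cell."""
    h, c = peak
    return f"h{math.floor(h / step_h)}_c{math.floor(c / step_c)}"


def spectrum_to_bag(spectrum, step_h=0.1, step_c=1.0):
    """Turn a list of peaks into a bag of grid-cell tokens."""
    return Counter(peak_to_word(p, step_h, step_c) for p in spectrum)


def cosine(a, b):
    """Cosine similarity between two token-count bags."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two spectra that share one grid cell out of two peaks each
s1 = spectrum_to_bag([(1.23, 20.4), (3.55, 55.2)])
s2 = spectrum_to_bag([(1.27, 20.6), (7.15, 128.3)])
similarity = cosine(s1, s2)
```

A hard grid like this separates peaks that lie close together but on opposite sides of a cell boundary, so a single fixed grid is only the simplest possible choice.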
Books and chapters

Wolfram, K.; Porzel, A.; Hinneburg, A.; Similarity Search for Multi-dimensional NMR-Spectra of Natural Products Lecture Notes in Computer Science 4213, 650-658, (2006) ISBN: 978-3-540-46048-0 DOI: 10.1007/11871637_67

Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring products is an important task in the investigation of new, potentially useful chemical compounds. We develop a set-based similarity function, which, however, does not sufficiently capture more abstract aspects of similarity. NMR spectra are like documents, but consist of continuous multi-dimensional points instead of words. Probabilistic latent semantic indexing (PLSI) is a retrieval method that learns hidden topics. We develop several mappings from continuous NMR spectra to discrete text-like data. The new mappings introduce redundancies into the discrete data, which proves helpful for the PLSI model used afterwards. Our experiments show that PLSI, which is designed for text data created by humans, can effectively handle the mapped NMR data originating from natural products. Additionally, PLSI combined with the new mappings is able to find meaningful "topics" in the NMR data.
Publications

Hinneburg, A.; Keim, D. A.; Brandt, W.; Clustering 3D-structures of small amino acid chains for detecting dependences from their sequential context in proteins Proc. IEEE International Symposium on Bio-Informatics and Biomedical Engineering 43-49, (2000) DOI: 10.1109/BIBE.2000.889588

In the past, a good number of rotamer libraries have been published, which allow a deeper understanding of the conformational behavior of amino acid residues in proteins. Since the number of available high-resolution X-ray protein structures has grown significantly in recent years, a more comprehensive analysis of the conformational behavior is possible today. In this paper, we present a method to compile a new class of rotamer libraries for detecting interesting relationships between residue conformations and their sequential context in proteins. The method is based on a new algorithm for clustering residue conformations. To demonstrate the effectiveness of our method, we apply our algorithm to a library consisting of all 8000 tripeptide fragments formed by the 20 native amino acids. The analysis yields new results, namely that some specific tripeptide fragments show unexpected residue conformations instead of the highly preferred conformation. In the neighborhood of two asparagine residues, for example, threonine avoids the conformation which is most likely to occur otherwise. The new insights obtained by the analysis are important for understanding the formation and prediction of secondary structure elements and will consequently be crucial for improving the state of the art in protein folding.