Discovery of Protein Sequence-Structure-Function Relationships
GM066387
Research in computational biology seeks to develop algorithmic or information-processing models of biological systems and processes such as genetic networks, protein folding, and protein-protein interaction. Research in bioinformatics is concerned with the development of algorithms and software for organizing, processing, and analyzing experimental data, and the knowledge derived from those data, to address specific questions in the biological sciences. Honavar's current research in bioinformatics and computational molecular biology focuses on developing computational tools for large-scale, collaborative, data-driven knowledge discovery in the biological sciences, and on applying those tools to the data-driven exploration of macromolecular sequence-structure-expression-evolution-function relationships and the inference of complex biological signalling networks and pathways. Much of this work is being carried out in collaboration with colleagues with expertise in molecular biology, biochemistry, genetics, and biophysics. This work is supported in part by a Biological Information Science and Technology Initiative (BISTI) award (GM066387) from the National Institutes of Health. Current research foci in this area include:
- Development of computational tools for interactive and collaborative data-driven knowledge discovery from disparate biological information sources, including macromolecular sequences, structures, phylogenies, and expression patterns. Of particular interest are algorithms and software for:
  - Rapid and flexible ontology-guided information extraction from heterogeneous, distributed, autonomous information sources
  - Learning classifiers, associations, and clusters from distributed, autonomous information sources
  - Learning compact and comprehensible classifiers from attribute value taxonomies, class taxonomies, and partially specified data
  - Learning attribute value taxonomies and class taxonomies from data
  - Learning classifiers from multi-relational data
- Data-driven exploration of macromolecular sequence-structure-expression-evolution-function relationships, including:
  - Discovery of sequence correlates of protein function and protein-protein interaction
  - Discovery of sequence correlates of functionally significant structural features of proteins
  - Construction of classifiers for assigning protein sequences to structural or functional families
  - Prediction of putative binding sites in proteins from sequence information
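
To make the idea of assigning protein sequences to families concrete, the following is a minimal sketch of one generic approach: a naive Bayes classifier over overlapping k-mers with add-one smoothing. It is an illustrative stand-in and not necessarily the method used in this project. The function names (`kmers`, `train`, `classify`), the family labels, and the toy sequences are all hypothetical and invented for the example.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=2):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(examples, k=2):
    """Count k-mers per family from (sequence, family) pairs."""
    counts = defaultdict(Counter)   # family -> k-mer counts
    priors = Counter()              # family -> number of training sequences
    for seq, family in examples:
        counts[family].update(kmers(seq, k))
        priors[family] += 1
    vocab = {km for c in counts.values() for km in c}
    return counts, priors, vocab


def classify(seq, model, k=2):
    """Return the family with the highest smoothed log-probability."""
    counts, priors, vocab = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for family in priors:
        # Add-one smoothing; the extra +1 reserves mass for unseen k-mers.
        denom = sum(counts[family].values()) + len(vocab) + 1
        score = math.log(priors[family] / total)
        for km in kmers(seq, k):
            score += math.log((counts[family][km] + 1) / denom)
        if score > best_score:
            best, best_score = family, score
    return best


# Toy, made-up training data (not real protein families).
training = [
    ("AVLIAVLLIV", "hydrophobic_rich"),
    ("LLIVAAVLIL", "hydrophobic_rich"),
    ("KDEKRDEEKR", "charged_rich"),
    ("RKEDDKRKEE", "charged_rich"),
]
model = train(training)
print(classify("VLIAVL", model))
print(classify("KREDKE", model))
```

Real family classifiers would be trained on curated sequence databases and evaluated with held-out data; the sketch only shows how sequence-derived features can feed a probabilistic classifier.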