Iowa State University


Center for Computational Intelligence, Learning, & Discovery

Data-Driven Discovery of Macromolecular Sequence-Structure-Function-Interaction-Expression Relationships

Personnel

Dr. Vasant Honavar, Professor of Computer Science and of Bioinformatics and Computational Biology, Principal Investigator

Dr. Drena Dobbs, Associate Professor of Molecular, Cell, and Developmental Biology, Co-Principal Investigator

Dr. Robert Jernigan, Professor, Dept. of Biochemistry, Biophysics, & Molecular Biology, Co-Principal Investigator

Summary

Research in computational biology seeks to develop algorithmic or information-processing models of biological systems and processes such as genetic networks, protein folding, and protein-protein interaction. Research in bioinformatics is concerned with developing algorithms and software for organizing, processing, and analyzing experimental data, and the knowledge derived from those data, to address specific questions in the biological sciences. Honavar's current research in bioinformatics and computational molecular biology focuses on developing computational tools for large-scale, collaborative, data-driven knowledge discovery in the biological sciences, and on applying those tools to the data-driven exploration of macromolecular sequence-structure-expression-evolution-function relationships and to the inference of complex biological signaling networks and pathways. Much of this work is carried out in collaboration with colleagues with expertise in molecular biology, biochemistry, genetics, and biophysics. Current research foci in this area include:

  • Development of computational tools for interactive and collaborative data-driven knowledge discovery from disparate biological information sources, including macromolecular sequences, structures, phylogenies, and expression patterns. Of particular interest are algorithms and software for
    • rapid and flexible ontology-guided information extraction from heterogeneous, distributed, autonomous information sources
    • learning classifiers, associations, and clusters from distributed autonomous information sources
    • learning compact and comprehensible classifiers from attribute value taxonomies, class taxonomies, and partially specified data
    • learning attribute value taxonomies and class taxonomies from data
    • learning classifiers from multi-relational data
  • Data-driven exploration of macromolecular sequence-structure-expression-evolution-function relationships including
    • Discovery of sequence correlates of protein function and protein-protein interaction
    • Discovery of sequence correlates of functionally significant structural features of proteins
    • Construction of classifiers for assigning protein sequences to structural or functional families
    • Prediction of putative binding sites in proteins from sequence information
  • Data-driven inference of genetic networks, metabolic networks, and signaling pathways including
    • Discovery of coexpressed or coregulated genes from gene expression patterns
    • Construction of genetic networks from gene expression data
    • Analysis of gene expression patterns in specific systems (e.g., onset of photosynthesis, retinal development)
    • Modeling and simulation of complex genetic networks, metabolic networks, and signaling pathways
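The research foci above include learning classifiers from distributed, autonomous information sources, and one of the representative publications listed below concerns learning from distributed data using sufficient statistics. The Python below is a minimal sketch of that idea under stated assumptions. It is not the Center's software or the exact algorithm of that work. It uses a naive Bayes classifier over categorical features, such as the residues in a fixed-length sequence window. Naive Bayes depends on the data only through class counts and per-class feature-value counts. Each source can therefore return just its counts, and summing them yields the same model as pooling the raw data. The function names, data layout, Laplace smoothing, assumed alphabet of 20 amino acids, and toy data are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical data layout: each example is (features, label), where
# features is a tuple of categorical values (e.g. amino acids in a
# fixed-length window around a residue) and label is a class name.


def local_counts(examples):
    """Sufficient statistics for naive Bayes, computed at one source."""
    class_counts = Counter()    # label -> number of examples
    feature_counts = Counter()  # (position, value, label) -> count
    for features, label in examples:
        class_counts[label] += 1
        for pos, value in enumerate(features):
            feature_counts[(pos, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(per_source_stats):
    """Sum counts from all sources; only counts leave each source."""
    class_counts, feature_counts = Counter(), Counter()
    for c, f in per_source_stats:
        class_counts.update(c)    # Counter.update adds counts
        feature_counts.update(f)
    return class_counts, feature_counts


def predict(stats, features, alphabet_size=20):
    """Most probable label under naive Bayes with Laplace smoothing.

    alphabet_size is the assumed number of possible values per
    position (20 here, for the standard amino acids).
    """
    class_counts, feature_counts = stats
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for pos, value in enumerate(features):
            count = feature_counts[(pos, value, label)]
            score += math.log((count + 1) / (n_label + alphabet_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy, made-up data split across two hypothetical sources.
    source_a = [(("A", "K"), "bind"), (("A", "R"), "bind"), (("L", "V"), "non")]
    source_b = [(("K", "K"), "bind"), (("L", "L"), "non"), (("V", "L"), "non")]
    merged = merge_counts([local_counts(source_a), local_counts(source_b)])
    pooled = local_counts(source_a + source_b)
    print(merged == pooled)             # True: same model as pooled data
    print(predict(merged, ("A", "K")))  # bind
```

Only counts cross source boundaries here. The same pattern carries over to other learners whose sufficient statistics are additive. Learners whose statistics do not simply add up across sources would need a different strategy.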

Funding

This work is supported in part by an Information Technology Research (ITR) grant (0219699) from the National Science Foundation, an Integrative Graduate Education and Research Training (IGERT) award in Computational Molecular Biology (09972653) from the National Science Foundation, a Biological Information Science and Technology Initiative (BISTI) award (GM066387) from the National Institutes of Health, and graduate fellowships from Pioneer Hi-Bred and IBM.

Representative Publications

  1. Yan, C., Terribilini, M., Wu, F., Jernigan, R.L., Dobbs, D. and Honavar, V. Identifying amino acid residues involved in protein-DNA interactions from sequence. BMC Bioinformatics, 2006.
  2. Terribilini, M., Lee, J.-H., Yan, C., Jernigan, R. L., Honavar, V. and Dobbs, D. Predicting RNA-binding Sites from Amino Acid Sequence. RNA Journal. In press, 2006.
  3. Bao, J., Hu, Z., Caragea, D., Reecy, J., and Honavar, V. A Tool for Collaborative Construction of Large Biological Ontologies. Fourth International Workshop on Biological Data Management (BIDM 2006), Krakow, Poland, IEEE Press. In press, 2006.
  4. Bao, J., Caragea, D., and Honavar, V. A Distributed Tableau Algorithm for Package-based Description Logics. Proceedings of the Second International Workshop on Context Representation and Reasoning (CRR 2006), Riva del Garda, Italy, CEUR. In press, 2006.
  5. Bao, J., Caragea, D., and Honavar, V. Modular Ontologies - A Formal Investigation of Semantics and Expressivity. In Proceedings of the First Asian Semantic Web Conference, Beijing, China, Springer-Verlag. In press, 2006.
  6. Kang, D-K., Silvescu, A. and Honavar, V. RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science, Berlin: Springer-Verlag. pp. 45-54, 2006.
  7. Bao, J., Caragea, D., and Honavar, V. Towards Collaborative Environments for Ontology Construction and Sharing. Proceedings of the International Symposium on Collaborative Technologies and Systems, Las Vegas, 2006.
  8. Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. Knowledge and Information Systems. Vol. 9. No. 2. pp. 157-179, 2006.
  9. Terribilini, M., Lee, J.-H., Yan, C., Carpenter, S., Jernigan, R., Honavar, V. and Dobbs, D. Identifying interaction sites in recalcitrant proteins: predicted protein and RNA binding sites in HIV-1 and EIAV agree with experimental data. Pacific Symposium on Biocomputing, Hawaii, World Scientific. Vol. 11. pp. 415-426, 2006.
  10. Wu, F., Olson, B., Dobbs, D., and Honavar, V. Using Kernel Methods to Predict Protein-Protein Interaction Sites from Sequence. IEEE Joint Conference on Neural Networks, Vancouver, Canada, IEEE Press. In press, 2006.
  11. Pathak, J., Koul, N., Caragea, D., and Honavar, V. A Framework for Semantic Web Services Discovery. Proceedings of the 7th ACM International Workshop on Web Information and Data Management (WIDM 2005), ACM Press. pp. 45-50, 2005.
  12. Yakhnenko, O., Silvescu, A., and Honavar, V. Discriminatively Trained Markov Model for Sequence Classification. IEEE Conference on Data Mining (ICDM 2005), Houston, Texas, IEEE Press, 2005.
  13. Caragea, D., Zhang, J., Bao, J., Pathak, J., and Honavar, V. Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous Information Sources (Invited paper). Proceedings of the 16th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science, Singapore, Berlin: Springer-Verlag. Vol. 3734. pp. 13-44, 2005.
  14. Zhang, J., Caragea, D. and Honavar, V. Learning Ontology-Aware Classifiers. Proceedings of the 8th International Conference on Discovery Science. Springer-Verlag Lecture Notes in Computer Science, Singapore, Berlin: Springer-Verlag. Vol. 3735. pp. 308-321, 2005.
  15. Caragea, D., Bao, J., Pathak, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration from Semantically Heterogeneous Biological Data Sources. Proceedings of the Sixteenth International Workshop on Databases and Expert Systems Applications (DEXA 05), Copenhagen, IEEE Computer Society. pp. 580-584, 2005.
  16. Wu, F., Zhang, J., and Honavar, V. Learning Classifiers Using Hierarchically Structured Class Taxonomies. Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005), Edinburgh, Berlin: Springer-Verlag. Vol. 3607. pp. 313-320, 2005.
  17. Caragea, D., Silvescu, A., Pathak, J., Bao, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. Data Integration in Life Sciences (DILS 2005). Springer-Verlag Lecture Notes in Computer Science, San Diego, Berlin: Springer-Verlag. Vol. 3615. pp. 175-190, 2005.
  18. Sen, T.Z., Kloczkowski, A., Jernigan, R.L., Yan, C., Honavar, V., Ho, K-M., Wang, C-Z., Ihm, Y., Cao, H., Gu, X., and Dobbs, D. Predicting Binding Sites of Protease-Inhibitor Complexes by Combining Multiple Methods. BMC Bioinformatics. Vol. 5. pp. 205, 2004.
  19. Zhang, J. and Honavar, V. Learning Compact and Accurate Classifiers from Attribute Value Taxonomies and Partially Specified Data. IEEE International Conference on Data Mining, IEEE Press. pp. 289-298, 2004.
  20. Bao, J. and Honavar, V. Collaborative Ontology Building With Wiki@nt. Third International Workshop on Evaluation of Ontology Building Tools, Hiroshima, 2004.
  21. Yan, C., Dobbs, D., and Honavar, V. A Two-Stage Classifier for Identification of Protein-Protein Interface Residues. Bioinformatics. Vol. 20. pp. i371-378, 2004.
  22. Bao, J., Cao, Y., Tavanapong, W., and Honavar, V. Integration of Domain-Specific and Domain-Independent Ontologies for Colonoscopy Video Database Annotation. International Conference on Information and Knowledge Engineering (IKE 04), Las Vegas, Nevada, USA, CSREA Press. pp. 82-88, 2004.
  23. Caragea, D., Silvescu, A., and Honavar, V. A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems. Vol. 1. No. 2. pp. 80-89, 2004.
  24. Yan, C., Dobbs, D., and Honavar, V. Identifying Protein-Protein Interaction Sites from Surface Residues: A Support Vector Machine Approach. Neural Computing Applications. Vol. 13. pp. 123-129, 2004.
  25. Andorf, C., Silvescu, A., Dobbs, D. and Honavar, V. Learning Classifiers for Assigning Protein Sequences to Gene Ontology Functional Families. Fifth International Conference on Knowledge Based Computer Systems (KBCS 2004), New Delhi, India: Allied Publishers. pp. 256-255, 2004.
  26. Pathak, J., Caragea, D., and Honavar, V. Ontology-Extended Component-Based Workflows: A Framework for Constructing Complex Workflows from Semantically Heterogeneous Software Components. VLDB-04 Workshop on Semantic Web and Databases. Springer-Verlag Lecture Notes in Computer Science, Toronto, Springer-Verlag. Vol. 3372. pp. 41-56, 2004.
  27. Caragea, D., Pathak, J. and Honavar, V. Learning Classifiers from Semantically Heterogeneous Data. International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004). Springer-Verlag Lecture Notes in Computer Science, Cyprus, Springer-Verlag. Vol. 3291. pp. 963-980, 2004.
  28. Atramentov, A., Leiva, H., and Honavar, V. A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag. In press, 2003.
  29. Caragea, D., Silvescu, A., and Honavar, V. Decision Tree Induction from Distributed, Heterogeneous, Autonomous Data Sources. Proceedings of the Conference on Intelligent Systems Design and Applications (ISDA 03), 2003.
  30. Caragea, D., Reinoso-Castillo, J., and Silvescu, A. Statistics Gathering for Information Integration on the Web. Proceedings of the IJCAI-03 Workshop on Information Integration on the Web, 2003.
  31. Reinoso-Castillo, J., Silvescu, A., Caragea, D., Pathak, J. and Honavar, V. Information Extraction and Integration from Heterogeneous, Distributed, Autonomous Information Sources: A Federated, Query-Centric Approach. IEEE International Conference on Information Integration and Reuse. To appear, 2003.
  32. Wang, X., Schroeder, D., Dobbs, D., and Honavar, V. Automated Data-Driven Discovery of Motif-Based Protein Function Classifiers. Information Sciences. In press, 2003.
  33. Yan, C., Dobbs, D., and Honavar, V. Identification of Surface Residues Involved in Protein-Protein Interaction: A Support Vector Machine Approach. Proceedings of the Conference on Intelligent Systems Design and Applications (ISDA-03), Tulsa, Oklahoma, 2003.
  34. Zhang, J. and Honavar, V. Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. Proceedings of the International Conference on Machine Learning (ICML-03), Washington, DC, 2003.

 


Atanasoff Hall

CILD is housed in Atanasoff Hall on the northwest side of campus.

Center for Computational Intelligence, Learning, & Discovery
214 Atanasoff Hall
Ames, IA 50011-1041

Phone: (515)294-9074
Fax:    (515)294-0258