Collaborative Research: Learning Classifiers From Autonomous, Semantically Heterogeneous, Distributed Data
- Dr. Vasant Honavar, Professor of Computer Science and of Bioinformatics and Computational Biology, Principal Investigator (Iowa State University)
- Dr. Doina Caragea, Assistant Professor of Computing and Information Sciences, Principal Investigator (Kansas State University)
- Cornelia Caragea, Ph.D. Student, Computer Science, Iowa State University. Focus: Learning predictive models from multi-relational data, with applications in bioinformatics.
- Oksana Yakhnenko, Ph.D. Student, Computer Science, Iowa State University. Focus: Learning predictive models from multimodal data.
- Neeraj Koul, Ph.D. Student and Research Programmer, Computer Science, Iowa State University. Focus: Design and open-source implementation of algorithms for learning predictive models from autonomous, semantically heterogeneous, distributed data.
- George Voutsadakis, Assistant Professor of Mathematics and Computer Science, Lake Superior State University; Ph.D. Student, Computer Science, Iowa State University. Focus: Distributed Ontologies, Contextualized Logics, Information Integration, Reasoning.
Advances in networks, sensors, storage, computing, and high-throughput data acquisition have led to a proliferation of autonomous, distributed data sources in many areas of human activity. New discoveries in the biological, physical, and social sciences and in engineering are being driven by our ability to discover, share, integrate, and analyze disparate types of data. Statistical machine learning algorithms offer some of the most cost-effective approaches to discovering experimentally testable predictive models and hypotheses from data. However, the large size, distributed nature, and autonomy of the data sources (and the attendant differences in access, allowed queries, processing capabilities, structure, organization, underlying data models, and data semantics) present hurdles to the effective use of machine learning. This research aims to overcome these hurdles by developing efficient, resource-aware distributed algorithms and software services that support collaborative, integrative knowledge acquisition in such a setting. The research team will implement, deploy, and evaluate the resulting algorithms using benchmark data sets, associated data models and ontologies, and user-specified inter-ontology mappings on a distributed test-bed of networked databases and services at Iowa State University and Kansas State University. The resulting open-source software could transform collaborative e-science in the same way that the Web has transformed information sharing. Broader impacts of this research include enhanced opportunities for research-based training of graduate and undergraduate students, interdisciplinary collaborations, participation of under-represented groups, and development of increasingly sophisticated software to support collaborative, integrative e-science.
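The idea of learning across autonomous, semantically heterogeneous sources without moving raw data can be illustrated with a minimal Python sketch. It is not the project's software; it assumes two hypothetical sources that describe one attribute with different vocabularies, hypothetical user-specified mappings (`MAP_A`, `MAP_B`) into the user's terms, and a naive Bayes learner whose sufficient statistics are class-conditional counts. Each source translates its values and returns only counts; the learner sums the counts and estimates probabilities. All data, function names, and the smoothing parameter `alpha` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical sources: each record is (attribute value, class label) in the
# source's own vocabulary. Raw records never leave their source.
SOURCE_A = [("high", "pos"), ("low", "neg"), ("high", "pos"), ("medium", "neg")]
SOURCE_B = [("H", "pos"), ("L", "neg"), ("L", "pos")]

# Hypothetical user-specified mappings from source terms to the user's ontology.
MAP_A = {"high": "elevated", "medium": "normal", "low": "normal"}
MAP_B = {"H": "elevated", "L": "normal"}


def local_counts(records, mapping):
    """Runs at a source: translate each value, then return only (value, label) counts."""
    return Counter((mapping[value], label) for value, label in records)


def combine(count_tables):
    """Runs at the learner: counts are additive, so summing them is exact."""
    total = Counter()
    for table in count_tables:
        total.update(table)
    return total


def naive_bayes_estimates(joint, alpha=1.0):
    """Estimate class priors and Laplace-smoothed P(value | class) from counts."""
    values = sorted({value for value, _ in joint})
    labels = sorted({label for _, label in joint})
    class_totals = Counter()
    for (_, label), n in joint.items():
        class_totals[label] += n
    total = sum(class_totals.values())
    priors = {label: class_totals[label] / total for label in labels}
    likelihoods = {
        (value, label): (joint[(value, label)] + alpha)
        / (class_totals[label] + alpha * len(values))
        for value in values
        for label in labels
    }
    return priors, likelihoods


def classify(value, priors, likelihoods):
    """Return the class maximizing P(class) * P(value | class)."""
    return max(priors, key=lambda label: priors[label] * likelihoods[(value, label)])


joint = combine([local_counts(SOURCE_A, MAP_A), local_counts(SOURCE_B, MAP_B)])
priors, likelihoods = naive_bayes_estimates(joint)
print(classify("elevated", priors, likelihoods))  # pos
print(classify("normal", priors, likelihoods))  # neg
```

Because the counts are additive, the combined table equals the one obtained by pooling the mapped records in one place, so in this simple case the distributed classifier is identical to its centralized counterpart.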
At present, the primary source of funding for this project is:
Additional support for the project has come from:
- Center for Computational Intelligence, Learning, and Discovery, Iowa State University.
The project has benefited from work supported by related, but non-overlapping grants including:
- Exploratory Investigation of Modular Ontologies. Vasant Honavar (PI), Giora Slutzki (Co-PI), and Doina Caragea (Co-PI) (2006-2008). $112,000.
- Discovering Protein Sequence-Structure-Function Relationships, Biological Information Science and Technology Initiative, National Institutes of Health (2003-2007). Vasant Honavar (PI), (with Drena Dobbs and Robert Jernigan), $1,022,000.
- Interactive and Verifiable Composition of Web Services To Satisfy End User Goals. Vasant Honavar (Co-PI), Samik Basu (PI), Robyn Lutz (Co-PI). National Science Foundation (2007-2010). $335,002.
- IGERT: Computational Biology Training Group. Drena Dobbs (PI), Vasant Honavar, Desh Ranjan, Mary O'Connell, Daniel Voytas, Susan Carpenter (Co-PIs). National Science Foundation (2005-2010). $2,968,976.
This work builds on the results of an NSF-supported ITR project:
- ITR: Algorithms and Software for Knowledge Acquisition from Heterogeneous Distributed Data. Vasant Honavar (PI), Drena Dobbs (Co-PI). National Science Foundation (2002-2006). $223,750. Project website.
Publications
- Bao, J., Voutsadakis, G., Slutzki, G., and Honavar, V. On the Decidability of Role Mappings between Modular Ontologies. In: Proceedings of the 23rd Conference on Artificial Intelligence (AAAI-2008), Chicago, USA. AAAI. In press.
- Bao, J., Slutzki, G., and Honavar, V. (2008). P-DL: A Semantic Importing Approach to Selective Knowledge Reuse in Modular Ontologies. In: Ontology Modularization. Parent, C., Spaccapietra, S., and Stuckenschmidt, H. (Ed). Berlin: Springer. To appear.
- Caragea, D. and Honavar, V. (2008). Learning Classifiers from Distributed Data. In: Encyclopedia of Database Technologies and Applications, Ferraggine, V.E., Doorn, J.H., and Rivero, L.C. (Ed). New York: Idea Group. In press.
- Caragea, D. and Honavar, V. (2008). Learning Classifiers from Semantically Heterogeneous Data. In: Encyclopedia of Data Warehousing and Mining, Wang, J. (ed). To appear.
- Honavar, V. and Caragea, D. (2008). Towards a Semantics-Enabled Infrastructure for Knowledge Acquisition from Distributed Data. In: Next Generation Data Mining. Kargupta, H. et al. (ed). CRC Press. In press.
- Pathak, J., Basu, S., Honavar, V. (2008). Assembling Composite Web Services from Autonomous Components. In: Emerging Artificial Intelligence Applications in Computer Engineering, Maglogiannis, I., Karpouzis, K., and Soldatos, J. (ed). IOS Press. In press.
- Pathak, J., Basu, S., Lutz, R., and Honavar, V. (2008). MoSCoE: An Approach for Composing Web Services through Iterative Reformulation of Functional Specifications. International Journal of Artificial Intelligence Tools, Vol. 17. No. 1. pp. 109-138.
- Pathak, J., Basu, S., and Honavar, V. (2008). Composing Web Services through Automatic Reformulation of Service Specifications. IEEE International Conference on Services Computing, IEEE. In press.
- Andorf, C., Dobbs, D. and Honavar, V. (2007). Exploring Inconsistencies in Genome Wide Protein Function Annotations: A Machine Learning Approach. BMC Bioinformatics 8:284. doi:10.1186/1471-2105-8-284
- Bao, J., Slutzki, G., and Honavar, V. (2007). A Semantic Importing Approach to Knowledge Reuse from Multiple Ontologies. In: Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-2007). Vancouver, Canada. pp. 1304-1309. AAAI Press.
- Bao, J., Slutzki, G., and Honavar, V. (2007). Privacy-Preserving Reasoning on the Semantic Web. IEEE/WIC/ACM Conference on Web Intelligence. IEEE. pp. 791-797.
- Caragea, C., Sinapov, J., Dobbs, D., and Honavar, V. (2007). Assessing the Performance of Macromolecular Sequence Classifiers. In: Proceedings of the IEEE Conference on Bioinformatics and Bioengineering (BIBE 2007). pp. 320-326.
- Caragea, C., Sinapov, J., Silvescu, A., Dobbs, D. and Honavar, V. (2007). Glycosylation Site Prediction Using Ensembles of Support Vector Machine Classifiers. BMC Bioinformatics. doi:10.1186/1471-2105-8-438.
- Bao, J., Caragea, D., and Honavar, V. (2006). On the Semantics of Linking and Importing in Modular Ontologies. In: Proceedings of the International Semantic Web Conference (ISWC 2006). Lecture Notes in Computer Science Vol. 4273, pp. 72-86. Berlin: Springer.
- Bao, J., Caragea, D., and Honavar, V. (2006). A Tableau Based Federated Reasoning Algorithm for Modular Ontologies. In: Proceedings of the ACM/IEEE/WIC Conference on Web Intelligence. IEEE Press. pp. 404-410.
- Bao, J., Caragea, D., and Honavar, V. Modular Ontologies - A Formal Investigation of Semantics and Expressivity. In Proceedings of the First Asian Semantic Web Conference, Beijing, China, Springer-Verlag Lecture Notes in Computer Science Vol. 4185, pp. 616-631. Best Paper Award, 2006.
- Kang, D-K., Silvescu, A. and Honavar, V. RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science. Berlin: Springer-Verlag. pp. 45-54, 2006.
- Vasile, F., Silvescu, A., Kang, D-K., and Honavar, V. TRIPPER: An Attribute Value Taxonomy Guided Rule Learner. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Berlin: Springer-Verlag. pp. 55-59, 2006.
- Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. Knowledge and Information Systems. Vol. 9. No. 2. pp. 157-179, 2006.
- Caragea, D., Zhang, J., Bao, J., Pathak, J., and Honavar, V. (2005). Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous Information Sources (Invited paper). In: Proceedings of the 16th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science. Singapore. Vol. 3734. pp. 13-44. Berlin: Springer-Verlag.
- Caragea, D., Silvescu, A., Pathak, J., Bao, J., Andorf, C., Dobbs, D., and Honavar, V. (2005). Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. In: Data Integration in Life Sciences (DILS 2005). Springer-Verlag Lecture Notes in Computer Science. San Diego. Vol. 3615. pp. 175-190. Berlin: Springer-Verlag.
- Caragea, D., Bao, J., Pathak, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration from Semantically Heterogeneous Biological Data Sources. Proceedings of the Sixteenth International Workshop on Databases and Expert Systems Applications (DEXA 05), Copenhagen, IEEE Computer Society. pp. 580-584, 2005.
- Kang, D-K., Zhang, J., Silvescu, A., and Honavar, V. Multinomial Event Model Based Abstraction for Sequence and Text Classification. Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005), Edinburgh, UK, Berlin: Springer-Verlag. Vol. 3607. pp. 134-148, 2005.
- Yakhnenko, O., Silvescu, A., and Honavar, V. Discriminatively Trained Markov Model for Sequence Classification. IEEE Conference on Data Mining (ICDM 2005), Houston, Texas, IEEE Press, 2005.
- Zhang, J., Caragea, D. and Honavar, V. (2005). Learning Ontology-Aware Classifiers. In: Proceedings of the 8th International Conference on Discovery Science. Springer-Verlag Lecture Notes in Computer Science. Singapore. Vol. 3735. pp. 308-321. Berlin: Springer-Verlag.
- Caragea, D., Pathak, J., and Honavar, V. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: Proceedings of the International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004), Agia Napa, Cyprus, 2004.
- Caragea, D., Pathak, J. and Honavar, V. (2004). Learning Classifiers from Semantically Heterogeneous Data. In: International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004). Springer-Verlag Lecture Notes in Computer Science. Agia Napa, Cyprus. Vol. 3291. pp. 963-980. Springer-Verlag.
- Caragea, D., Silvescu, A., and Honavar, V. (2004). A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems. Vol. 1. pp. 80-89.
- Kang, D-K., Silvescu, A., Zhang, J., and Honavar, V. (2004). Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers. In: Proceedings of the IEEE International Conference on Data Mining.
- Zhang, J. and Honavar, V. (2004). AVT-NBL - An Algorithm for Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. In: Proceedings of the IEEE International Conference on Data Mining.
- Atramentov, A., Leiva, H., and Honavar, V. (2003). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag.
- Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. In press.
- Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction. In: Proceedings of the Symposium on Abstraction, Reformulation, and Approximation. Berlin: Springer-Verlag.
- Caragea, D., Silvescu, A., and Honavar, V. (2001). Invited Chapter. Towards a Theoretical Framework for Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources. In: Emerging Neural Architectures Based on Neuroscience. Berlin: Springer-Verlag.
Software
- INDUS -- a prototype system for flexible information extraction and integration using user-supplied ontologies from heterogeneous, distributed, autonomous information sources. To be made available for download.
- INDUS-DM -- an open-source suite of learning algorithms that use sufficient statistics to decouple the data-source-dependent and data-source-independent components of learning. To be made available for download. Anticipated date of release: August 2008.
- AVT-DTL -- software for learning decision tree classifiers from attribute value taxonomies and data, along with some sample data sets and attribute value taxonomies, is available for download.
- AVT-NBL -- software for learning naive Bayes classifiers from attribute value taxonomies and data, along with some sample data sets and attribute value taxonomies, is available for download.
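AVT-DTL and AVT-NBL learn from attribute value taxonomies. A property such learners can exploit is that the class-conditional count for an abstract value equals the sum of the counts of the primitive (leaf) values beneath it, so counts at any cut through the taxonomy can be derived from a single set of leaf-level counts. The Python sketch below illustrates only that aggregation step, under stated assumptions: the student-status taxonomy, the counts, and the function names are hypothetical, and the code does not reproduce how AVT-DTL or AVT-NBL decide which cut to use.

```python
from collections import Counter

# Hypothetical attribute value taxonomy (parent -> children) for one attribute.
TAXONOMY = {
    "Student": ["Undergraduate", "Graduate"],
    "Undergraduate": ["Freshman", "Sophomore", "Junior", "Senior"],
    "Graduate": ["Masters", "PhD"],
}

# Hypothetical class-conditional counts over primitive (leaf) values.
LEAF_COUNTS = Counter({
    ("Freshman", "pos"): 2,
    ("Senior", "pos"): 1,
    ("Masters", "neg"): 3,
    ("PhD", "pos"): 1,
    ("PhD", "neg"): 2,
})


def leaves_under(node, taxonomy):
    """Return the leaf values in the subtree rooted at node."""
    children = taxonomy.get(node)
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(child, taxonomy))
    return leaves


def counts_at_cut(leaf_counts, cut, taxonomy, root):
    """Aggregate leaf counts to the abstract values in a cut through the taxonomy.

    A valid cut covers every leaf under root exactly once.
    """
    covered = [leaf for node in cut for leaf in leaves_under(node, taxonomy)]
    if sorted(covered) != sorted(leaves_under(root, taxonomy)):
        raise ValueError("cut must cover every leaf exactly once")
    # Map each leaf to the node of the cut that subsumes it.
    ancestor = {leaf: node for node in cut for leaf in leaves_under(node, taxonomy)}
    abstract = Counter()
    for (leaf, label), n in leaf_counts.items():
        abstract[(ancestor[leaf], label)] += n
    return abstract


print(counts_at_cut(LEAF_COUNTS, ["Undergraduate", "Graduate"], TAXONOMY, "Student"))
```

A finer cut such as `["Undergraduate", "Masters", "PhD"]` reuses the same leaf counts; only the aggregation changes, which is what makes comparing levels of abstraction cheap in this sketch.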