Algorithms & Software for Knowledge Acquisition from Heterogeneous Distributed Data
Personnel
Dr. Vasant Honavar, Professor of Computer Science and of Bioinformatics and Computational Biology, Principal Investigator.
Dr. Drena Dobbs, Associate Professor of Molecular, Cell, and Developmental Biology, Co-Principal Investigator.
Dr. Doina Caragea, Research Associate, Computer Science. Focus: algorithms for learning classifiers from heterogeneous data; efficient extraction of sufficient statistics from heterogeneous data; a theoretical framework for knowledge acquisition from heterogeneous, distributed, autonomous data.
Summary
The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. However, the massive size, heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles to extracting knowledge from this data. Honavar's research on this topic, supported in part by an Information Technology Research (ITR) grant from the National Science Foundation (0219699) and a graduate fellowship from IBM, seeks to overcome these hurdles through the design, analysis, and implementation of:
- Efficient distributed and cumulative learning algorithms with provable performance guarantees (relative to their centralized or batch counterparts) for knowledge acquisition from distributed data sources
- Customizable information extraction agents that can effectively exploit domain or context-specific ontologies supplied by the users to extract the information needed for learning (e.g., sufficient statistics) from distributed data sources despite differences in query capabilities, interfaces, ontologies, and access restrictions to facilitate analysis of heterogeneous distributed data from different perspectives
- INDUS - a test-bed for knowledge acquisition from heterogeneous distributed data in computational molecular biology (e.g., characterization of protein sequence-structure-function relationships using diverse sources of biological data).
The resulting algorithms are being applied to representative data-driven knowledge discovery problems drawn from computational molecular biology.
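As an illustration of the first item above, the following is a minimal sketch of learning from distributed data via sufficient statistics, using a naive Bayes classifier as the example. It is not the project's implementation: the function names (`local_counts`, `merge_counts`, `predict`) and the toy data are hypothetical. Each site reports only its counts, and because counts add across sites, the merged statistics are identical to those computed from the pooled data, so the resulting classifier matches its centralized counterpart exactly.

```python
from collections import Counter
import math


def local_counts(rows):
    """Compute naive Bayes sufficient statistics at one site.

    rows: list of (features, label) pairs, where features is a tuple.
    Returns class counts and (feature index, value, label) counts.
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(per_site_stats):
    """Add up the counts reported by each site; no raw data is moved."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in per_site_stats:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def predict(class_counts, feature_counts, features, n_values=2):
    """Pick the most probable label, with Laplace smoothing.

    n_values is the assumed number of distinct values per feature.
    """
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in sorted(class_counts.items()):
        score = math.log(count / n)
        for i, value in enumerate(features):
            seen = feature_counts[(i, value, label)]
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data held at two autonomous sites.
site_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
site_b = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]

distributed = merge_counts([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)
```

Here `distributed` and `centralized` hold the same counts, so `predict` returns the same label from either one. Learners whose sufficient statistics are not simple additive counts need a different decomposition; this sketch covers only the simplest case.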
Funding
At present, the primary source of funding for this project is:
This project has benefited from funding for related, but not overlapping work from other sources including:
- Discovering Protein Sequence-Structure-Function Relationships, Biological Information Science and Technology Initiative, National Institutes of Health (2003-2007). Vasant Honavar (PI) (with Drena Dobbs and Robert Jernigan). $1,022,000.
- Pioneer Hi-Bred Graduate Fellowships in Bioinformatics and Computational Biology. Vasant Honavar (PI) (with doctoral students Adrian Silvescu and Carson Andorf). (2002-2004). $80,000.
- IBM Doctoral Research Fellowship. Vasant Honavar (with doctoral student Doina Caragea). (2003-2004). $25,000.
In the past, some of the work leading up to this project was supported in part by:
Representative Publications
- Bao, J., Hu, Z., Caragea, D., Reecy, J., and Honavar, V. A Tool for Collaborative Construction of Large Biological Ontologies. Fourth International Workshop on Biological Data Management (BIDM 2006), Krakow, Poland, IEEE Press. In press, 2006.
- Bao, J., Caragea, D., and Honavar, V. A Distributed Tableau Algorithm for Package-based Description Logics. Proceedings of the Second International Workshop on Context Representation and Reasoning (CRR 2006), Riva del Garda, Italy, CEUR. In press, 2006.
- Bao, J., Caragea, D., and Honavar, V. Modular Ontologies - A Formal Investigation of Semantics and Expressivity. In Proceedings of the First Asian Semantic Web Conference, Beijing, China, Springer-Verlag. In press, 2006.
- Kang, D-K., Silvescu, A. and Honavar, V. RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science, Berlin: Springer-Verlag. pp. 45-54, Accepted, 2006.
- Bao, J., Caragea, D., and Honavar, V. Towards Collaborative Environments for Ontology Construction and Sharing. Proceedings of the International Symposium on Collaborative Technologies and Systems, Las Vegas, 2006.
- J. Pathak, S. Basu, R. Lutz, and V. Honavar. MoSCoE: A Framework for Modeling Web Service Composition and Execution. IEEE Conference on Data Engineering Ph.D. Workshop, Atlanta, GA, 2006.
- Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. Knowledge and Information Systems. Vol. 9. No. 2. pp. 157-179, 2006.
- Pathak, J., Yong, J., Honavar, V., and McCalley, J. Condition Data Aggregation for Failure Mode Estimation of Power Transformers. Hawaii International Conference on Systems Sciences, IEEE Computer Society. pp. 241a, Accepted, 2006.
- Vasile, F., Silvescu, A., Kang, D-K., and Honavar, V. TRIPPER: An Attribute Value Taxonomy Guided Rule Learner. Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Berlin: Springer-Verlag. pp. 55-59, 2006.
- Pathak, J., Koul, N., Caragea, D., and Honavar, V. A Framework for Semantic Web Services Discovery. Proceedings of the 7th ACM International Workshop on Web Information and Data Management (WIDM 2005), ACM Press. pp. 45-50, 2005.
- Yakhnenko, O., Silvescu, A., and Honavar, V. Discriminatively Trained Markov Model for Sequence Classification. IEEE Conference on Data Mining (ICDM 2005), Houston, Texas, IEEE Press, 2005.
- Caragea, D., Zhang, J., Bao, J., Pathak, J., and Honavar, V. Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous Information Sources (Invited paper). Proceedings of the 16th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science, Singapore, Berlin: Springer-Verlag. Vol. 3734. pp. 13-44, 2005.
- Zhang, J., Caragea, D. and Honavar, V. Learning Ontology-Aware Classifiers. Proceedings of the 8th International Conference on Discovery Science. Springer-Verlag Lecture Notes in Computer Science, Singapore, Berlin: Springer-Verlag. Vol. 3735. pp. 308-321, 2005.
- Caragea, D., Bao, J., Pathak, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration from Semantically Heterogeneous Biological Data Sources. Proceedings of the Sixteenth International Workshop on Databases and Expert Systems Applications (DEXA 05), Copenhagen, IEEE Computer Society. pp. 580-584, 2005.
- Kang, D-K., Fuller, D., and Honavar, V. Learning Misuse and Anomaly Detectors from System Call Frequency Vector Representation. IEEE International Conference on Intelligence and Security Informatics. Springer-Verlag Lecture Notes in Computer Science, Springer-Verlag. Vol. 3495. pp. 511-516, 2005.
- Kang, D-K., Zhang, J., Silvescu, A., and Honavar, V. Multinomial Event Model Based Abstraction for Sequence and Text Classification. Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005), Edinburgh, UK, Berlin: Springer-Verlag. Vol. 3607. pp. 134-148, 2005.
- Kang, D-K., Fuller, D., and Honavar, V. Learning Classifiers for Misuse and Anomaly Detection Using a Bag of System Calls Representation. Proceedings of the 6th IEEE Systems, Man, and Cybernetics Workshop (IAW 05), West Point, NY, IEEE. pp. 118-125, 2005.
- Caragea, D., Silvescu, A., Pathak, J., Bao, J., Andorf, C., Dobbs, D., and Honavar, V. Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. Data Integration in Life Sciences (DILS 2005) Springer-Verlag Lecture Notes in Computer Science, San Diego, Berlin: Springer-Verlag. Vol. 3615. pp. 175-190, 2005.
- R. Polikar, L. Udpa, S. Udpa, and V. Honavar. An Incremental Learning Algorithm with Confidence Estimation for Automated Identification of NDE Signals. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. Vol. 51. pp. 990-1001, 2004.
- Zhang, J. and Honavar, V. Learning Compact and Accurate Classifiers from Attribute Value Taxonomies and Partially Specified Data. IEEE International Conference on Data Mining, IEEE Press. pp. 289-298, 2004.
- Bao, J. and Honavar, V. Collaborative Ontology Building With Wiki@nt. Third International Workshop on Evaluation of Ontology Building Tools, Hiroshima, 2004.
- Bao, J., Cao, Y., Tavanapong, W., and Honavar, V. Integration of Domain-Specific and Domain-Independent Ontologies for Colonoscopy Video Database Annotation. International Conference on Information and Knowledge Engineering (IKE 04), Las Vegas, Nevada, USA, CSREA Press. pp. 82-88, 2004.
- Caragea, D., Silvescu, A., and Honavar, V. A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems. Vol. 1. No. 2. pp. 80-89, 2004.
- Yan, C., Dobbs, D., and Honavar, V. Identifying Protein-Protein Interaction Sites from Surface Residues: A Support Vector Machine Approach. Neural Computing and Applications. Vol. 13. pp. 123-129, 2004.
- Kang, D-K., Silvescu, A., Zhang, J. and Honavar, V. Generation of Attribute Value Taxonomies from Data for Accurate and Compact Classifier Construction. IEEE International Conference on Data Mining, IEEE Press. pp. 130-137, 2004.
- Pathak, J., Caragea, D., and Honavar, V. Ontology-Extended Component-Based Workflows: A Framework for Constructing Complex Workflows from Semantically Heterogeneous Software Components. VLDB-04 Workshop on Semantic Web and Databases. Springer-Verlag Lecture Notes in Computer Science, Toronto, Springer-Verlag. Vol. 3372. pp. 41-56, 2004.
- Caragea, D., Pathak, J. and Honavar, V. Learning Classifiers from Semantically Heterogeneous Data. International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004). Springer-Verlag Lecture Notes in Computer Science, Cyprus, Springer-Verlag. Vol. 3291. pp. 963-980, 2004.
- Yan, C., Dobbs, D., and Honavar, V. A Two-Stage Classifier for Identification of Protein-Protein Interface Residues. Bioinformatics. Vol. 20. pp. i371-378, 2004.
- Atramentov, A., Leiva, H., and Honavar, V. (2003). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag.
- Caragea, D., Reinoso-Castillo, J., Silvescu, A. (2003). Statistics Gathering for Information Integration on the Web. In: Proceedings of the IJCAI-03 Workshop on Information Integration on the Web.
- Reinoso-Castillo, J., Silvescu, A., Caragea, D., Pathak, J. and Honavar, V. (2003). Information Extraction and Integration from Heterogeneous, Distributed, Autonomous Information Sources: A Federated, Query-Centric Approach. IEEE International Conference on Information Integration and Reuse.
- Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. In press.
- Reinoso-Castillo, J. (2002). Ontology-Driven Information Extraction and Integration from Autonomous, Heterogeneous, Distributed Data Sources -- A Federated Query-Centric Approach. Masters Thesis. Artificial Intelligence Research Laboratory. Department of Computer Science. Iowa State University.
- Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction. In: Proceedings of Symposium on Abstraction, Reformulation, and Approximation. Berlin: Springer-Verlag.
CILD is housed in Atanasoff Hall on the Northwest side of campus.