Iowa State University

 

Center for Computational Intelligence, Learning, & Discovery

 

 

 

Algorithms and Software for Learning Classifiers from Attribute Value Taxonomies and Data

 

Personnel

Dr. Vasant Honavar, Professor of Computer Science and of Bioinformatics and Computational Biology, Principal Investigator

 

Summary

In many applications of machine learning, data consist of partially specified instances, that is, instances whose attribute values, class labels, or both are specified at different levels of abstraction. Such partially specified data are unavoidable when information from multiple, semantically heterogeneous data sources is combined. The main scientific objective of the proposed research is to design, implement, analyze, and apply machine learning algorithms for constructing pattern classifiers from such partially specified data sets, using user-supplied attribute value taxonomies (AVT), class taxonomies (CT), or both as needed. Anticipated results of the proposed research include:

  • A collection of well-documented data sets and associated AVT and CT, drawn from diverse application domains including computational molecular biology, security informatics, and census data, together with a suite of programs for generating synthetic data sets, AVT, and CT.
  • A suite of theoretically well-founded algorithms for learning classifiers from partially specified data, including AVT- and CT-guided variants of existing algorithms for constructing decision trees, Bag-of-Words classifiers, Bayesian networks (e.g., Naïve Bayes and Tree-Augmented Naïve Bayes classifiers), and hyperplane classifiers (Perceptron, Winnow, and Support Vector Machines).
  • Experimental and theoretical characterization of the resulting algorithms, based on systematic exploration along several dimensions: the characteristics of the algorithms, data, AVT, and CT, and performance criteria (e.g., complexity, accuracy, and comprehensibility of the resulting classifiers). The experiments will use synthetic as well as real-world data sets drawn from diverse application domains in which the PI has active ongoing collaborations.
  • A well-documented, modular, extensible, open source suite of software tools for learning from partially specified data and user-supplied attribute value taxonomies, class taxonomies, or both.

Results to date include novel algorithms for learning compact and accurate Naïve Bayes classifiers and decision tree classifiers from partially specified data (Zhang, Silvescu and Honavar, 2002; Zhang and Honavar, 2003; Zhang and Honavar, 2004). The long-term goals of this research are to advance the state of the art in machine learning and broaden the range of applications of machine learning in automated knowledge discovery to domains where partially specified data are commonplace.
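
To make the notion of partially specified data concrete, the following Python sketch shows one possible way to represent an attribute value taxonomy and to tally observations at every level of abstraction. It is an illustration only, not the algorithms from the cited publications: the "student status" taxonomy, the child-to-parent dictionary representation, and the names AVT_PARENT, ancestors, is_partially_specified, and counts_at_all_levels are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical attribute value taxonomy (AVT) for a "student status" attribute,
# stored as child -> parent. The root ("Student") has no entry of its own.
AVT_PARENT = {
    "Undergraduate": "Student",
    "Graduate": "Student",
    "Freshman": "Undergraduate",
    "Senior": "Undergraduate",
    "Masters": "Graduate",
    "PhD": "Graduate",
}


def ancestors(value, parent=AVT_PARENT):
    """Return the increasingly abstract values above `value`, nearest first."""
    chain = []
    while value in parent:
        value = parent[value]
        chain.append(value)
    return chain


def is_partially_specified(value, parent=AVT_PARENT):
    """A value is partially specified if it is an internal node, not a leaf."""
    return value in set(parent.values())


def counts_at_all_levels(observed, parent=AVT_PARENT):
    """Credit each observation to its own node and to every ancestor."""
    counts = Counter()
    for value in observed:
        counts[value] += 1
        for abstract_value in ancestors(value, parent):
            counts[abstract_value] += 1
    return counts


# Observations recorded at mixed levels of abstraction (illustrative data).
observed = ["Freshman", "Senior", "Undergraduate", "PhD", "Graduate"]
counts = counts_at_all_levels(observed)
```

In this sketch, an observation such as "Undergraduate" is partially specified because it names an internal node of the taxonomy rather than a leaf. Counts aggregated at abstract nodes are the kind of statistic a taxonomy-guided learner might use when choosing the level of abstraction for a classifier; how partially specified values are apportioned among more specific values is a separate design decision that this sketch does not address.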

Representative Publications

  1. Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction. In: Proceedings of Symposium on Abstraction, Reformulation, and Approximation. Berlin: Springer-Verlag.

  2. Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. In press.


Center for Computational Intelligence, Learning, & Discovery
214 Atanasoff Hall
Ames, IA 50011-1041

Phone: (515)294-9074
Fax:    (515)294-0258