Frequency Based Learning (Frebal) for Naïve Bayes, NB k-gram, and NB(k)
GM066387
Frequency Based Learning (Frebal): Frebal is a standalone algorithmic framework for learning on sequence data. The general idea is to use the class-conditional probabilities of short local k-gram sequences (for proteins, k consecutive amino acids) to build classifiers that predict the class. These probabilities can be estimated from the k-gram counts in a dataset. This demo version integrates two algorithms into the framework. The first, NB k-gram, builds a Naïve Bayes classifier over the k-grams. It treats the k-grams at different positions as independent and ignores the dependencies caused by overlapping sequences. The second, NB(k), also builds a Naïve Bayes classifier over the k-grams, but it takes into account the dependencies caused by overlapping sequences. Note that with k=1 either algorithm is equivalent to a standard Naïve Bayes classifier. The framework can also be extended to other learning algorithms such as Support Vector Machines, Nearest Neighbor, Decision Trees, and Artificial Neural Networks. The downloadable version ships with five built-in datasets (three based on Gene Ontology labels and two based on subcellular localization data).
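To make the counting idea concrete, here is a minimal Python sketch of an NB k-gram style classifier. It is an illustration under stated assumptions, not Frebal's own code: the names `kgrams` and `NBKGram`, the `alpha` parameter, and the use of additive (Laplace) smoothing for unseen k-grams are all hypothetical choices made for this example. It covers only the NB k-gram variant, which ignores overlap dependencies; the NB(k) correction is not shown.

```python
import math
from collections import Counter, defaultdict


def kgrams(sequence, k):
    """Return every overlapping window of k consecutive symbols."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class NBKGram:
    """Naive Bayes over k-grams; windows are treated as independent."""

    def __init__(self, k, alpha=1.0):
        self.k = k
        self.alpha = alpha  # assumed additive smoothing constant

    def fit(self, sequences, labels):
        # Count how often each class and each (class, k-gram) pair occurs.
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            grams = kgrams(seq, self.k)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, sequence):
        """Return log P(class) + sum of log P(k-gram | class) per class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.gram_counts[label]
            # One extra smoothing slot reserves mass for unseen k-grams.
            denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
            score = math.log(n / total)
            for gram in kgrams(sequence, self.k):
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, sequence):
        scores = self.log_scores(sequence)
        return max(scores, key=scores.get)
```

A model is trained with `NBKGram(k=2).fit(sequences, labels)` and queried with `predict(sequence)`. Setting `k=1` reduces the windows to single symbols, which matches the note above that k=1 corresponds to an ordinary Naïve Bayes classifier. Scores are summed in log space to avoid numerical underflow on long sequences.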