Class s.c.m.t.Bayes(object):

Part of spamfighter.core.model.thomas

No class docstring
Method __init__ Undocumented
Method commit Undocumented
Method newPool Create a new pool, without actually doing any training.
Method removePool Undocumented
Method renamePool Undocumented
Method mergePools Merge an existing pool into another.
Method poolData Return a list of the (token, count) tuples.
Method poolTokens Return a list of the tokens in this pool.
Method save Undocumented
Method load Undocumented
Method poolNames Return a sorted list of Pool names.
Method buildCache merges corpora and computes probabilities
Method poolProbs Undocumented
Method getTokens By default, we expect obj to be a string and split it on whitespace.
Method getProbs extracts the probabilities of tokens in a message
Method train Train Bayes by telling him that item belongs in pool.
Method untrain Undocumented
Method trainedOn Undocumented
Method guess Undocumented
Method robinson computes the probability of a message being spam (Robinson's method)
Method robinsonFisher computes the probability of a message being spam (Robinson-Fisher method)
Method __repr__ Undocumented
Method __len__ Undocumented
Method _train Undocumented
Method _untrain Undocumented
def __init__(self, tokenizer=None, combiner=None, dataClass=None): (source)
Undocumented
def commit(self): (source)
Undocumented
def newPool(self, poolName): (source)
Create a new pool, without actually doing any training.
def removePool(self, poolName): (source)
Undocumented
def renamePool(self, poolName, newName): (source)
Undocumented
def mergePools(self, destPool, sourcePool): (source)
Merge an existing pool into another. The data from sourcePool is merged into destPool. The arguments are the names of the pools to be merged. The pool named sourcePool is left intact, so you may want to call removePool() to get rid of it.
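The merge semantics described above can be illustrated with a small stand-alone sketch. It assumes that a pool is a mapping from token to count and that merging means adding the source counts onto the destination counts; neither detail is confirmed by this page, and `merge_pools` is a hypothetical helper, not part of the Bayes class.

```python
def merge_pools(pools, dest_name, source_name):
    """Add the token counts of one pool onto another (hypothetical helper).

    `pools` maps pool name -> {token: count}. The source pool is left
    untouched, mirroring the documented behaviour of mergePools().
    """
    dest = pools[dest_name]
    for token, count in pools[source_name].items():
        dest[token] = dest.get(token, 0) + count


pools = {
    "spam": {"buy": 3, "now": 1},
    "junk": {"buy": 2, "cheap": 4},
}
merge_pools(pools, "spam", "junk")
merged_spam = pools["spam"]
source_after = pools["junk"]
```

After the call, the source pool still exists, which is why the docstring suggests following up with removePool().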
def poolData(self, poolName): (source)
Return a list of the (token, count) tuples.
def poolTokens(self, poolName): (source)
Return a list of the tokens in this pool.
def save(self, fname='bayesdata.dat'): (source)
Undocumented
def load(self, fname='bayesdata.dat'): (source)
Undocumented
def poolNames(self): (source)
Return a sorted list of Pool names. Does not include the system pool '__Corpus__'.
def buildCache(self): (source)
merges corpora and computes probabilities
def poolProbs(self): (source)
Undocumented
def getTokens(self, obj): (source)
By default, we expect obj to be a string and split it on whitespace.

Note that this does not change the case. In some applications you may want to lowercase everything so that "king" and "King" generate the same token.

Override this in your subclass for objects other than text.

Alternatively, you can pass in a tokenizer as part of instance creation.
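The difference between the default behaviour and a case-folding variant can be sketched as two plain functions. Both names are hypothetical, and how a tokenizer is actually handed to the constructor (callable, object with a method, and so on) is not specified on this page, so the sketch deliberately avoids that part.

```python
def split_tokens(text):
    # Default-style behaviour: split on whitespace, keep case as-is.
    return text.split()


def lower_tokens(text):
    # Case-folding variant: "King" and "king" become the same token.
    return text.lower().split()


sample = "King of the king"
default_result = split_tokens(sample)
folded_result = lower_tokens(sample)
```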

def getProbs(self, pool, words): (source)
extracts the probabilities of tokens in a message
def train(self, pool, item, uid=None): (source)
Train Bayes by telling him that item belongs in pool. uid is optional and may be used to uniquely identify the item that is being trained on.
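As a rough mental model of training, the toy class below counts whitespace-separated tokens per pool and exposes them as (token, count) tuples, in the spirit of poolData(). It is an assumption-laden sketch: `ToyBayes` and `pool_data` are hypothetical names, the uid argument is omitted, and the real class may store its data quite differently.

```python
from collections import Counter


class ToyBayes:
    """Toy stand-in for illustration only; not the documented Bayes class."""

    def __init__(self):
        self.pools = {}  # pool name -> Counter of token counts

    def train(self, pool, item):
        # Create the pool on first use, then count the item's tokens.
        counts = self.pools.setdefault(pool, Counter())
        counts.update(item.split())

    def pool_data(self, pool):
        # Sorted list of (token, count) tuples for one pool.
        return sorted(self.pools[pool].items())


toy = ToyBayes()
toy.train("spam", "buy buy now")
toy.train("ham", "see you now")
spam_data = toy.pool_data("spam")
```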
def untrain(self, pool, item, uid=None): (source)
Undocumented
def _train(self, pool, tokens): (source)
Undocumented
def _untrain(self, pool, tokens): (source)
Undocumented
def trainedOn(self, msg): (source)
Undocumented
def guess(self, msg): (source)
Undocumented
def robinson(self, probs, ignore): (source)
computes the probability of a message being spam (Robinson's method)

P = 1 - prod(1-p)^(1/n)
Q = 1 - prod(p)^(1/n)
S = (1 + (P-Q)/(P+Q)) / 2

Courtesy of http://christophe.delord.free.fr/en/index.html
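The three formulas can be written out directly. This is a minimal sketch that takes a plain list of per-token probabilities; the documented method's probs and ignore arguments may have a different shape, and `robinson_score` is a hypothetical name. It needs Python 3.8+ for math.prod.

```python
import math


def robinson_score(probs):
    """Combine per-token spam probabilities with Robinson's formulas."""
    n = len(probs)
    if n == 0:
        return 0.5  # assumption: no evidence means "undecided"
    nth = 1.0 / n
    P = 1.0 - math.prod(1.0 - p for p in probs) ** nth
    Q = 1.0 - math.prod(probs) ** nth
    return (1.0 + (P - Q) / (P + Q)) / 2.0


neutral = robinson_score([0.5, 0.5])
spammy = robinson_score([0.9, 0.9])
```

With two neutral tokens P and Q are equal, so S comes out at 0.5; with two tokens at 0.9 the score is also about 0.9.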
def robinsonFisher(self, probs, ignore): (source)
computes the probability of a message being spam (Robinson-Fisher method)

H = C^-1(-2 * ln(prod(p)), 2*n)
S = C^-1(-2 * ln(prod(1-p)), 2*n)
I = (1 + H - S) / 2

Courtesy of http://christophe.delord.free.fr/en/index.html
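A sketch of these formulas follows, reading C^-1 as the inverse chi-square function, that is, the upper-tail probability of a chi-square distribution with 2*n degrees of freedom. That reading is an assumption based on the usual presentation of the Robinson-Fisher method. `chi2_tail` and `robinson_fisher_score` are hypothetical names, and the input is a plain list of probabilities strictly between 0 and 1.

```python
import math


def chi2_tail(chi, df):
    """Upper-tail probability of a chi-square variable; df must be even."""
    m = chi / 2.0
    term = total = math.exp(-m)
    # For even df the tail is exp(-m) * sum(m**i / i!) for i < df/2.
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def robinson_fisher_score(probs):
    n = len(probs)
    if n == 0:
        return 0.5  # assumption: no evidence means "undecided"
    # Summing logs avoids underflow from multiplying many small numbers.
    H = chi2_tail(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    S = chi2_tail(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + H - S) / 2.0


neutral = robinson_fisher_score([0.5, 0.5])
spammy = robinson_fisher_score([0.99, 0.99])
```

Probabilities of exactly 0 or 1 would make the logarithm fail, so a caller would need to clamp them first.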
def __repr__(self): (source)
Undocumented
def __len__(self): (source)
Undocumented
API Documentation for SpamFighter, generated by pydoctor at 2009-02-27 11:58:37.