Part of spamfighter.core.model.thomas
Method | __init__ | Undocumented |
Method | commit | Undocumented |
Method | newPool | Create a new pool, without actually doing any |
Method | removePool | Undocumented |
Method | renamePool | Undocumented |
Method | mergePools | Merge an existing pool into another. |
Method | poolData | Return a list of the (token, count) tuples. |
Method | poolTokens | Return a list of the tokens in this pool. |
Method | save | Undocumented |
Method | load | Undocumented |
Method | poolNames | Return a sorted list of Pool names. |
Method | buildCache | merges corpora and computes probabilities |
Method | poolProbs | Undocumented |
Method | getTokens | By default, we expect obj to be a string and split
Method | getProbs | extracts the probabilities of tokens in a message |
Method | train | Train Bayes by telling him that item belongs |
Method | untrain | Undocumented |
Method | trainedOn | Undocumented |
Method | guess | Undocumented |
Method | robinson | computes the probability of a message being spam (Robinson's method) |
Method | robinsonFisher | computes the probability of a message being spam (Robinson-Fisher method) |
Method | __repr__ | Undocumented |
Method | __len__ | Undocumented |
Method | _train | Undocumented |
Method | _untrain | Undocumented |
Note that this does not change the case. In some applications you may want to lowercase everything so that "king" and "King" generate the same token.
Override this in your subclass for objects other than text.
Alternatively, you can pass in a tokenizer as part of instance creation.
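The case-folding idea can be sketched with a small stand-alone tokenizer. This is only an illustration: the class names `WhitespaceTokenizer` and `LowercaseTokenizer` and the `tokenize` method name are hypothetical, since this page does not show the tokenizer interface the class expects.

```python
class WhitespaceTokenizer:
    """Hypothetical stand-in for the default behaviour: split text, keep case."""

    def tokenize(self, obj):
        # str.split() with no arguments splits on any run of whitespace.
        return obj.split()


class LowercaseTokenizer:
    """Hypothetical tokenizer that folds case before emitting tokens."""

    def tokenize(self, obj):
        # Lowercasing makes "king" and "King" map to the same token.
        return [token.lower() for token in obj.split()]


text = "The King met the king"
default_tokens = WhitespaceTokenizer().tokenize(text)
lowered_tokens = LowercaseTokenizer().tokenize(text)
```

With the case-preserving version, "King" and "king" are counted as separate tokens. With the lowercasing version they collapse into one. An object like `LowercaseTokenizer()` would be what gets passed at instance creation, assuming the constructor accepts a tokenizer with this shape.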