Module orchid :: Class NaiveAnalyzer |
Inheritance: object -> _Verbose -> Thread -> NaiveAnalyzer |
Known subclasses: Malcontent |
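Method Summary | |
---|---|
__init__(self, linksToFetchAndCond, siteQueueAndCond, db) | Creates a new analyzer. |
addSiteToFetchQueue(self, lfs) | Adds links to the fetch queue. |
analyzeSite(self, db, site) | Processes the site and adds it to the db. |
getNumSitesProcessed(self) | Returns the number of sites this analyzer has processed. |
reorganizeByDomain(self, listOfLinks) | Returns a map which maps domain names to links inside the domain. |
report(self) | A real analyzer should override this method. |
run(self) | Performs the main function of the analyzer. |
selectNextUrl(self) | Chooses the next url to crawl to. |
setStopCondition(self, val) | Sets the stop condition to the specified value. |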
Inherited from Thread | |
Inherited from _Verbose | |
Inherited from object | |
__delattr__ | x.__delattr__('name') <==> del x.name |
__getattribute__ | x.__getattribute__('name') <==> x.name |
__hash__ | x.__hash__() <==> hash(x) |
__new__ | T.__new__(S, ...) -> a new object with type S, a subtype of T |
__reduce__ | helper for pickle |
__reduce_ex__ | helper for pickle |
__setattr__ | x.__setattr__('name', value) <==> x.name = value |
__str__ | x.__str__() <==> str(x) |
Class Variable Summary | |
---|---|
Inherited from Thread | |
bool | _Thread__initialized = False |
Method Details |
---|
__init__(self, linksToFetchAndCond, siteQueueAndCond, db)
Creates a new analyzer. You can run as many analyzers as you like, depending on the kind of data processing you want to do. |
addSiteToFetchQueue(self, lfs)
Adds links to the fetch queue. A real analyzer should override this method. |
analyzeSite(self, db, site)
Processes the site and adds it to the db. Any real analyzer should override this method with its own logic. |
getNumSitesProcessed(self)
Returns the number of sites this analyzer has processed. |
reorganizeByDomain(self, listOfLinks)
Returns a map from each domain name to the links inside that domain. |
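The grouping that reorganizeByDomain describes can be sketched as follows. This is a minimal Python 3 illustration, not orchid's code: the function name `reorganize_by_domain`, the use of `urllib.parse`, and the choice of the lower-cased network location as the "domain" are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def reorganize_by_domain(list_of_links):
    """Group absolute URLs by domain (hypothetical helper, not orchid's API)."""
    by_domain = defaultdict(list)
    for link in list_of_links:
        # Assumption: the "domain" is the URL's network location, lower-cased
        # so that "Example.org" and "example.org" share one bucket.
        domain = urlparse(link).netloc.lower()
        by_domain[domain].append(link)
    return dict(by_domain)


links = [
    "http://example.org/a",
    "http://example.com/x",
    "http://example.org/b",
]
grouped = reorganize_by_domain(links)
print(grouped)
```

Links keep their original order within each domain, which matters if a crawler later takes the first link from a domain's list.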
report(self)
A real analyzer should override this method. Outputs the results of the analysis so far. |
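The override points named for analyzeSite and report might be used roughly as below. Because orchid itself is not imported here, `BaseAnalyzer` is a hypothetical stand-in with no threading, and the shapes of `db` (a list) and `site` (a dict with `url` and `links` keys) are assumptions for illustration only.

```python
class BaseAnalyzer:
    """Hypothetical stand-in for orchid.NaiveAnalyzer (threading omitted)."""

    def __init__(self, db):
        self.db = db
        self.num_processed = 0

    def analyzeSite(self, db, site):
        # Default behaviour in this sketch: store the site and count it.
        db.append(site)
        self.num_processed += 1

    def getNumSitesProcessed(self):
        return self.num_processed

    def report(self):
        return ""


class LinkCountAnalyzer(BaseAnalyzer):
    """Example subclass that records how many links each site contains."""

    def __init__(self, db):
        super().__init__(db)
        self.link_counts = {}

    def analyzeSite(self, db, site):
        super().analyzeSite(db, site)
        # Assumption: a site is a dict with "url" and "links" keys.
        self.link_counts[site["url"]] = len(site["links"])

    def report(self):
        lines = [f"{url}: {n} links" for url, n in sorted(self.link_counts.items())]
        return "\n".join(lines)


db = []
analyzer = LinkCountAnalyzer(db)
analyzer.analyzeSite(db, {"url": "http://example.org/", "links": ["a", "b"]})
summary = analyzer.report()
print(summary)
```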
run(self)
Performs the main function of the analyzer. This implementation simply adds all the hyperlinks to the toFetch queue. |
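The parameter names linksToFetchAndCond and siteQueueAndCond suggest that each queue travels with a condition variable. The sketch below shows one way such a pair could be used to hand links to a fetcher. It assumes the pair is a `(deque, threading.Condition)` tuple, which this page does not confirm, and `add_links` and `take_link` are hypothetical helper names.

```python
import threading
from collections import deque

# Assumption: a "queue and condition" pair is a (deque, Condition) tuple.
links_to_fetch_and_cond = (deque(), threading.Condition())


def add_links(queue_and_cond, links):
    """Append links under the lock and wake any waiting consumers."""
    queue, cond = queue_and_cond
    with cond:
        queue.extend(links)
        cond.notify_all()


def take_link(queue_and_cond, timeout=1.0):
    """Return the next link, or None if none arrives within the timeout."""
    queue, cond = queue_and_cond
    with cond:
        # wait_for releases the lock while waiting and re-checks the
        # predicate each time the condition is notified.
        if not cond.wait_for(lambda: len(queue) > 0, timeout=timeout):
            return None
        return queue.popleft()


add_links(links_to_fetch_and_cond, ["http://example.org/a", "http://example.org/b"])
next_link = take_link(links_to_fetch_and_cond)
print(next_link)
```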
selectNextUrl(self)
Chooses the next URL to crawl to. This implementation selects a random domain and then crawls to the first link in that domain's queue. |
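The selection policy described for selectNextUrl (random domain, then the first link in that domain's queue) can be sketched as below. The function name, the `rng` parameter, the dict-of-deques layout, and the decision to remove the chosen link from its queue are assumptions for the example rather than details taken from orchid.

```python
import random
from collections import deque


def select_next_url(queues_by_domain, rng=random):
    """Pick a random non-empty domain and pop the first link from its queue."""
    candidates = [d for d, q in queues_by_domain.items() if q]
    if not candidates:
        return None  # nothing left to crawl
    domain = rng.choice(candidates)
    # Assumption: the chosen link is removed so it is not crawled twice.
    return queues_by_domain[domain].popleft()


queues = {
    "example.org": deque(["http://example.org/a", "http://example.org/b"]),
    "example.com": deque(["http://example.com/x"]),
}
first = select_next_url(queues, random.Random(0))
print(first)
```

Choosing the domain first, rather than choosing uniformly among all links, keeps one link-heavy domain from dominating the crawl order.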
setStopCondition(self, val)
Sets the stop condition to the specified value. Pass True to stop the analyzer thread. |
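A stop condition of this kind is commonly implemented as a flag that the thread's run loop polls. The following is a minimal Python 3 sketch under that assumption. `StoppableWorker` is a hypothetical class, not NaiveAnalyzer, and the flag name `_stop_requested` is invented for the example.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    """Hypothetical stand-in showing a polled stop flag."""

    def __init__(self):
        super().__init__()
        self._stop_requested = False
        self.iterations = 0

    def setStopCondition(self, val):
        # True asks the run loop to exit at its next check.
        self._stop_requested = val

    def run(self):
        while not self._stop_requested:
            self.iterations += 1  # placeholder for one unit of real work
            time.sleep(0.01)


worker = StoppableWorker()
worker.start()
time.sleep(0.05)
worker.setStopCondition(True)
worker.join(timeout=1.0)
print(worker.is_alive())
```

The thread stops only when it next checks the flag, so a long-running step inside the loop delays shutdown by up to the length of that step.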
Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 | http://epydoc.sf.net |