Module orchid :: Class OrchidController

Type OrchidController

object --+        
         |        
  _Verbose --+    
             |    
        Thread --+
                 |
                OrchidController


This class is responsible for controlling the fetchers and distributing the workload among them.
Method Summary
  __init__(self, linksToFetch, siteQueueAndCond, analyzer, numFetchers, maxFetches, socketTimeOut, delay)
Creates a new controller.
  getFetcherThreadUtilization(self)
Returns a list of the number of URLs each fetcher thread handled.
  getNumFetchersUsed(self)
Returns the number of fetchers that handled at least one url.
  run(self)
Runs this controller thread.
    Inherited from Thread
  __repr__(self)
  getName(self)
  isAlive(self)
  isDaemon(self)
  join(self, timeout)
  setDaemon(self, daemonic)
  setName(self, name)
  start(self)
    Inherited from object
  __delattr__(...)
x.__delattr__('name') <==> del x.name
  __getattribute__(...)
x.__getattribute__('name') <==> x.name
  __hash__(x)
x.__hash__() <==> hash(x)
  __new__(T, S, ...)
T.__new__(S, ...) -> a new object with type S, a subtype of T
  __reduce__(...)
helper for pickle
  __reduce_ex__(...)
helper for pickle
  __setattr__(...)
x.__setattr__('name', value) <==> x.name = value
  __str__(x)
x.__str__() <==> str(x)

Method Details

__init__(self, linksToFetch, siteQueueAndCond, analyzer, numFetchers=1, maxFetches=100, socketTimeOut=20, delay=5)
(Constructor)

Creates a new controller. Typically you will need only one.
Parameters:
linksToFetch - A tuple of the form (dict, cond) where dict is the map of domains to links to be fetched (which is updated by the analyzer) and cond is a threading.Condition object which is used to synchronize access to the dict.
siteQueueAndCond - A tuple of the form (list, cond) where list is the site queue into which fetchers insert new sites that have been fetched and cond is a threading.Condition object which is used to lock the queue.
analyzer - The analyzer which we are using for analyzing crawled data.
numFetchers - The number of active threads used for crawling.
maxFetches - Maximum number of pages to crawl.
socketTimeOut - The timeout to use for opening URLs. WARNING: the timeout is set using socket.setdefaulttimeout(..), which affects ALL sockets subsequently created in the process, not only those used by the fetchers.
delay - The delay in seconds between assignments of urls to fetchers.
Overrides:
threading.Thread.__init__
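
The argument shapes described above can be sketched with the standard library alone. This is a minimal illustration, not orchid's own code: it assumes the dict maps a domain string to a list of URLs and that the site queue is consumed oldest-first. The helpers add_link and pop_site are hypothetical names introduced only for this example.

```python
import socket
import threading

# (dict, Condition): domain -> links still to fetch (value type is an assumption)
linksToFetch = ({}, threading.Condition())
# (list, Condition): queue of sites that fetchers have retrieved
siteQueueAndCond = ([], threading.Condition())


def add_link(links_and_cond, domain, url):
    """Hypothetical helper: add a URL under its domain while holding the condition."""
    links, cond = links_and_cond
    with cond:
        links.setdefault(domain, []).append(url)
        cond.notify_all()  # wake any thread waiting for new links


def pop_site(queue_and_cond, timeout=1.0):
    """Hypothetical helper: remove and return the oldest site, or None on timeout."""
    queue, cond = queue_and_cond
    with cond:
        if not queue:
            cond.wait(timeout)
        return queue.pop(0) if queue else None


add_link(linksToFetch, "example.org", "http://example.org/")

# socketTimeOut is applied process-wide; this is the call the warning refers to.
previous_timeout = socket.getdefaulttimeout()
socket.setdefaulttimeout(20)
current_timeout = socket.getdefaulttimeout()
socket.setdefaulttimeout(previous_timeout)  # restore so other code is unaffected
```

Both tuples would then be passed as the first two constructor arguments, together with an analyzer instance. Saving and restoring the default timeout, as done here, is one way to limit the process-wide side effect when the crawler shares a process with other network code.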

getFetcherThreadUtilization(self)

Returns a list of the number of URLs each fetcher thread handled.

getNumFetchersUsed(self)

Returns the number of fetchers that handled at least one url.
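
The two accessors above are related. Assuming getFetcherThreadUtilization() returns a plain list with one count per fetcher, the value reported by getNumFetchersUsed() should equal the number of non-zero entries in that list. The sketch below uses made-up counts, not real crawler output, and count_fetchers_used is a hypothetical helper.

```python
def count_fetchers_used(utilization):
    """Count the fetchers that handled at least one URL."""
    return sum(1 for handled in utilization if handled > 0)


# Made-up per-fetcher counts, for illustration only
utilization = [12, 0, 7, 3]
used = count_fetchers_used(utilization)
idle = len(utilization) - used
```

A large number of idle fetchers may suggest that numFetchers is set higher than the crawl can make use of.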

run(self)

Runs this controller thread. The controller will use its fetchers to fill the links and sites databases.
Overrides:
threading.Thread.run
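
Because the controller is a threading.Thread subclass, run() is normally not called directly: start() invokes it in a new thread and join() waits for it to finish. The sketch below shows that pattern with a stand-in class, DemoController, which is hypothetical and does none of the real controller's work.

```python
import threading


class DemoController(threading.Thread):
    """Hypothetical stand-in for a Thread subclass that overrides run()."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.finished = False

    def run(self):
        # The real controller would drive its fetchers here.
        self.finished = True


controller = DemoController()
controller.start()          # executes run() in a separate thread
controller.join(timeout=5)  # block until the thread ends or 5 seconds pass
```

Calling run() directly would execute the body in the calling thread instead, which is rarely what is wanted for a long crawl.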

Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 http://epydoc.sf.net