Module orchid :: Class OrchidFetcher

Type OrchidFetcher

object --+        
         |        
  _Verbose --+    
             |    
        Thread --+
                 |
                OrchidFetcher


This class is responsible for fetching URL contents, processing them with UgrahExtractor, and updating the site and link databases.
Method Summary
  __init__(self, siteQueue, siteQueueCond, fetcherCondition, stopConditionLock)
Creates a new fetcher thread (not started) with the following parameters.
  getUrlsCounter(self)
Returns the number of URLs this fetcher has handled.
  isFree(self)
Returns True if the fetcher hasn't been assigned a URL yet.
  run(self)
Performs the main function of the fetcher, which is to fetch the contents of the URL specified by setUrl.
  setStopCondition(self, val)
Can receive either True or False.
  setUrl(self, stringUrl)
Sets the url that the fetcher should work on.
  __fileData(self, s, links)
Stores the given site and links in the databases
  __processSite(self)
Fetches the url contents and creates a parsed structure
    Inherited from Thread
  __repr__(self)
  getName(self)
  isAlive(self)
  isDaemon(self)
  join(self, timeout)
  setDaemon(self, daemonic)
  setName(self, name)
  start(self)
  _set_daemon(self)
    Inherited from _Verbose
  _note(self, format, *args)
    Inherited from object
  __delattr__(...)
x.__delattr__('name') <==> del x.name
  __getattribute__(...)
x.__getattribute__('name') <==> x.name
  __hash__(x)
x.__hash__() <==> hash(x)
  __new__(T, S, ...)
T.__new__(S, ...) -> a new object with type S, a subtype of T
  __reduce__(...)
helper for pickle
  __reduce_ex__(...)
helper for pickle
  __setattr__(...)
x.__setattr__('name', value) <==> x.name = value
  __str__(x)
x.__str__() <==> str(x)

Class Variable Summary
    Inherited from Thread
bool _Thread__initialized = False

Method Details

__init__(self, siteQueue, siteQueueCond, fetcherCondition, stopConditionLock)
(Constructor)

Creates a new fetcher thread (not started) with the following
Parameters:
siteQueue - the site queue from which the analyzer takes sites to analyze.
siteQueueCond - A Condition object used to lock the siteQueue.
fetcherCondition - a threading.Condition object used for communication between the fetcher and the controller: whenever a fetcher finishes working on its assignment, it calls fetcherCondition.wait() and waits until the controller assigns it a new URL to fetch.
stopConditionLock - a threading.Lock object which is used to lock the internal stop condition variable. A thread that wishes to change this variable should lock it first.
Overrides:
threading.Thread.__init__
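
The parameter list above can be illustrated with a minimal sketch in modern Python. SketchFetcher is a hypothetical stand-in that only stores its arguments; it is not the real OrchidFetcher. Using a collections.deque for siteQueue is an assumption, since the documentation does not state the queue's type.

```python
import collections
import threading


class SketchFetcher(threading.Thread):
    """Hypothetical stand-in for OrchidFetcher; it only stores its arguments."""

    def __init__(self, siteQueue, siteQueueCond, fetcherCondition, stopConditionLock):
        threading.Thread.__init__(self)
        self.siteQueue = siteQueue
        self.siteQueueCond = siteQueueCond
        self.fetcherCondition = fetcherCondition
        self.stopConditionLock = stopConditionLock


# Assumption: the site queue is a plain container guarded by its own Condition.
site_queue = collections.deque()
site_queue_cond = threading.Condition()
fetcher_condition = threading.Condition()
stop_condition_lock = threading.Lock()

fetcher = SketchFetcher(site_queue, site_queue_cond,
                        fetcher_condition, stop_condition_lock)

# The constructor creates the thread object but does not start it.
created_but_not_started = not fetcher.is_alive()
```

The thread only begins executing run() once start() is called on it.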

getUrlsCounter(self)

Returns the number of URLs this fetcher has handled. Should be called only AFTER the thread is dead.
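
One way to respect the "only after the thread is dead" rule is to join() the thread before reading the counter. The sketch below uses a hypothetical CountingWorker class and example.com URLs purely for illustration; it does not reproduce how OrchidFetcher counts internally.

```python
import threading


class CountingWorker(threading.Thread):
    """Hypothetical worker that counts the URLs it was given."""

    def __init__(self, urls):
        super().__init__()
        self._urls = list(urls)
        self._count = 0

    def run(self):
        for _ in self._urls:
            # A real fetcher would download and process the URL here.
            self._count += 1

    def getUrlsCounter(self):
        return self._count


worker = CountingWorker([
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
])
worker.start()
worker.join()                      # wait until the thread is dead
handled = worker.getUrlsCounter()  # safe: no other thread is updating the counter
```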

isFree(self)

Returns True if the fetcher hasn't been assigned a URL yet.

run(self)

Performs the main function of the fetcher, which is to fetch the contents of the URL specified by setUrl. This method loops until the stop condition is set.
Overrides:
threading.Thread.run

setStopCondition(self, val)

Can receive either True or False. Set to True when the fetcher should stop working. WARNING: It's *necessary* to acquire the lock which was passed to the constructor as stopConditionLock before calling this method.
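
The locking requirement in the warning can be sketched as follows. StoppableWorker is a hypothetical stand-in, and its polling loop is an assumption made only to keep the example short; the real run() method may check the flag differently.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    """Hypothetical worker with a lock-protected stop flag."""

    def __init__(self, stopConditionLock):
        super().__init__()
        self._stopLock = stopConditionLock
        self._stopRequested = False

    def setStopCondition(self, val):
        # The caller is expected to hold stopConditionLock already.
        self._stopRequested = val

    def _shouldStop(self):
        with self._stopLock:
            return self._stopRequested

    def run(self):
        while not self._shouldStop():
            time.sleep(0.01)  # stand-in for one unit of fetching work


stop_lock = threading.Lock()
worker = StoppableWorker(stop_lock)
worker.start()

# Acquire the lock first, then change the stop condition.
with stop_lock:
    worker.setStopCondition(True)

worker.join(timeout=2)
stopped = not worker.is_alive()
```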

setUrl(self, stringUrl)

Sets the URL that the fetcher should work on. It's *necessary* to acquire the condition instance which was passed to the constructor as fetcherCondition before calling this method, and to call notify() on it afterwards.
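
The acquire, set, notify sequence described above might look like this from the controller's side. WaitingWorker is a hypothetical stand-in that handles a single URL and then exits; the real fetcher keeps looping, and its internals are not shown in this documentation.

```python
import threading


class WaitingWorker(threading.Thread):
    """Hypothetical worker that waits on a Condition until a URL is assigned."""

    def __init__(self, fetcherCondition):
        super().__init__()
        self._cond = fetcherCondition
        self._url = None
        self.seen = []

    def isFree(self):
        return self._url is None

    def setUrl(self, stringUrl):
        # The caller is expected to hold fetcherCondition already.
        self._url = stringUrl

    def run(self):
        with self._cond:
            # Re-check the predicate in a loop to guard against spurious wakeups.
            while self._url is None:
                self._cond.wait()
            self.seen.append(self._url)


cond = threading.Condition()
worker = WaitingWorker(cond)
worker.start()

# Controller side: acquire the condition, assign the URL, then notify.
with cond:
    worker.setUrl("http://example.com/")
    cond.notify()

worker.join(timeout=2)
```

Because the worker checks for an assigned URL while holding the condition, the assignment is not lost even if the controller runs before the worker starts waiting.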

__fileData(self, s, links)

Stores the given site and links in the databases

__processSite(self)

Fetches the url contents and creates a parsed structure

Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 http://epydoc.sf.net