Module malcontent :: Class Malcontent

Type Malcontent

object --+            
         |            
  _Verbose --+        
             |        
        Thread --+    
                 |    
     NaiveAnalyzer --+
                     |
                    Malcontent


This is a concrete analyzer that is used together with the Orchid crawler to detect malicious web pages based on a given set of rules.
Method Summary
  __init__(self, linksToFetchAndCond, siteQueueAndCond, db, rules)
Creates a new malicious content analyzer.
  addSiteToFetchQueue(self, lfs)
Adds the sites extracted in analyzeSite to the "to fetch" queue.
  analyzeSite(self, db, site)
Applies all the available rules to the given site and extracts the links that we intend to crawl.
  report(self)
Logs the results of the crawl.
  selectNextUrl(self)
Selects the next URL to crawl.
    Inherited from NaiveAnalyzer
  getNumSitesProcessed(self)
Returns the number of sites this analyzer has processed.
  reorganizeByDomain(self, listOfLinks)
Returns a map which maps domain names to links inside the domain.
  run(self)
Performs the main function of the analyzer.
  setStopCondition(self, val)
Sets the stop condition to the specified value.
    Inherited from Thread
  __repr__(self)
  getName(self)
  isAlive(self)
  isDaemon(self)
  join(self, timeout)
  setDaemon(self, daemonic)
  setName(self, name)
  start(self)
    Inherited from object
  __delattr__(...)
x.__delattr__('name') <==> del x.name
  __getattribute__(...)
x.__getattribute__('name') <==> x.name
  __hash__(x)
x.__hash__() <==> hash(x)
  __new__(T, S, ...)
T.__new__(S, ...) -> a new object with type S, a subtype of T
  __reduce__(...)
helper for pickle
  __reduce_ex__(...)
helper for pickle
  __setattr__(...)
x.__setattr__('name', value) <==> x.name = value
  __str__(x)
x.__str__() <==> str(x)

Method Details

__init__(self, linksToFetchAndCond, siteQueueAndCond, db, rules)
(Constructor)

Creates a new malicious content analyzer.
Parameters:
rules - a list of Rule objects to be applied against crawled sites.
Overrides:
orchid.NaiveAnalyzer.__init__
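The rules parameter is described only as "a list of Rule objects". The sketch below illustrates how such a list might be applied to a page's content. It assumes a hypothetical rule interface with a name attribute and a matches(content) method; RegexRule and apply_rules are illustrative names, not part of orchid, and the real Rule class may look quite different.

```python
import re


# Hypothetical rule type: orchid's actual Rule class is not documented here.
class RegexRule:
    """Flags a page whose content matches a regular expression."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = re.compile(pattern, re.IGNORECASE)

    def matches(self, content):
        # True if the pattern occurs anywhere in the page content.
        return self.pattern.search(content) is not None


def apply_rules(rules, content):
    """Return the names of all rules that fire on the given page content."""
    return [rule.name for rule in rules if rule.matches(content)]


rules = [
    # Zero-sized iframes are a common way to load content invisibly.
    RegexRule("hidden-iframe", r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0\b"),
    # eval(unescape(...)) is a common obfuscation idiom in injected scripts.
    RegexRule("unescape-eval", r"eval\s*\(\s*unescape\s*\("),
]

page = '<html><iframe src="http://example.com/x" width="0" height="0"></iframe></html>'
print(apply_rules(rules, page))  # ['hidden-iframe']
```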

addSiteToFetchQueue(self, lfs)

Adds the sites extracted in analyzeSite to the "to fetch" queue.
Overrides:
orchid.NaiveAnalyzer.addSiteToFetchQueue
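The constructor parameter name linksToFetchAndCond suggests that the "to fetch" queue is shared with other threads and guarded by a condition variable. Assuming it is a (list, threading.Condition) pair, which this page does not confirm, adding links is a standard producer/consumer step: take the lock, extend the list, and notify waiting fetchers. The function names below are hypothetical.

```python
import threading


# Assumption: links_to_fetch_and_cond is a (list, threading.Condition) pair
# shared with fetcher threads. Illustrative only, not orchid's actual code.
def add_sites_to_fetch_queue(links_to_fetch_and_cond, lfs):
    links, cond = links_to_fetch_and_cond
    with cond:               # acquire the lock guarding the shared list
        links.extend(lfs)    # append the newly extracted links
        cond.notify_all()    # wake any fetcher waiting for work


def take_next_link(links_to_fetch_and_cond, timeout=None):
    """Consumer side: block until a link is available, then pop it."""
    links, cond = links_to_fetch_and_cond
    with cond:
        # wait_for re-checks the predicate after every wakeup.
        if not cond.wait_for(lambda: links, timeout=timeout):
            return None      # timed out with nothing to fetch
        return links.pop(0)


shared = ([], threading.Condition())
add_sites_to_fetch_queue(shared, ["http://example.com/a", "http://example.org/b"])
print(take_next_link(shared))  # http://example.com/a
```

Using notify_all rather than notify is a conservative choice: several fetchers may be waiting, and a batch of links can keep more than one of them busy.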

analyzeSite(self, db, site)

Applies all the available rules to the given site and extracts the links that we intend to crawl. Currently we follow regular ('<a...'), frame, iframe and script links.
Overrides:
orchid.NaiveAnalyzer.analyzeSite
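The link-following part of analyzeSite covers four tag types. A minimal sketch of that extraction step using the standard library's html.parser follows; it assumes URLs come from the href attribute of <a> and the src attribute of <frame>, <iframe> and <script>, and that relative URLs are resolved against the page URL. LinkExtractor and extract_links are hypothetical names, not orchid's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attribute that carries the URL for each followed tag (an assumption).
LINK_ATTRS = {"a": "href", "frame": "src", "iframe": "src", "script": "src"}


class LinkExtractor(HTMLParser):
    """Collects URLs from <a href>, <frame src>, <iframe src> and <script src>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs of all followed links in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, link) for link in parser.links]


page = """<a href="/about">About</a>
<iframe src="ads.html"></iframe>
<script src="http://cdn.example.net/x.js"></script>
<img src="logo.png">"""
print(extract_links(page, "http://example.com/index.html"))
# ['http://example.com/about', 'http://example.com/ads.html', 'http://cdn.example.net/x.js']
```

Note that <img> is ignored, matching the list of link types given above.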

report(self)

Logs the results of the crawl.
Overrides:
orchid.NaiveAnalyzer.report

selectNextUrl(self)

Selects the next URL to crawl. This is done by choosing a random domain and then taking one page from its queue.
Overrides:
orchid.NaiveAnalyzer.selectNextUrl
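The selection strategy described above, together with the inherited reorganizeByDomain (which maps domain names to links inside the domain), can be sketched as follows. This assumes a "domain" is the URL's network location as returned by urllib.parse.urlparse and that each domain's queue is a plain list; both are assumptions, and the function names are hypothetical.

```python
import random
from urllib.parse import urlparse


def reorganize_by_domain(list_of_links):
    """Map each domain (network location) to the links that belong to it."""
    by_domain = {}
    for link in list_of_links:
        by_domain.setdefault(urlparse(link).netloc, []).append(link)
    return by_domain


def select_next_url(queues_by_domain, rng=random):
    """Pick a random domain, then take one page from that domain's queue."""
    non_empty = [domain for domain, queue in queues_by_domain.items() if queue]
    if not non_empty:
        return None                      # nothing left to crawl
    domain = rng.choice(non_empty)       # uniform over domains, not over URLs
    return queues_by_domain[domain].pop(0)


queues = reorganize_by_domain([
    "http://example.com/a", "http://example.com/b", "http://example.com/c",
    "http://example.org/x",
])
print(select_next_url(queues, random.Random(42)))
```

Choosing uniformly among domains rather than among URLs presumably keeps a single large site from dominating the crawl: here example.org is picked half the time even though it contributes only one of the four links.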

Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 http://epydoc.sf.net