Module orchid

This package is a multi-threaded, generic web crawler. For more detail about the package, see the attached paper.
Classes
NaiveAnalyzer This is an interface for the analyzers.
Orchid The main class of the crawler.
OrchidController This class is responsible for controlling the fetchers and distributing the work load.
OrchidExtractor A class responsible for parsing and analyzing HTML content and extracting various forms of links from it.
OrchidFetcher This class is responsible for fetching URL contents, processing them with UgrahExtractor and updating the site and link database.
Site A class for representing the information that is collected for a specific site.
UrlHandler A class responsible for parsing a URL and retrieving its contents.
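
The division of labour described above, in which a controller hands URLs to several fetchers running in parallel, can be illustrated with a small sketch. This is not the orchid implementation: the function name run_fetchers_sketch, its parameters, and the use of a caller-supplied fetch callable are all assumptions made for illustration, and the sketch works on a fixed list of URLs rather than feeding newly extracted links back into the queue as a real crawler would.

```python
import queue
import threading


def run_fetchers_sketch(urls, fetch, num_fetchers=4):
    """Spread urls over worker threads and return {url: result}.

    Hypothetical illustration only; not part of the orchid module.
    """
    # Shared work queue: the "controller" side fills it up front.
    work = queue.Queue()
    for url in urls:
        work.put(url)

    results = {}
    lock = threading.Lock()  # guards writes to the shared results dict

    def fetcher():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left for this fetcher
            try:
                result = fetch(url)
            except Exception as exc:
                # Record the failure instead of killing the thread.
                result = exc
            with lock:
                results[url] = result

    threads = [threading.Thread(target=fetcher) for _ in range(num_fetchers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Passing the fetch step in as a callable keeps the sketch independent of any network code, so it can be tried with a stand-in such as len before a real download function is plugged in.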

Function Summary
  extractServerName(stringUrl)
Extracts the domain name from a string URL and returns it.

Function Details

extractServerName(stringUrl)

Extracts the domain name from a string URL and returns it.
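
The generated documentation does not show how extractServerName works internally. As a rough illustration of the behaviour described, the following sketch pulls the host name out of a URL string with the Python 3 standard library. The name extract_server_name_sketch is hypothetical, and choices such as lower-casing the host, dropping the port, and returning an empty string when no host is found are assumptions rather than documented behaviour of the orchid function.

```python
from urllib.parse import urlsplit


def extract_server_name_sketch(string_url):
    """Return the host part of a URL string, or '' if none is found.

    Hypothetical stand-in for illustration; not the orchid implementation.
    """
    # urlsplit only fills in the network location when the URL has a scheme
    # or starts with '//', so bare inputs such as 'example.com/page' get a
    # '//' prefix first.
    if "://" not in string_url and not string_url.startswith("//"):
        string_url = "//" + string_url
    # .hostname drops any user info and port, and is lower-cased.
    return urlsplit(string_url).hostname or ""


print(extract_server_name_sketch("http://epydoc.sf.net/index.html"))
# prints: epydoc.sf.net
```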

Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 http://epydoc.sf.net