Module orchid :: Class OrchidExtractor |
|
Method Summary | |
---|---|
__init__(self) | Creates a new link extractor. |
extract(self) | Extracts all the links in the page according to the patterns specified in LINK_PATTERNS. |
getLinks(self) | Returns a map from link type to a list of links of that type that appeared in the page. |
getParsedContent(self) | Returns the BeautifulSoup data structure of the HTML of the site that was set using setSite. |
getRawContent(self) | |
setSite(self, stringUrl, content) | Sets the current site URL and content for the extractor. |
Class Variable Summary | |
---|---|
list | LINK_PATTERNS = [('regular', <_sre.SRE_Pattern object at... |
Method Details |
---|
__init__(self)
Creates a new link extractor. Should be followed by a call to setSite. |
extract(self)
Extracts all the links in the page according to the patterns specified in LINK_PATTERNS. The links are stored in a map (link type -> URL list) called links, accessible as extractor.links, where extractor is an instance of HtmlLinkExtractor. |
getLinks(self)
Returns a map from link type to a list of links of that type that appeared in the page. |
getParsedContent(self)
Returns the BeautifulSoup data structure of the HTML of the site that was set using setSite. |
setSite(self, stringUrl, content)
Sets the current site URL and content for the extractor. |
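The call order described above (construct, setSite, extract, getLinks) can be illustrated with a small stand-in class. This is only a sketch under stated assumptions, not the orchid source: the class name SketchExtractor and the regular expression are hypothetical, and only the 'regular' link type name and the method names come from this page. getParsedContent and getRawContent are left out so the example needs nothing beyond the standard library.

```python
import re


class SketchExtractor:
    """Hypothetical stand-in that mirrors the documented method names."""

    # Assumed shape: a list of (link type, compiled pattern) pairs.
    # The pattern itself is an illustration, not the one orchid uses.
    LINK_PATTERNS = [
        ("regular", re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)),
    ]

    def __init__(self):
        # Nothing to extract from until setSite() is called.
        self.url = None
        self.content = None
        self.links = {}

    def setSite(self, stringUrl, content):
        # Store the page and reset any links from a previous page.
        self.url = stringUrl
        self.content = content
        self.links = {}

    def extract(self):
        # Build the map: link type -> list of URLs matched by that pattern.
        if self.content is None:
            raise ValueError("call setSite() before extract()")
        self.links = {
            link_type: pattern.findall(self.content)
            for link_type, pattern in self.LINK_PATTERNS
        }

    def getLinks(self):
        return self.links


page = '<html><body><a href="http://example.org/a">A</a> <a class="x" href="/b">B</a></body></html>'
extractor = SketchExtractor()
extractor.setSite("http://example.org/", page)
extractor.extract()
print(extractor.getLinks())
# {'regular': ['http://example.org/a', '/b']}
```

In this sketch, calling extract before setSite raises an error, which reflects the note that the constructor should be followed by a call to setSite; how the real class behaves in that case is not stated on this page.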
Class Variable Details |
---|
Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 | http://epydoc.sf.net |