dhtmlparser API

The most important function here is parseString(), which processes a string and builds a Document Object Model (DOM) from it.

dhtmlparser.parseString(txt, cip=True)

Parse the string txt and return a DOM tree made of single-linked HTMLElement objects.

Parameters:
  • txt (str) – HTML/XML string to be parsed into a DOM.
  • cip (bool, default True) – Case Insensitive Parameters. If True, a special dictionary is used to store HTMLElement.params, so that parameter names are matched case-insensitively.
Returns:

A single container HTMLElement with a blank tag, which holds the whole DOM in its HTMLElement.childs property. This element can be queried using the HTMLElement.find() functions.

Return type:

obj
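The effect of the cip option can be pictured with a small, self-contained sketch of a case-insensitive dictionary. This is not dhtmlparser's own implementation: the class name CaseInsensitiveDict and the choice to normalize keys to lower case are assumptions made for illustration only.

```python
class CaseInsensitiveDict(dict):
    """Hypothetical stand-in: a dict whose string keys ignore case."""

    @staticmethod
    def _norm(key):
        # Only strings are normalized; other key types pass through unchanged.
        return key.lower() if isinstance(key, str) else key

    def __init__(self, data=None):
        super().__init__()
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))

    def get(self, key, default=None):
        return super().get(self._norm(key), default)


# Parameters as they might appear in markup such as <div ID="main" Class="box">.
params = CaseInsensitiveDict({"ID": "main", "Class": "box"})
```

With such a dictionary, params["id"], params["ID"] and params["Id"] all refer to the same entry, which is the behaviour the cip description above suggests for HTMLElement.params.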

dhtmlparser.makeDoubleLinked(dom, parent=None)

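The standard output of dhtmlparser is a single-linked tree, in which elements reference only their children. This function makes the tree double-linked, so that elements are linked back to their parents as well.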

Parameters:
  • dom (obj) – HTMLElement instance.
  • parent (obj, default None) – Do not set this; it is used internally by the recursive call.
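
The idea behind double-linking can be sketched without the library. The Node class and the make_double_linked() function below are hypothetical stand-ins, not dhtmlparser code. They assume that the back link is stored in an attribute called parent and they reuse the childs name from the description above.

```python
class Node:
    """Hypothetical tree node; `childs` mirrors the property name used above."""

    def __init__(self, tag, childs=None):
        self.tag = tag
        self.childs = childs or []
        self.parent = None  # not set in a single-linked tree


def make_double_linked(node, parent=None):
    """Recursively store a reference to the parent on every node."""
    node.parent = parent
    for child in node.childs:
        # The recursive call is the only place where `parent` is passed.
        make_double_linked(child, node)


leaf = Node("b")
root = Node("", [Node("p", [leaf])])  # blank-tag container, like the parser output
make_double_linked(root)
```

After the call, the tree can be walked upwards from leaf through leaf.parent to root, while root.parent stays None. This also shows why the parent argument is meant for the recursive call only.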

dhtmlparser.removeTags(dom)

Remove all tags from dom and return its plain-text representation.

Parameters:
  • dom (str, obj, array) – String, HTMLElement instance, or array of elements.
Returns:

Plain string without tags.

Return type:

str
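
For comparison, the same kind of tag stripping can be sketched with Python's standard library. This example uses html.parser instead of dhtmlparser, and the names TextExtractor and strip_tags are hypothetical. It only illustrates what "plain string without tags" means and may differ from removeTags() in details such as whitespace or entity handling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of `markup` with all tags dropped."""
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    return "".join(extractor.chunks)


plain = strip_tags("<p>Hello <b>world</b>!</p>")
print(plain)
```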
