This class can be used for parsing or for building a DOM manually.
If you want to build a DOM from HTMLElements, you can use one of these four constructors:
HTMLElement()
HTMLElement("<tag>")
HTMLElement("<tag>", {"param": "value"})
HTMLElement("tag", {"param": "value"}, [HTMLElement("<tag1>"), ...])
Tag or parameter specification parts can be omitted:
HTMLElement("<root>", [HTMLElement("<tag1>"), ...])
HTMLElement(
[HTMLElement("<tag1>"), ...]
)
>>> from dhtmlparser import HTMLElement
>>> e = HTMLElement()
>>> e
<dhtmlparser.HTMLElement instance at 0x7fb2b39ca170>
>>> print e
>>>
Usually, it is better to use HTMLElement("").
>>> e = HTMLElement("<br>")
>>> e.isNonPairTag()
True
>>> e.isOpeningTag()
False
>>> print e
<br>
Notice that the closing tag wasn’t automatically created.
>>> e = HTMLElement("<tag>")
>>> e.isOpeningTag() # this doesn't check if tag actually is paired, just if it looks like opening tag
True
>>> e.isPairTag() # this does check if element is actually paired
False
>>> e.endtag = HTMLElement("</tag>")
>>> e.isOpeningTag()
True
>>> e.isPairTag()
True
>>> print e
<tag></tag>
In short:
>>> e = HTMLElement("<tag>")
>>> e.endtag = HTMLElement("</tag>")
Or you can always use the string parser:
>>> import dhtmlparser as d
>>> e = d.parseString("<tag></tag>")
>>> print e
<tag></tag>
But don’t forget that elements returned from parseString() are encapsulated in a blank “root” tag:
>>> e = d.parseString("<tag></tag>")
>>> e.getTagName()
''
>>> e.childs[0].tagToString()
'<tag>'
>>> e.childs[0].endtag.tagToString() # referenced through the .endtag property
'</tag>'
>>> e.childs[1].tagToString() # manually selected endtag from childs - don't use this
'</tag>'
A tag (with or without <>) can take a dictionary as its second parameter.
>>> e = HTMLElement("tag", {"param":"value"}) # without <>, because normal text can't have parameters
>>> print e
<tag param="value">
>>> print e.params # parameters are accessed through the .params property
{'param': 'value'}
You can create content manually:
>>> e = HTMLElement("<tag>")
>>> e.childs.append(HTMLElement("content"))
>>> e.endtag = HTMLElement("</tag>")
>>> print e
<tag>content</tag>
But there is also an easier way:
>>> print HTMLElement("tag", [HTMLElement("content")])
<tag>content</tag>
or:
>>> print HTMLElement("tag", {"some": "parameter"}, [HTMLElement("content")])
<tag some="parameter">content</tag>
HTMLElement class used in DOM representation.
List of non-pair tags. Set this to a blank list if you wish to parse XML.
This class is used to represent a singly linked DOM (see makeDoubleLinked() for a doubly linked one).
list
List of child nodes.
dict
SpecialDict instance holding tag parameters.
obj
Reference to the ending HTMLElement or None.
obj
Reference to the opening HTMLElement or None.
Same as findAll(), but without endtags.
You can always get them from the endtag property.
Same as findAllB(), but without endtags.
You can always get them from the endtag property.
Search for elements by their parameters using a depth-first algorithm.
Parameters: | |
---|---|
Returns: | List of HTMLElement instances matching your criteria. |
Return type: | list |
Simple search engine using a breadth-first algorithm.
Parameters: | |
---|---|
Returns: | List of HTMLElement instances matching your criteria. |
Return type: | list |
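The difference between the depth-first and breadth-first searches is easiest to see in the order of the results. The following is a minimal sketch, assuming a hypothetical `Node` class and two hypothetical helper functions; it is not dhtmlparser's implementation and ignores parameters, lambdas and endtags.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser's HTMLElement."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = childs or []


def find_depth_first(node, name):
    """Collect matches by descending into each child before its siblings."""
    found = [node] if node.name == name else []
    for child in node.childs:
        found.extend(find_depth_first(child, name))
    return found


def find_breadth_first(node, name):
    """Collect matches level by level, using a FIFO queue."""
    found = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current.name == name:
            found.append(current)
        queue.extend(current.childs)
    return found


# Tree: <root><a><xe/></a><xe/></root>
deep = Node("xe")      # nested two levels below root
shallow = Node("xe")   # direct child of root
root = Node("root", [Node("a", [deep]), shallow])

dfs_result = find_depth_first(root, "xe")    # deep first, then shallow
bfs_result = find_breadth_first(root, "xe")  # shallow first, then deep
```

Both searches return the same elements; only the ordering differs, which matters when you take the first match.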
This method works the same as find(), but only in one level of the childs.
This allows chaining of wfind() calls:
>>> dom = dhtmlparser.parseString('''
... <root>
... <some>
... <something>
... <xe id="wanted xe" />
... </something>
... <something>
... asd
... </something>
... <xe id="another xe" />
... </some>
... <some>
... else
... <xe id="yet another xe" />
... </some>
... </root>
... ''')
>>> xe = dom.wfind("root").wfind("some").wfind("something").find("xe")
>>> xe
[<dhtmlparser.htmlelement.HTMLElement object at 0x8a979ac>]
>>> str(xe[0])
'<xe id="wanted xe" />'
Parameters: | |
---|---|
Returns: | Blank HTMLElement with all matches in childs property. |
Return type: | obj |
Note
The returned element also has its _container property set to True.
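The one-level search and the container it returns can be sketched as follows. This is an illustration under stated assumptions: `Node`, its `container` flag and the name-only `wfind()` are hypothetical simplifications, not the library's code.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser's HTMLElement."""

    def __init__(self, name="", childs=None, container=False):
        self.name = name
        self.childs = childs or []
        self.container = container  # mirrors the idea of the _container flag

    def wfind(self, name):
        """Search exactly one level down and wrap the matches in a blank node."""
        # A container holds the results of a previous call, so each of them is
        # searched; an ordinary node is searched in its own direct children.
        sources = self.childs if self.container else [self]
        matches = [
            child
            for source in sources
            for child in source.childs
            if child.name == name
        ]
        return Node(childs=matches, container=True)


wanted = Node("xe")
dom = Node(childs=[
    Node("root", [
        Node("some", [Node("something", [wanted]), Node("xe")]),
        Node("some", [Node("xe")]),
    ]),
])

result = dom.wfind("root").wfind("some").wfind("something").wfind("xe")
```

Because every call returns a blank container rather than a list, the next call in the chain always has something to search in, even when nothing matched.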
wfind() is a nice function, but still rather long to use, because you have to chain all calls together manually, and in the end you get an HTMLElement instance container.
This function recursively calls wfind() for you, and in the end you get a list of matching elements:
xe = dom.match("root", "some", "something", "xe")
is an alternative to:
xe = dom.wfind("root").wfind("some").wfind("something").wfind("xe")
You can use all arguments used in wfind():
dom = dhtmlparser.parseString('''
<root>
<div id="1">
<div id="5">
<xe id="wanted xe" />
</div>
<div id="10">
<xe id="another wanted xe" />
</div>
<xe id="another xe" />
</div>
<div id="2">
<div id="20">
<xe id="last wanted xe" />
</div>
</div>
</root>
''')
xe = dom.match(
    "root",
    {"tag_name": "div", "params": {"id": "1"}},
    ["div", {"id": "5"}],
    "xe"
)
assert len(xe) == 1
assert xe[0].params["id"] == "wanted xe"
Parameters: | absolute (bool, default None) – If true, first element will be searched from the root of the DOM. If None, _container attribute will be used to decide value of this argument. If False, find() call will be run first to find first element, then wfind() will be used to progress to next arguments. |
---|---|
Returns: | List of matching elements (blank if no matching element found). |
Return type: | list |
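The idea of walking a path one level at a time can be sketched like this. It is a simplified illustration, not the library's code: `Node` and the free function `match()` are hypothetical, and each step is either a tag name or a `(name, params)` tuple, which is a simpler convention than the dict and list forms shown above.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser's HTMLElement."""

    def __init__(self, name="", params=None, childs=None):
        self.name = name
        self.params = params or {}
        self.childs = childs or []


def match(root, *path):
    """Follow `path` one level at a time and return the final matches as a list."""
    current = [root]
    for step in path:
        # A plain string means "any element with this name".
        name, params = (step, {}) if isinstance(step, str) else step
        current = [
            child
            for node in current
            for child in node.childs
            if child.name == name
            and all(child.params.get(k) == v for k, v in params.items())
        ]
    return current


wanted = Node("xe", {"id": "wanted xe"})
dom = Node(childs=[
    Node("root", childs=[
        Node("div", {"id": "1"}, [
            Node("div", {"id": "5"}, [wanted]),
            Node("div", {"id": "10"}, [Node("xe", {"id": "another wanted xe"})]),
            Node("xe", {"id": "another xe"}),
        ]),
        Node("div", {"id": "2"}, [
            Node("div", {"id": "20"}, [Node("xe", {"id": "last wanted xe"})]),
        ]),
    ]),
])

narrow = match(dom, "root", ("div", {"id": "1"}), ("div", {"id": "5"}), "xe")
broad = match(dom, "root", "div", "div", "xe")
```

The result is a plain list, so an empty path match simply yields an empty list instead of a container object.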
True if element is listed in nonpair tag table (br for example) or if it ends with /> (<hr /> for example).
You can also change the state from pair to nonpair if you use this as a setter.
Parameters: | isnonpair (bool, default None) – If set, internal nonpair state is changed. |
---|---|
Returns: | True if tag is nonpair. |
Return type: | bool |
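The two conditions described above (membership in the nonpair table, or a trailing `/>`) can be sketched as a standalone check. The function name and the contents of `NONPAIR_TAGS` are assumptions made for illustration; the library's own table may differ.

```python
# Hypothetical table for illustration; the library's real list may differ.
NONPAIR_TAGS = ["br", "hr", "img", "input", "meta", "link"]


def is_nonpair(tag_string, nonpair_tags=NONPAIR_TAGS):
    """Return True for self-closed tags or tags listed in `nonpair_tags`."""
    stripped = tag_string.strip()
    if stripped.endswith("/>"):
        return True

    # Reduce "<br clear='all'>" to the bare tag name "br".
    inner = stripped.strip("<>/").strip()
    name = inner.split()[0].lower() if inner else ""
    return name in nonpair_tags
```

Passing an empty list as `nonpair_tags` corresponds to the XML case, where only the `/>` form marks a tag as nonpair.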
Returns: | True if this is a pair tag - <body> .. </body> for example. |
---|---|
Return type: | bool |
Detect whether this tag is opening or not.
Returns: | True if it is opening. |
---|---|
Return type: | bool |
Parameters: | opener (obj) – HTMLElement instance. |
---|---|
Returns: | True, if this element is endtag to opener. |
Return type: | bool |
Get HTML element representation of the tag, but only the tag, not the childs or endtag.
Returns: | HTML representation. |
---|---|
Return type: | str |
Returns an almost original string (use original = True if you want an exact copy).
If you want a prettified string, try prettify().
Returns: | Complete representation of the element with childs, endtag and so on. |
---|---|
Return type: | str |
Returns: | Tag name or whole element in case of normal text (not isTag()). |
---|---|
Return type: | str |
Returns: | Content of tag (everything between opener and endtag). |
---|---|
Return type: | str |
Same as toString(), but returns prettified element with content.
Note
This method is partially broken, and can sometimes create unexpected results.
Returns: | Prettified string. |
---|---|
Return type: | str |
Test whether this element contains at least all params, or more.
Parameters: | params (dict/SpecialDict) – Subset of parameters. |
---|---|
Returns: | True if all params are contained in this element. |
Return type: | bool |
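The "at least all params, or more" test is a subset check on dictionaries. A minimal sketch with plain dicts follows; the function name is hypothetical, and any special key handling done by SpecialDict is not modelled here.

```python
def contains_param_subset(element_params, wanted):
    """Return True if every key/value pair in `wanted` is in `element_params`."""
    return all(
        key in element_params and element_params[key] == value
        for key, value in wanted.items()
    )


params = {"id": "main", "class": "box", "lang": "en"}

subset_ok = contains_param_subset(params, {"id": "main", "class": "box"})
wrong_value = contains_param_subset(params, {"id": "other"})
missing_key = contains_param_subset(params, {"style": "x"})
empty_subset = contains_param_subset(params, {})  # an empty subset always matches
```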
Compare the element with the given tag_name, params and/or lambda function fn.
The lambda function is the same as in find().
Parameters: | |
---|---|
Returns: | True if two elements are almost equal. |
Return type: | bool |
Replace the values in this element with the values from el.
This is useful when you don’t want to change all references to the object.
Parameters: | el (obj) – HTMLElement instance. |
---|---|
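Why replacing in place keeps references valid can be shown with a small sketch. The `Node` class and its `replace_with()` method are hypothetical, and copying `__dict__` is just one possible way to do this, not necessarily what the library does.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser's HTMLElement."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = childs or []

    def replace_with(self, other):
        """Copy every attribute of `other` into this object, keeping its identity."""
        # Shallow copy: the childs list is shared with `other` afterwards.
        self.__dict__.update(other.__dict__)


old = Node("old")
parent = Node("parent", [old])
alias = old  # a second reference held somewhere else in the program

old.replace_with(Node("new", [Node("inner")]))
```

After the call, `parent.childs[0]` and `alias` still point to the same object, which now carries the new values, so no reference had to be updated.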
Remove subelement (child) specified by reference.
Note
This can’t be used for removing subelements by value! If you want to do such a thing, try:
for e in dom.find("value"):
    dom.removeChild(e)
Parameters: | |
---|---|
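Removal by reference means that elements are compared by identity, so an element that merely looks the same is left alone. A minimal sketch, assuming a hypothetical `Node` class and a hypothetical `remove_child()` helper that also descends into nested children:

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser's HTMLElement."""

    def __init__(self, name, childs=None):
        self.name = name
        self.childs = childs or []


def remove_child(parent, child):
    """Remove `child` from anywhere below `parent`, comparing by identity."""
    parent.childs = [c for c in parent.childs if c is not child]
    for c in parent.childs:
        remove_child(c, child)


first = Node("xe")
second = Node("xe")  # same name, but a different object
root = Node("root", [Node("div", [first]), second])

remove_child(root, first)
```

Only `first` is removed; `second` stays in the tree even though it has the same tag name.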