selkie.tree
— Trees
The Tree class
- class selkie.tree.Tree
The nodes of a tree are represented by instances of the class
Tree
. There is no separate node class: a node and the tree rooted at the node are both represented by aTree
instance.- word
A string or None.
- cat
Category.
- sem
Semantic translation or None.
- id
An identifier.
- parent
The parent node, if
set_parents()
was done.
- __init__(**kwargs)
Members can be specified as keywords.
- copy(**kwargs)
Members can be specified as keywords.
- __getitem__(i)
I-th node in preorder walk.
- __iter__()
Preorder walk.
- __str__()
Pretty-printed.
Basic node types
We wish to accommodate the nodes that occur in three kinds of trees:
unheaded phrase-structure trees, headed phrase-structure trees, and
dependency trees. The attributes word
, children
,
and role
are used to distinguish among node types.
Interior and leaf.
The children
attribute distinguishes between interior
nodes and leaf nodes. The former have children; the latter do not.
Governors and phrasal nodes are interior nodes. An interior node is
headed if one of its children has the role 'head'
, and it is
unheaded otherwise.
Lexical and nonlexical.
A node that has a (boolean True) value for word
is lexical;
otherwise the node is nonlexical.
Taking the cross product yields the following five node types:
Leaf |
Interior |
||
---|---|---|---|
Headed |
Unheaded |
||
Nonlexical: |
empty |
headed phrase |
unheaded phrase |
Lexical: |
word |
governor |
For a governor, a child with role 'head'
has no
special status.
Terminal and nonterminal. The interior-leaf distinction is not the same as the terminal-nonterminal distinction. The latter is a property of categories, as determined by a grammar. Terminal categories are not allowed to appear on the lefthand side of a rewrite rule, whereas nonterminal categories that are not useless appear on the lefthand side of at least one rewrite rule. It is possible to have leaf nodes with nonterminal categories: such nodes are null expansions. A tree generated by a constituent-structure grammar cannot have interior nodes labeled with terminal categories, but dependency grammars make no terminal-nonterminal distinction, and permit trees in which interior nodes are labeled with parts of speech, which would be terminal categories in a constituent-structure grammar.
Leaf words and empty leaves.
The word
attribute distinguishes between leaf words and empty leaves.
The former have a non-null value for word
, and the latter do not.
Note that the expression leaf word is not redundant in a
dependency tree: leaf words contrast with governors, which are
interior nodes with a value for word
. However, in a
constituency tree, only leaf nodes have values for word
, so in
that context we can refer to leaf words simply as words.
Depending on the kind of category it has,
an empty leaf may represent either a null terminal (like an
empty complementizer) or a null expansion (corresponding to a rewrite
rule with nothing on the right-hand side). We are careful not to
refer to null terminals as “words,” reserving the term word
for a node with a non-null value for word
.
Governor versus phrase.
The word
attribute also distinguishes between governors and phrases.
Both are interior nodes; the former has a non-null value for
word
, while the latter does not.
That is, a governor is a node that has non-null values for
both children
and word
. Governors are used in dependency
trees; their children are called dependents.
By contrast, a phrase or phrasal node has children but no word.
Phrasal nodes are used in constituency trees.
Heads.
We further subdivide phrasal nodes according to whether they have
heads or not. The head of a phrasal node is defined to be a
child whose role
is “head
.” A phrasal node with a head is a headed phrase,
and a phrasal node without a head is an unheaded phrase.
Governors and leaves are by definition headless.
Other attributes
Number of left dependents.
The nld
attribute is relevant only for governors.
It indicates the number of left dependents. In the terminal
string, the governor is ordered after its left dependents and before its right
dependents. If a phrasal node or leaf has a value for nld
, the
value is ignored.
Parent.
The parent
attribute is set by the function
set_parents()
. It permits one to navigate not only down a
tree, but also back up again.
Cat.
The cat
attribute represents the syntactic category of the tree.
The category may be anything, though strings and Category
instances are the commonest choices.
Role. The links in a dependency tree are often labeled. The link label indicates the relationship between governor and dependent, such as “subject” or “object.” The same relationship can be useful in constituent trees as indicating the role of a child relative to its parent (or the head of its parent).
As already mentioned, the role “head
” has a special status if
the parent is a phrasal node.
ID. Nodes are sometimes assigned identifiers, such as the indices used to encode movement relations or control.
Sem.
The value for sem
is the semantic translation of the node.
Example
Here is an example of constructing a tree manually, by constructing
individual nodes. The first two arguments to the constructor are the category and a
list of children. A word may be specified using the keyword “word
.”
>>> from selkie.tree import Tree
>>> det = Tree('Det', word='the')
>>> n = Tree('N', word='dog')
>>> np = Tree('NP', [det, n])
Here are examples for the three main attributes.
>>> np.cat
'NP'
>>> np.children
[<Tree Det the>, <Tree N dog>]
>>> np.word
>>> det.word
'the'
One can set role
and id
:
>>> det.role = 'spec'
>>> n.role = 'head'
>>> np.id = 1
The nld
attribute is only relevant for dependency trees: see
below under “Dependents.”
One can print out the tree rooted at a node using a print
statement:
>>> print(np)
0 (NP &1
1 (Det:spec the)
2 (N:head dog))
Notice that nodes are numbered. One can access them directly by number:
>>> np[2]
<Tree N dog>
This is particularly useful for large trees.
Copy
The method copy()
makes a shallow copy of a node. If the
original node has children, a fresh copy of the child list is made,
but the child nodes themselves are not copied.
>>> y = np.copy()
>>> y is np
False
>>> y.children is np.children
False
>>> y.children == np.children
True
>>> y.children[0] is np.children[0]
True
One can modify any of the attributes cat
, children
,
word
, nld
, role
, or id
when making the copy.
>>> z = np.copy(children=[])
>>> print(z)
0 (NP &1)
Node functions
The Tree
class has few methods. Instead, there is a large
collection of functions that are intended to work
with any object (though not all of them are fully general yet).
- getcat(n)
Category or None.
- getchildren(n)
List or
None
.
- getparent(n)
Node or
None
.
- getword(n)
String or
None
.
- getnld(n)
Integer or
None
.
- getrole(n)
Role or
None
.
- getid(n)
ID or
None
.
- getsem(n)
Semantic translation or
None
.
- is_interior(n)
Coerce children to boolean.
- is_leaf(n)
Non-false n with boolean false children.
- is_governor(n)
Non-false children, non-false word.
- is_phrase(n)
Non-false children, boolean false word.
- is_headed_phrase(n)
Phrase with head child.
- is_unheaded_phrase(n)
Phrase with no head child.
- is_leaf_word(n)
Leaf, non-false word.
- is_empty_leaf(n)
Leaf, boolean false word.
- is_unary(n)
Phrase, one child.
- nodetype(n)
String representing node type.
- head_child(n)
A node or
None
.
- head_index(n)
An integer or
None
.
- child_index(n, c)
Integer; c’s index in children.
- left_dependents(n)
List of nodes.
- right_dependents(n)
List of nodes.
- expansion(n)
List of nodes or
None
.
- delete_child(n, i)
Delete a child.
Accessors
Instead of using the attributes directly, it is best to use the
accessor functions getword()
, getchildren()
, getnld()
,
getrole()
, getcat()
, getsem()
, getid()
, and
getparent()
. These functions can be applied
to arbitrary objects, not just Tree
instances.
If called on something that
lacks the attribute in question, they return None
. There is one
exception: if a string is passed to getword()
, it returns the
string itself. (Hence a string behaves like a leaf node that has a
value for word
but has no category.)
Some examples:
>>> from selkie.tree import getcat, getword
>>> getcat(np)
'NP'
>>> getcat('hi')
>>> getword(det)
'the'
>>> getword('hi')
'hi'
Predicates
Basic predicates.
The following functions are available to test
for properties of a node:
is_interior()
,
is_leaf()
,
is_governor()
,
is_phrase()
,
is_headed_phrase()
,
is_unheaded_phrase()
,
is_leaf_word()
,
is_empty_leaf()
,
They have all been previously discussed.
Some examples:
>>> from selkie.tree import is_interior, is_leaf, is_headed_phrase
>>> is_interior('hi')
False
>>> is_leaf(det)
True
>>> is_leaf('hi')
True
>>> is_headed_phrase(np)
True
Is empty.
The function is_empty()
tests whether a node is empty or not.
This is technically not a property of the node itself, but of
the tree rooted at the node:
a node is empty just in case
neither it nor any of its descendants has a value for word
.
>>> from selkie.tree import is_empty
>>> is_empty(Tree())
True
>>> is_empty(Tree('NP', [Tree('N')]))
True
>>> is_empty(Tree('NP', [Tree('N', word='dog')]))
False
Is unary.
The function is_unary()
returns true just in case the node has
exactly one child.
>>> from selkie.tree import is_unary
>>> is_unary(np)
False
>>> is_unary(det)
False
>>> is_unary(Tree('NP', [Tree('N', word='rice')]))
True
Node type.
The function nodetype()
returns one of the
following: 'leaf'
, 'governor'
, 'unheaded phrase'
,
or 'headed phrase'
.
>>> from selkie.tree import nodetype
>>> nodetype(np)
'headed phrase'
>>> nodetype(det)
'leaf'
>>> nodetype('hi')
'leaf'
Structural access
Head child. The function head_child()
returns the
child whose role is “head
,” if one exists. (If there is more
than one, it returns only the first.)
>>> from selkie.tree import head_child
>>> head_child(np)
<Tree N dog>
Head index. The function head_index()
returns the
head child’s index in the children
list. It returns -1 if
there is no head child. Children are numbered from 0.
>>> from selkie.tree import head_index
>>> head_index(np)
1
>>> head_index('hi')
-1
Child index. The function child_index()
takes two
arguments, parent and child, and returns the index of the child in the
parent’s children
list. It returns -1 if the child is not found.
>>> from selkie.tree import child_index
>>> child_index(np, det)
0
>>> child_index(np, 'foo')
-1
Dependents. If the node has a value for nld
, the
function left_dependents()
returns all children up to, but not
including, nld
. The function right_dependents()
returns all remaining
children. If the node has no value for nld
, but it does have a
head child, then left_dependents()
returns all children
preceding the head child, and right_dependents()
returns all children
following the head child. If the node has neither nld
nor a
head child, both functions signal an error.
>>> from selkie.tree import left_dependents, right_dependents
>>> left_dependents(np)
[<Tree Det the>]
>>> right_dependents(np)
[]
>>> sbj = Tree('N', word='dogs')
>>> v = Tree('V', word='chase')
>>> obj = Tree('N', word='cats')
>>> v.children = [sbj, obj]
>>> v.nld = 1
>>> left_dependents(v)
[<Tree N dogs>]
>>> right_dependents(v)
[<Tree N cats>]
Expansion. If a node has children, the function
expansion()
returns a tuple consisting of the node’s category
followed by the categories of its children. Some of the categories
may be None
. If the node has no children, the return value is
None
.
>>> from selkie.tree import expansion
>>> expansion(np)
('NP', 'Det', 'N')
Destructive
The function delete_child()
takes a node and a child index, and
deletes the child at that index. The value for nld
is adjusted,
if necessary. There is no return value.
>>> from selkie.tree import delete_child
>>> delete_child(v,0)
>>> left_dependents(v)
[]
>>> right_dependents(v)
[<Tree N cats>]
Trees
The functions that treat a Tree
instance as representing a
complete tree (rather than just a node) are summarized as
follows.
- is_headed_tree(t)
All interior nodes have heads
- is_unheaded_tree(t)
All interior nodes lack heads
- is_dependency_tree(t)
All interior nodes have words.
- treetype(n)
A string representing the type.
- load_trees(fn)
Returns a list of trees.
- parse_trees(s)
Same, but s is string.
- parse_tree(s)
Single tree, else error.
- iter_trees(fn)
Iteration over trees.
- tree_string(t)
Pretty-printed.
- print_tree(t)
Prints tree string.
- save_trees(t, fn)
Saves tree string to file.
- load_tabular_trees(fn)
List of trees.
- iter_tabular_trees(fn)
Iteration over trees.
- save_tabular_trees(ts, fn)
Write to file.
Tree types
The type of a tree is defined by the type of interior nodes it contains.
A tree is an unheaded phrase-structure tree if all its interior nodes are unheaded phrasal nodes. A tree is a headed phrase-structure tree if all its interior nodes are headed phrasal nodes. A tree is a dependency tree if all its interior nodes are governors.
A hybrid tree is one that satisfies none of these three definitions.
All three types of tree contain identical leaf nodes. They differ only in their interior nodes. Technically, a tree containing no interior nodes (i.e., consisting of a single terminal node) satisfies all three definitions.
The following functions test tree types:
>>> from selkie.tree import is_headed_tree, is_unheaded_tree, is_dependency_tree
>>> is_headed_tree(np)
True
>>> is_unheaded_tree(np)
False
>>> is_dependency_tree(v)
True
The function treetype()
returns the tree type:
'headed phrase'
, 'unheaded phrase'
, or 'governor'
(the lattermost for a dependency tree). It returns 'leaf'
if
the tree consists of a single leaf node, and None
if the tree is
hybrid.
>>> from selkie.tree import treetype
>>> treetype(np)
'headed phrase'
>>> treetype(v)
'governor'
>>> treetype(det)
'leaf'
Load and parse
Iter trees.
The function iter_trees()
reads trees in a lisp-like format from a
file or string. Like all the load/save functions,
iter_trees()
takes its argument to name a file if it is a Fn
, and to
provide the contents, if it is a string. Here is an example:
>>> from selkie.data import ex
>>> from selkie.tree import iter_trees
>>> ts = iter_trees(ex('tree2'))
>>> next(ts)
<Tree S ...>
>>> print(_)
0 (S
1 (NP:subj &1
foo
2 (Det the)
3 (N dog))
4 (VP:head &2
5 (V chased)
6 (NP:dobj
7 (Det the)
8 (N cat))))
Load and parse.
The function load_trees()
simply returns:
list(iter_trees(fn))
The function parse_trees()
also dispatches to iter_trees()
,
but it wraps its arguments in a Contents
instance (from selkie.io
) so that the argument is interpreted as a
string representing a tree, rather than a filename.
The function parse_tree()
returns a single tree instead of a
list of trees; it signals an error if its argument does not parse as a
single tree.
>>> from selkie.tree import parse_tree
>>> foo = parse_tree('''
... (NP:subj&1 foo
... (Det the)
... (N:head dog))
... ''')
>>> print(foo)
0 (NP:subj &1
foo
1 (Det the)
2 (N:head dog))
Print and save
Pretty-print string.
The function tree_string()
takes a tree and returns a
pretty-printed string.
>>> from selkie.tree import tree_string
>>> tree_string(foo)
'0 (NP:subj &1\n foo\n1 (Det the)\n2 (N:head dog))'
>>> print(_)
0 (NP:subj &1
foo
1 (Det the)
2 (N:head dog))
The Tree
method __str__()
simply dispatches to
tree_string()
.
One can suppress the node numbers by specifying numerate=False
.
>>> print(tree_string(foo, numerate=False))
(NP:subj &1
foo
(Det the)
(N:head dog))
The string-valued attributes (word
, cat
,
role
, id
) are formatted without surrounding quotes unless
they contain whitespace or one of the following special characters: left or right
parenthesis, left or right square bracket, colon, or single or double quote.
For example:
>>> print(Tree('Multi\x20N', id='a/b', word='hors d\'oeuvre'))
0 ('Multi N' "hors d'oeuvre" &a/b)
Save trees.
The function save_trees()
takes an iterator over trees and a
filename, and prints the trees to the named file.
>>> from tempfile import TemporaryDirectory
>>> from os.path import join
>>> from selkie.io import load_string
>>> from selkie.tree import save_trees
>>> with TemporaryDirectory() as dirname:
... fn = join(dirname, 'foo')
... save_trees([foo], fn)
... print(load_string(fn), end='')
...
(NP:subj &1
foo
(Det the)
(N:head dog))
As with all of the “save” functions, the filename is option; if omitted, it collects and returns its output as a string.
>>> s = save_trees([foo])
>>> print(s, end='')
(NP:subj &1
foo
(Det the)
(N:head dog))
Tabular tree files
There is also a tabular format for representing trees in a file. An
example is provided by the file t1
, whose contents are:
>>> from selkie.io import contents
>>> print(contents(ex('t1')), end='')
[ S
[ NP
+ Det the
+ N cat
]
[ VP
+ V chased
[ NP
+ Det the
+ N dog
]
]
]
A record may contain up to six fields:
1 |
Record type |
Left bracket for the beginning of a nonterminal node, right bracket for the end of a nonterminal node, and plus for a terminal node. |
2 |
Category |
Syntactic category. |
3 |
Word |
It may not contain a tab or newline, but any other character (including space) is allowed. |
4 |
Role |
A symbol representing the relation between the node and its parent or governor. |
5 |
Head |
A numeric index, identifying either a particular child, or a position among the children. |
6 |
ID |
A numeric index for the node. |
None of the fields is obligatory. Additional fields beyond these six are also permitted, but ignored.
The function iter_tabular_trees()
can be used to read a file
in tabular tree format. It is a generator over trees:
>>> from selkie.tree import iter_tabular_trees
>>> t1 = next(iter_tabular_trees(ex('t1')))
>>> print(t1)
0 (S
1 (NP
2 (Det the)
3 (N cat))
4 (VP
5 (V chased)
6 (NP
7 (Det the)
8 (N dog))))
The function load_tabular_trees()
converts the generator into a list.
Conversely, the function save_tabular_trees()
takes a tree iterator and a filename, and saves the trees in tabular format.
>>> from selkie.tree import save_tabular_trees
>>> with TemporaryDirectory() as dirname:
... fn = join(dirname, 'foo')
... save_tabular_trees([foo], fn)
... for line in open(fn): print(line, end='')
...
[ NP foo subj 0 1
+ Det the
+ N dog head
]
Drawing
The function draw_tree()
draws a tree. It requires the package
“graphviz” to be installed. It takes a second argument, which is
the filename to write. If the filename is omitted, a temp file is
written and displayed to the screen.
Tree iterations
If one iterates directly over a tree, one iterates over its nodes in preorder. The iteration functions and related functions are summarized as follows.
- preorder(n)
An iteration over nodes.
- textorder(n)
An iteration over nodes.
- iter_nodes(t)
A preorder walk.
- nodes(t)
A list, in preorder.
- iter_edges(t)
Parents in preorder, children in order.
- edges(t)
A list.
- subtrees(t, f)
Highest nodes that satisfy f (list).
- iter_subtrees(t, f)
Iteration.
- subtree(t, f)
Error if not unique.
- paths(t)
Paths in tree as slash-joined cats.
- leaves(t)
All leaves.
- words(t)
Non-empty leaves.
- tagged_words(t)
List of (word, cat) pairs.
- terminal_string(t)
Space-joined string.
- is_empty(t)
Yields empty string.
- is_efree_tree(t)
All nodes have children or word.
- is_unaryfree_tree(t)
No unary expansions.
- copy_tree(t)
Deep copy.
- delete_nodes(t, cs)
Delete nodes with cat in cs.
- eliminate_epsilons(n)
Elminate empty nodes, destructive.
- set_parents(t)
Detructive.
- getroot(n)
Root above node.
Preorder and text order walks
A walk is an iteration over the nodes of a tree. Two different
walks are defined: preorder()
and textorder()
. In a preorder walk, one
visits a node before any of its children. For phrase-structure trees,
text order is identical to preorder, but for dependency trees, they
differ. More precisely, in a text-order walk, any node that has a value for nld
is visited after visiting its left dependents, but before visiting
its right dependents.
To illustrate, we create two trees. The first is a headed phrasal tree:
>>> from selkie.tree import parse_tree
>>> h = parse_tree('''(S (NP (Det the) (N:head dog))
... (VP:head (V:head barked))
... (Adv loudly))''')
...
The second is a dependency tree:
>>> d = parse_tree('(V (N (Det the) dog) barked (Adv loudly))')
We can confirm that the preorder and text order walks for the phrasal tree are the same:
>>> from selkie.tree import preorder, textorder
>>> for node in preorder(h): print(repr(node))
...
<Tree S ...>
<Tree NP ...>
<Tree Det the>
<Tree N dog>
<Tree VP ...>
<Tree V barked>
<Tree Adv loudly>
>>> for node in textorder(h): print(repr(node))
...
<Tree S ...>
<Tree NP ...>
<Tree Det the>
<Tree N dog>
<Tree VP ...>
<Tree V barked>
<Tree Adv loudly>
But they differ for the dependency tree:
>>> for node in preorder(d): print(repr(node))
...
<Tree V barked ...>
<Tree N dog ...>
<Tree Det the>
<Tree Adv loudly>
>>> for node in textorder(d): print(repr(node))
...
<Tree Det the>
<Tree N dog ...>
<Tree V barked ...>
<Tree Adv loudly>
Nodes and edges
The __iter__()
method of a tree, and
the function iter_nodes()
, are both synonyms for preorder()
.
The function nodes()
turns the generator into a list.
An edge is a pair (p,c) where p is a parent node and c
is one of its children. The function iter_edges()
returns an
iteration over the edges in a tree. The function edges()
turns
the iteration into a list.
Subtrees
Subtrees, iter subtrees.
The function iter_subtrees()
returns an iteration over subtrees
that satisfy a given predicate. The function subtrees()
turns
the iteration into a list.
The difference between these functions and
simply filtering the output of nodes()
is that iter_subtrees()
terminates the recursion whenever it finds a node matching the
predicate.
>>> from selkie.tree import subtrees
>>> subtrees(h, lambda x: is_leaf(head_child(x)))
[<Tree NP ...>, <Tree VP ...>]
If the second argument is a string, it is taken to be the desired node category.
>>> subtrees(h, 'Adv')
[<Tree Adv loudly>]
Subtree.
The function subtree()
takes the list produced by subtrees()
and
returns its member, if there is exactly one. It signals an error if
the list is not a singleton list.
Paths and leaves
Paths.
The function paths()
returns the list of paths through the
tree. A path is represented by a string in which node categories are
separted by “/.” The categories in the path are ordered from root
to leaf.
>>> from selkie.tree import paths
>>> paths(h)
['S/NP/Det', 'S/NP/N', 'S/VP/V', 'S/Adv']
Leaves.
The function leaves()
returns the list of leaf nodes in a tree.
The leaves are listed in preorder.
Words.
The function words()
differs from leaves()
in two ways: it
only includes leaves that have a value for word
, and it uses a
text-order walk.
>>> from selkie.tree import words
>>> words(h)
['the', 'dog', 'barked', 'loudly']
Tagged words.
The function tagged_words()
is like words()
, except that
it produces a list of pairs of form (word, cat).
>>> from selkie.tree import tagged_words
>>> tagged_words(h)
[('the', 'Det'), ('dog', 'N'), ('barked', 'V'), ('loudly', 'Adv')]
Terminal string.
The function terminal_string()
takes the output of
words()
and turns it into a string. The words are separated by
spaces.
>>> from selkie.tree import terminal_string
>>> terminal_string(h)
'the dog barked loudly'
Predicates
Is e-free.
The function is_efree_tree()
returns true just in case the tree
contains no empty nodes.
Is unary-free.
The function is_unaryfree_tree()
returns true just in case
there are no unary-branching nodes in the tree.
Copy tree
The function copy_tree()
does a deep copy of a tree. Unlike
the node method copy()
, copy_tree()
does recurse through
the whole tree, making copies of all nodes.
Transformations
The operations described in this section, as well as the transformations described in the chapters on head-marking and stemma conversion, are destructive. To protect a tree, make a copy before applying destructive operations.
Delete nodes.
The function delete_nodes()
deletes all nodes with a given
category. (However, it never deletes the root node.)
Eliminate epsilons.
The function eliminate_epsilons()
eliminates all
empty nodes from a tree. If the tree was initially headed, any
heads that are empty will get deleted.
>>> from selkie.tree import eliminate_epsilons
>>> e = parse_tree('''
... (S
... (NP (N ))
... (VP
... (VBZ )
... (RB surely)
... (NP Fido)))
... ''')
>>> eliminate_epsilons(e)
>>> print(e)
0 (S
1 (VP
2 (RB surely)
3 (NP Fido)))
Set parents, get root.
The function set_parents()
destructively adds a parent
attribute to every node in the tree, pointing back to the node’s
parent.
After parents have been set in a tree, one can use the
function getroot()
to go from any node to the root node.
It follows parent links up the tree to the root node.
Tree builder
- class selkie.tree.TreeBuilder
A stack-like data structure for constructing a tree.
- start(c, w[, r, i])
Start a new interior node. Cat, word, role, id.
- middle()
This is the head position.
- end()
Pop node.
- leaf(c, w[, r, i])
Create a leaf node.
- trees()
A list; error if not finished.
- tree()
A tree; error if not finished and unique.
Here is an example of use:
>>> from selkie.tree import TreeBuilder
>>> tb = TreeBuilder()
>>> tb.start('NP')
<Tree NP>
>>> tb = TreeBuilder()
>>> tb.start('S')
<Tree S>
>>> tb.start('NP', role='subj')
<Tree NP>
>>> tb.leaf('Det', 'the')
<Tree Det the>
>>> tb.leaf('N', 'dog')
<Tree N dog>
>>> tb.end()
<Tree NP ...>
>>> tb.start('VP', role='head')
<Tree VP>
>>> tb.leaf('V', 'barks', role='head')
<Tree V barks>
>>> tb.end()
<Tree VP ...>
>>> tb.end()
<Tree S ...>
>>> tb.tree()
<Tree S ...>
>>> print(_)
0 (S
1 (NP:subj
2 (Det the)
3 (N dog))
4 (VP:head
5 (V:head barks)))
The methods for building a phrasal node are start()
and
end()
. Both return the node.
To build a dependency node, one also uses the method middle()
to
mark the position at which the governor occurs. For example:
>>> tb.start('V', word='chase')
<Tree V chase>
>>> tb.leaf('N', 'dogs')
<Tree N dogs>
>>> tb.middle()
>>> tb.leaf('N', 'cats')
<Tree N cats>
>>> tb.end()
<Tree V chase ...>
>>> tb.tree()
<Tree V chase ...>
>>> print(_)
0 (V
1 (N dogs)
chase
2 (N cats))
The builder allows one to construct multiple trees; it saves them on a
list until one calls either tree()
or trees()
. The latter
returns the list of trees constructed. The former returns a single
tree, and signals an error if there is not exactly one tree on the
list. Both methods signal an error if there is an incomplete tree in
progress. Both methods restore the builder to its empty state.