selkie.seq — Sequences and iterators

List-producing functions

The functions as_list(), repeatable(), concat(), unique(), and cross_product() produce lists as output.

As list. The function as_list() converts any item x to a list.

If x is None, it returns the empty list. If x is a list, it returns x itself. If x is a sequence, it returns list(*x*). If x is has attribute “next,” it takes x to be a generator and returns list(*x*). Otherwise, it returns [*x*].

Repeatable. A generator can only be used once, whereas iterables such as lists, tuples, sets, and dicts can be iterated over multiple times. The function repeatable() converts a generator into a list, but leaves other iterables alone. It coerces None to the empty list, but otherwise signals an error if its input is not an iterable. It assumes that any object with a next attribute is a generator, and any object with an __iter__ attribute is an iterable.

Unique. The function unique() takes a list as input and produces a list with all duplicates removed. The list does not need to be sorted, nor do duplicates need to be adjacent to each other. The algorithm is naive (quadratic), so it is only appropriate for short lists.

>>> from selkie.seq import unique
>>> unique([4, 2, 4, 1, 2])
[4, 2, 1]

Cross product. The function cross_product() takes a single argument, a list of lists, and produces the cross product of those lists as output.

>>> from selkie.seq import cross_product
>>> cross_product([['a', 'b'], [1, 2], [42]])
[('a', 1, 42), ('a', 2, 42), ('b', 1, 42), ('b', 2, 42)]

Sorted lists

The functions intersect, union, and difference expect sorted lists as input. Their behavior is unpredictable if they are given unsorted lists.

>>> from selkie.seq import intersect, union, difference
>>> x = [1,3,5,6,7]
>>> y = [2,3,4,7,8]
>>> intersect(x,y)
[3, 7]
>>> union(x,y)
[1, 2, 3, 4, 5, 6, 7, 8]
>>> difference(x,y)
[1, 5, 6]
>>> difference(y,x)
[2, 4, 8]

Queue

A Queue is a first-in first-out queue. The method write() inserts an element at the tail of the queue, and read() removes and returns the element at the head of the queue.

It is implemented as a buffer with head and tail pointers. Initially the buffer is empty. If the tail is at the end of the buffer, new elements are appended to the buffer and the buffer grows. When the queue is empty, the head and tail are reset to 0.

Space in the buffer before the head is “wasted” space. If the wasted space exceeds a threshold (maxwaste), the contents of the queue are relocated so that the head is 0. One can specify maxwaste when creating the queue; by default it is 10. Setting maxwaste to None prevents the contents from being relocated (though the head and tail will still be reset to 0 if the queue becomes empty).

The elements in the queue can be accessed and set by index.

Edit distance

Edit distance works with sequences generally, not just strings. To create and use an edit distance function:

>>> from selkie.seq import EditDistance
>>> distance = EditDistance()
>>> distance('testy', 'tezt')
2

By default, it uses the function simple_distance(), which imposes a cost of 1 for each insertion, substitution, or deletion. One can provide a new cost function when instantiating EditDistance. A cost function should take two arguments, x and y, representing a deletion if y is None, and insertion if x is None, and a substitution, if neither is None. The return value should be a number, possibly math.inf.

Generators

The following functions are provided that relate to generators: chain(), nth(), head(), tail(), more(), product(), count(), and counts().

For the purpose of illustration, let us define a little generator:

>>> def pots ():
...     for i in range(11):
...         yield 2**i
...
>>> list(pots())
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

Chain. The function chain() is imported from itertools. It concatenates multiple generators.

Nth. The function nth() returns a particular item from an iterator.

>>> from selkie.seq import nth
>>> nth(pots(), 2)
4

Remember that an iterable is consumed as one iterates through it. In the example just given, we created a new generator by calling pots(). If we use its value, though, we need to be careful:

>>> iter = pots()
>>> nth(iter, 2)
4
>>> list(iter)
[8, 16, 32, 64, 128, 256, 512, 1024]

Note that nth() consumed the first three items. One use of nth is to jump to problematic cases in a large iteration. An idiom for finding such cases in the first place is the following:

for i, x in enumerate(myiteration):
    if isproblematic(x):
        return i

Head, tail. The functions head() and tail() are also provided for inspecting parts of a large iterable.

>>> from selkie.seq import head, tail
>>> head(pots())
[1, 2, 4, 8, 16]
>>> tail(pots())
[64, 128, 256, 512, 1024]

An optional argument specifies how many items one would like to have:

>>> head(pots(), 3)
[1, 2, 4]
>>> tail(pots(), 3)
[256, 512, 1024]

A more general function is islice, from the standard itertools module.

>>> from itertools import islice
>>> list(islice(pots(), 2, 5))
[4, 8, 16]

More. The function more() calls print on each item in turn, pausing after a pageful of items has been displayed. Hitting return causes another page to be displayed, and hitting ‘q’ then enter causes more() to quit.

One can adjust the pagesize by setting more.pagesize. For example:

>>> more.pagesize = 4
>>> more(pots())
1
2
4
8
q

Product. The function product() is analogous to sum(). It takes an iterable containing numbers, and returns the product of the numbers.

Count. The function count() is analogous to len(), except that it works for generators as well as lists and other iterables.

>>> from selkie.seq import count
>>> count(pots())
11

Note that count is unrelated to itertools.count(). The latter returns an infinite iterator that generates the natural numbers.

Counts. The function counts() creates a table of counts of occurrences.

>>> from selkie.seq import counts
>>> tab = counts('abracadabra')
>>> sorted(tab.items())
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]