Data Model

This section documents some of DistKV’s server-internal classes.

This module contains DistKV’s basic data model.

TODO: message chains should be refactored to arrays: much lower overhead.

class distkv.model.Node(name, tick=None, cache=None, create=True)

Represents one DistKV participant.

for ... in enumerate(n: int = 0, current: bool = False)

Return an iterator over this node's valid keys.

Used to find data from no-longer-used nodes so they can be deleted.

seen(tick, entry=None, local=False)

An event with this tick was in the entry’s chain.

Parameters
  • tick – The event affecting the given entry.

  • entry – The entry affected by this event.

  • local – The message was not broadcast, thus do not assume that other nodes saw this.

is_deleted(tick)

Check whether this tick has been marked as deleted.

mark_deleted(tick)

The data for this tick will be deleted.

Parameters

tick – The event that caused the deletion.

Returns: the entry, if still present

clear_deleted(tick)

The data for this tick are definitely gone (deleted).

purge_deleted(r: range_set.RangeSet)

All entries in this rangeset are deleted.

This is a shortcut for calling clear_deleted() on each item.

supersede(tick)

The event with this tick is no longer in the referred entry’s chain. This happens when an entry is updated.

Parameters

tick – The event that once affected the given entry.

report_superseded(r: range_set.RangeSet, local=False)

Some node said that these entries may have been superseded.

Parameters
  • range – The RangeSet thus marked.

  • local – The message was not broadcast, thus do not assume that other nodes saw this.

report_missing(r: range_set.RangeSet)

Some node doesn’t know about these ticks.

We may need to broadcast either their content, or the fact that these ticks have been superseded.

report_deleted(r: range_set.RangeSet, server)

This range has been reported as deleted.

Parameters
  • range (RangeSet) – the range that’s gone.

  • add (dict) – store additional vanished items (nodename → RangeSet).

local_present

Values I know about

local_superseded

Values I knew about

local_deleted

Values I know to have vanished

local_missing

Values I have not seen; the inverse of local_present plus local_superseded

remote_missing

Values from this node which somebody else has not seen
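The relation between these per-node tick sets can be sketched with plain Python sets standing in for RangeSets (an illustration of the attributes above only, not DistKV code):

```python
# Plain sets stand in for RangeSets in this illustration.
present = {1, 2, 5}    # local_present: values I know about
superseded = {3}       # local_superseded: values I knew about
deleted = set()        # local_deleted: values known to have vanished

known = present | superseded | deleted
# local_missing is the inverse of the above, up to the highest known tick:
missing = set(range(1, max(known) + 1)) - known
# missing == {4}
```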

kill_this_node(cache=None)

Remove this node from the system. No chain’s first link may point to this node.

class distkv.model.NodeSet(encoded=None, cache=None)

Represents a dict (nodename → RangeSet).

class distkv.model.NodeEvent(node: distkv.model.Node, tick: Optional[int] = None, prev: Optional[distkv.model.NodeEvent] = None)

Represents any event originating at a node.

Parameters
  • node – The node thus affected

  • tick – Counter, timestamp, whatever

  • prev – The previous event, if any

equals(other)

Check whether these chains are equal. Used for ping comparisons.

The last two items may be missing from either chain.

find(node)

Return the position of a node in this chain. Zero if the first entry matches.

Returns None if not present.
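The chain structure and find() semantics can be sketched with a minimal stand-in class (hypothetical illustration; not the real distkv.model.NodeEvent):

```python
class ChainLink:
    # Minimal stand-in for the NodeEvent chain: node name, tick,
    # and a link to the previous event.
    def __init__(self, node, tick, prev=None):
        self.node, self.tick, self.prev = node, tick, prev

    def find(self, node):
        # Position of `node` in the chain: 0 if the first entry matches,
        # None if the node is not present.
        pos, evt = 0, self
        while evt is not None:
            if evt.node == node:
                return pos
            pos, evt = pos + 1, evt.prev
        return None

chain = ChainLink("C", 3, ChainLink("B", 2, ChainLink("A", 1)))
# chain.find("C") == 0, chain.find("A") == 2, chain.find("X") is None
```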

filter(node, server=None)

Return an event chain without the given node.

If the node is not in the chain, the result is not a copy.

attach(prev: Optional[distkv.model.NodeEvent] = None, server=None)

Copy this node, if necessary, and attach a filtered prev chain to it

class distkv.model.UpdateEvent(event: distkv.model.NodeEvent, entry: distkv.model.Entry, new_value, old_value=<class 'distkv.util._impl.NotGiven'>, tock=None)

Represents an event which updates something.

class distkv.model.Entry(name: str, parent: distkv.model.Entry, tock=None)

This class represents one key/value pair.

SUBTYPE

alias of distkv.model.Entry

follow_acl(path, *, create=True, nulls_ok=False, acl=None, acl_key=None)

Follow this path.

If create is True (default), unknown nodes are silently created. Otherwise they cause a KeyError. If None, assume create=True but only check the ACLs.

If nulls_ok is False (default), None is not allowed as a path element. If True, it is allowed as the first element only; if 2, it is allowed anywhere.

If acl is not None, then acl_key is the ACL letter to check for. acl must be an ACLFinder created from the root of the ACL in question.

The ACL key ‘W’ is special: it checks ‘c’ if the node is new, else ‘w’.

Returns a (node, acl) tuple.

follow(path, *, create=True, nulls_ok=False)

As follow_acl(), but isn’t interested in ACLs and only returns the node.
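The path-following behaviour can be sketched with a minimal tree stand-in (hypothetical; not the real distkv.model.Entry, and with no ACL handling):

```python
class TreeEntry:
    # Minimal stand-in illustrating follow(): descend a tree of named
    # children, creating unknown nodes unless `create` is disabled.
    def __init__(self, name):
        self.name, self.children = name, {}

    def follow(self, path, create=True):
        node = self
        for elem in path:
            if elem not in node.children:
                if not create:
                    raise KeyError(elem)  # unknown node, creation disabled
                node.children[elem] = TreeEntry(elem)
            node = node.children[elem]
        return node

root = TreeEntry("root")
leaf = root.follow(("a", "b", "c"))  # silently creates a, b, c
```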

mark_deleted(server)

This entry has been deleted.

Returns

the entry’s chain.

purge_deleted()

Call Node.clear_deleted() on each link in this entry’s chain.

await set_data(event: distkv.model.NodeEvent, data: Any, server=None, tock=None)

This entry is updated by that event.

Parameters
Returns

The UpdateEvent that has been generated and applied.

await apply(evt: distkv.model.UpdateEvent, server=None, root=None, loading=False)

Apply this UpdateEvent to me.

Also, forward to watchers.

await walk(proc, acl=None, max_depth=- 1, min_depth=0, _depth=0, full=False)

Call coroutine proc on this node and all its children.

If acl (must be an ACLStepper) is given, proc is called with the acl as second argument.

If proc raises StopAsyncIteration, chop this subtree.

serialize(chop_path=0, nchain=2, conv=None)

Serialize this entry for msgpack.

Parameters
  • chop_path – If negative, the entry’s path is not returned. Otherwise it is, with the first chop_path elements removed.

  • nchain – how many change events to include.

await updated(event: distkv.model.UpdateEvent)

Send an event to the watchers of this node and of all its parents.

class distkv.model.Watcher(root: distkv.model.Entry, full: bool = False, q_len: Optional[int] = None)

This helper class is used as an async context manager plus async iterator. It reports all updates to an entry (or its children).

A watcher terminates if sending to its channel blocks, i.e. the receiver has fallen behind. The receiver then needs to take appropriate re-syncing action.
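The async-context-manager-plus-iterator shape can be sketched with a hypothetical stand-in class (the real Watcher reports entry updates; this mock merely mimics the interface):

```python
import asyncio

class MockWatcher:
    # Hypothetical stand-in mimicking Watcher's interface: an async
    # context manager that is also an async iterator of update messages.
    def __init__(self, events):
        self._events = events

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    def __aiter__(self):
        return self._iter()

    async def _iter(self):
        for evt in self._events:
            yield evt

async def consume():
    seen = []
    async with MockWatcher(["update-1", "update-2"]) as w:
        async for msg in w:
            seen.append(msg)
    return seen

result = asyncio.run(consume())  # ["update-1", "update-2"]
```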

ACLs

ACL checks are performed by ACLFinder. This class collects all relevant ACL entries for any given (sub)path, sorted by depth-first specificity: the system gathers all ACLs that could possibly match a path and sorts them, with the + and # wildcards sorted last. It then picks the first entry that actually has a value.

This means that given a path a b c d e f g and the ACLs a b # g and a # d e f g, the first ACL matches because b is more specific than #, even though the second ACL is longer and thus could be regarded as more specific. The current rule is more stable when used with complex ACLs and thus more secure.

class distkv.types.ACLFinder(acl, blocked=None)

A NodeFinder which expects ACL strings as elements

Helper methods and classes

class distkv.util.MsgWriter(*a, buflen=65536, **kw)

Write a stream of messages to a file (encoded with MsgPack).

Usage:

async with MsgWriter("/tmp/msgs.pack") as f:
    for msg in some_source_of_messages():  # or "async for"
        await f(msg)
Parameters
  • buflen (int) – The buffer size. Defaults to 64k.

  • path (str) – the file to write to.

  • stream – the stream to write to.

Exactly one of path and stream must be used.

The stream is buffered. Call flush() to flush the buffer.

await flush()

Flush the buffer.

distkv.util.NotGiven

This object marks the absence of information where simply not using the data element or keyword at all would be inconvenient.

For instance, in def fn(value=NotGiven, **kw) you’d need to test 'value' in kw, or use an exception. The problem is that the parameter would then not show up in the function’s signature.

With NotGiven you can simply test value is (or is not) NotGiven.
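The pattern can be sketched as follows (a generic sentinel example, not DistKV's actual definition):

```python
class NotGiven:
    """Sentinel marking 'no value supplied' where None is a legal value."""

def fn(value=NotGiven):
    # `value is NotGiven` cleanly distinguishes "omitted" from "passed None",
    # and `value` still shows up in the function's signature.
    if value is NotGiven:
        return "absent"
    return f"got {value!r}"
```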

This module’s job is to run code, resp. to keep it running.

exception distkv.runner.NotSelected

This node has not been selected for a very long time. Something is amiss.

class distkv.runner.RunnerMsg(msg=None)

Superclass for runner-generated messages.

Not directly instantiated.

This message and its descendants take one opaque parameter: msg.

class distkv.runner.ChangeMsg(msg=None)

A message telling your code that some entry has been updated.

Subclass this and use it as CallAdmin.watch’s cls parameter for easier disambiguation.

The runner sets path and value attributes.

class distkv.runner.MQTTmsg(msg=None)

A message transporting some MQTT data.

value is the MsgPack-decoded content. If that attribute doesn’t exist, the message could not be decoded.

The runner also sets the path attribute.

class distkv.runner.ReadyMsg(msg=None)

This message is queued when the last watcher has read all data.

class distkv.runner.TimerMsg(msg=None)

A message telling your code that a timer triggers.

Subclass this and use it as CallAdmin.timer’s cls parameter for easier disambiguation.

class distkv.runner.CallAdmin(runner, state, data)

This class collects some standard tasks which async DistKV-embedded code might want to do.

await cancel()

Cancel the running task

await spawn(proc, *a, **kw)

Start a background subtask.

The task is auto-cancelled when your code ends.

Returns: an anyio.abc.CancelScope which you can use to cancel the subtask.

await setup_done(**kw)

Call this when your code has successfully started up.

await error(path=None, **kw)

Record that an error has occurred. This function records specific error data, then raises ErrorRecorded which the code is not supposed to catch.

See distkv.errors.ErrorRoot.record_error for keyword details. The path argument is auto-filled to point to the current task.

await watch(path, cls=<class 'distkv.runner.ChangeMsg'>, **kw)

Create a watcher. This path is monitored as per distkv.client.Client.watch; messages are encapsulated in ChangeMsg objects. A ReadyMsg will be sent when all watchers have transmitted their initial state.

By default a watcher will only monitor a single entry. Set max_depth if you also want child entries.

By default a watcher will not report existing entries. Set fetch=True if you want them.

await send(path, value=<class 'distkv.util._impl.NotGiven'>, raw=None)

Publish an MQTT message.

Set either value or raw.

await set(path, value, chain=<class 'distkv.util._impl.NotGiven'>)

Set a DistKV value.

await get(path, value)

Get a DistKV value.

await monitor(path, cls=<class 'distkv.runner.MQTTmsg'>, **kw)

Create an MQTT monitor. Messages are encapsulated in MQTTmsg objects.

By default a monitor will only monitor a single entry. You may use MQTT wildcards.

The message is decoded and stored in the value attribute unless it is undecodeable or raw is set; in that case it is stored in msg. The topic the message was sent to is in topic.

class distkv.runner.RunnerEntry(*a, **k)

An entry representing some hopefully-running code.

The code will run some time after target has passed. On success, it will run again repeat seconds later (if >0). On error, it will run delay seconds later (if >0), multiplied by 2**backoff.

Parameters
  • code (list) – pointer to the code that’s to be started.

  • data (dict) – additional data for the code.

  • delay (float) – time before restarting the job on error. Default 100.

  • repeat (float) – time before restarting on success. Default: zero: no restart.

  • target (float) – time the job should be started at. Default: zero: don’t start.

  • ok_after (float) – the job is marked OK if it has run this long. Default: zero: the code will do that itself.

  • backoff (float) – Exponential back-off factor on errors. Default: 1.1.
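The restart timing described above can be sketched as simple arithmetic (hypothetical helpers following the text; not DistKV's actual scheduler):

```python
def next_error_start(stopped, delay, backoff):
    # On error the entry restarts delay * 2**backoff seconds after it
    # stopped; backoff grows with consecutive failures.
    return stopped + delay * 2 ** backoff

def next_success_start(stopped, repeat):
    # On success the entry restarts `repeat` seconds later (if > 0).
    return stopped + repeat

# 100 s delay, two accumulated back-off steps:
# next_error_start(0.0, 100.0, 2) == 400.0
```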

The code runs with these additional keywords:

_self: the `CallEnv` object, which the task can use to actually do things.
_client: the DistKV client connection.
_info: a queue which the task can use to receive events. A message of
    ``None`` signals that the queue was overflowing and no further
    messages will be delivered. Your task should use that as its
    mainloop.
_P: build a path from a string
_Path: build a path from its arguments

Some possible messages are defined in distkv.actor.
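A task mainloop built around the _info queue might look like this (a hypothetical sketch using asyncio.Queue as a stand-in for the queue the runner provides):

```python
import asyncio

async def task_main(_info):
    # Hypothetical mainloop shape: read events from the _info queue.
    # A message of None means the queue overflowed and no further
    # messages will be delivered, so the task stops (and should re-sync).
    handled = []
    while True:
        msg = await _info.get()
        if msg is None:
            break
        handled.append(msg)
    return handled

async def demo():
    q = asyncio.Queue()
    for m in ("change", "timer", None):
        q.put_nowait(m)
    return await task_main(q)

result = asyncio.run(demo())  # ["change", "timer"]
```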

await send_event(evt)

Send an event to the running process.

await set_value(value)

Process incoming value changes

await run_at(t: float)

Next run at this time.

should_start()

Tell whether this job might want to be started.

Returns
  • False – no: it’s running, or has run and doesn’t restart.

  • 0 – no: it should not start.

  • >0 – the timestamp at which it should start, or should have started.

class distkv.runner.RunnerNode(root, name)

Represents all nodes in this runner group.

This is used for load balancing and such. TODO.

class distkv.runner.StateEntry(parent, name=None)

This is the actual state associated with a RunnerEntry. It must only be managed by the node that actually runs the code.

Parameters
  • started (float) – timestamp when the job was last started

  • stopped (float) – timestamp when the job last terminated

  • pinged (float) – timestamp when the state was last verified by the runner

  • result (Any) – the code’s return value

  • node (str) – the node running this code

  • backoff (float) – on error, the multiplier to apply to the restart timeout

  • computed (float) – computed start time

  • reason (str) – reason why (not) starting

result

alias of distkv.util._impl.NotGiven

class distkv.runner.StateRoot(client, path, *, need_wait=False, cfg=None, require_client=True)

Base class for handling the state of entries.

This is separate from the RunnerRoot hierarchy because the latter may be changed by anybody while this subtree may only be affected by the actual runner. Otherwise we get interesting race conditions.

await kill_stale_nodes(names)

States with node names in the “names” set are stale. Kill them.

class distkv.runner.AnyRunnerRoot(*a, **kw)

This class represents the root of a code runner. Its job is to start (and periodically restart, if required) the entry points stored under it.

AnyRunnerRoot tries to ensure that the code in question runs on one single cluster member. In case of a network split, the code will run once in each split area until the split is healed.

max_age

Timeout after which we really should have gotten another go

await find_stale_nodes(cur)

Find stale nodes (i.e. last seen < cur) and clean them.

class distkv.runner.SingleRunnerRoot(*a, **kw)

This class represents the root of a code runner. Its job is to start (and periodically restart, if required) the entry points stored under it.

While AnyRunnerRoot tries to ensure that the code in question runs on any cluster member, this class runs tasks on a single node. The code is able to check whether any and/or all of the cluster’s main nodes are reachable; this way, the code can default to local operation if connectivity is lost.

Local data (dict):

Parameters

cores (tuple) – list of nodes whose reachability may determine whether the code uses local/emergency/??? mode.

Config file:

Parameters
  • path (tuple) – the location this entry is stored at. Defaults to ('.distkv', 'process').

  • name (str) – this runner’s name. Defaults to the client’s name plus the name stored in the root node, if any.

  • actor (dict) – the configuration for the underlying actor. See asyncactor for details.

max_age

Timeout after which we really should have gotten another ping

class distkv.runner.AllRunnerRoot(*a, **kw)

This class represents the root of a code runner. Its job is to start (and periodically restart, if required) the entry points stored under it.

This class behaves like SingleRunnerRoot, except that it runs tasks on all nodes.

This module implements a asyncactor.Actor which works on top of a DistKV client.

class distkv.actor.ActorState(msg=None)

Base class for states.

class distkv.actor.BrokenState(msg=None)

I have no idea what’s happening, probably nothing good

class distkv.actor.DetachedState(msg=None)

I am detached, my actor group is not visible

class distkv.actor.PartialState(msg=None)

Some but not all members of my actor group are visible

class distkv.actor.CompleteState(msg=None)

All members of my actor group are visible