The flow components of python-weka-wrapper3 are not related to Weka’s KnowledgeFlow. Instead, they were inspired by the ADAMS workflow engine. The result is a very simple workflow engine that is aimed at automating tasks and is easy to extend. Instead of linking operators with explicit connections, a flow uses a tree structure that implicitly defines how the data is processed.
A workflow component is called an actor. All actors are derived from the Actor class, and there are four different kinds of actors: sources, transformers, sinks and control actors.
The data itself is passed around in Token containers.
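The idea that the tree layout itself defines the data path can be illustrated with a small, self-contained sketch. It does not use python-weka-wrapper3 at all: the classes `Token`, `ToySource`, `ToyTransformer`, `ToySink` and `ToySequence` below are hypothetical stand-ins invented for this illustration, and the real actor classes differ in detail.

```python
class Token:
    """Minimal container wrapping a payload that travels between actors."""
    def __init__(self, payload):
        self.payload = payload


class ToySource:
    """Produces tokens without needing any input."""
    def __init__(self, items):
        self.items = items

    def generate(self):
        return [Token(item) for item in self.items]


class ToyTransformer:
    """Consumes a token and emits a new one."""
    def __init__(self, func):
        self.func = func

    def process(self, token):
        return Token(self.func(token.payload))


class ToySink:
    """Consumes tokens and emits nothing."""
    def __init__(self):
        self.received = []

    def process(self, token):
        self.received.append(token.payload)
        return None


class ToySequence:
    """Runs its children in list order; the order replaces explicit connections."""
    def __init__(self, source, actors):
        self.source = source
        self.actors = actors

    def execute(self):
        for token in self.source.generate():
            for actor in self.actors:
                token = actor.process(token)
                if token is None:  # a sink ends the chain
                    break


sink = ToySink()
flow = ToySequence(ToySource([1, 2, 3]), [ToyTransformer(lambda x: x * 10), sink])
flow.execute()
print(sink.received)
```

No actor refers to its successor here; the position in the list of children is the only "wiring", which is the property the tree structure relies on.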
Because the tree structure only provides 1-to-n connections, objects can also be stored internally in a flow using a simple dictionary (internal storage). Special actors store, retrieve, update and delete these objects.
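As a rough sketch of what dictionary-based internal storage amounts to, the following plain-Python example mimics the four operations. `ToyFlow` and the helper functions `set_value`, `get_value`, `update_value` and `delete_value` are hypothetical names chosen for this illustration; they are not the storage actors of the library.

```python
class ToyFlow:
    """Root of a toy flow that owns the shared storage dictionary."""
    def __init__(self):
        self.storage = {}


def set_value(flow, name, value):
    """Store (or overwrite) an object under the given name."""
    flow.storage[name] = value


def get_value(flow, name):
    """Retrieve a stored object; raises KeyError if it is missing."""
    return flow.storage[name]


def update_value(flow, name, func):
    """Replace a stored object with the result of applying func to it."""
    flow.storage[name] = func(flow.storage[name])


def delete_value(flow, name):
    """Remove a stored object, ignoring names that are not present."""
    flow.storage.pop(name, None)


flow = ToyFlow()
set_value(flow, "counter", 1)
update_value(flow, "counter", lambda v: v + 1)
current = get_value(flow, "counter")
delete_value(flow, "counter")
print(current, flow.storage)
```

Because the dictionary lives at the root, any actor in any branch can reach the same object by name, which works around the 1-to-n restriction of the tree.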
For finding out more about a specific actor, and what parameters it offers (via the config dictionary property), you use one of the following actor methods:
Printing the layout of a flow is very simple. Assuming you have a flow variable called myflow, you use the tree property to output the structure: print(myflow.tree)
All actors can also be converted to and restored from JSON. Use the following property to access or set the JSON representation: json
The typical life-cycle of a flow (actually any actor) can be described through the following method calls, in this order: setup(), execute(), wrapup() and cleanup().
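A minimal sketch of this life-cycle as a reusable helper is shown below. It follows the pattern used in the cross-validation example later in this section, and it assumes that setup() and execute() return None on success and an error message otherwise. `run_actor` and `DummyActor` are hypothetical names introduced here so that the snippet runs without Weka.

```python
def run_actor(actor):
    """Drive an actor through setup, execute, wrapup and cleanup.

    Returns None on success, otherwise an error message.
    """
    msg = actor.setup()
    if msg is None:
        msg = actor.execute()
        if msg is not None:
            msg = "Error executing flow:\n" + msg
    else:
        msg = "Error setting up flow:\n" + msg
    # wrapup and cleanup are called regardless of the outcome
    actor.wrapup()
    actor.cleanup()
    return msg


class DummyActor:
    """Hypothetical stand-in that records which methods were called."""
    def __init__(self, setup_error=None):
        self.setup_error = setup_error
        self.calls = []

    def setup(self):
        self.calls.append("setup")
        return self.setup_error

    def execute(self):
        self.calls.append("execute")
        return None

    def wrapup(self):
        self.calls.append("wrapup")

    def cleanup(self):
        self.calls.append("cleanup")


ok = DummyActor()
result_ok = run_actor(ok)
broken = DummyActor(setup_error="no files configured")
result_broken = run_actor(broken)
print(ok.calls, result_ok)
print(broken.calls, result_broken)
```

Note that execute() is skipped when setup() reports a problem, while wrapup() and cleanup() still run so that resources are released either way.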
The following source actors are available:
The following transformers are available:
The following sinks are available:
The following control actors define how data is passed around in a workflow:
The following conversion schemes can be used in conjunction with the Convert transformer:
Check out the examples available through the python-weka-wrapper3-examples project on GitHub:
The example scripts are located in the src/wekaexamples/flow sub-directory.
Below is a code snippet for building a flow that cross-validates a classifier on a dataset and outputs the evaluation summary, the classifier errors, and the ROC and PRC curves:
from weka.classifiers import Classifier
from weka.flow.control import Flow, Branch, Sequence
from weka.flow.source import FileSupplier
from weka.flow.transformer import LoadDataset, ClassSelector, CrossValidate, EvaluationSummary
from weka.flow.sink import Console, ClassifierErrors, ROC, PRC
flow = Flow(name="cross-validate classifier")
filesupplier = FileSupplier()
filesupplier.config["files"] = ["/some/where/iris.arff"]
flow.actors.append(filesupplier)
loaddataset = LoadDataset()
flow.actors.append(loaddataset)
select = ClassSelector()
select.config["index"] = "last"
flow.actors.append(select)
cv = CrossValidate()
cv.config["setup"] = Classifier(classname="weka.classifiers.trees.J48")
flow.actors.append(cv)
branch = Branch()
flow.actors.append(branch)
seqsum = Sequence()
seqsum.name = "summary"
branch.actors.append(seqsum)
summary = EvaluationSummary()
summary.config["title"] = "=== J48/iris ==="
summary.config["complexity"] = False
summary.config["matrix"] = True
seqsum.actors.append(summary)
console = Console()
seqsum.actors.append(console)
seqerr = Sequence()
seqerr.name = "errors"
branch.actors.append(seqerr)
errors = ClassifierErrors()
errors.config["wait"] = False
seqerr.actors.append(errors)
seqroc = Sequence()
seqroc.name = "roc"
branch.actors.append(seqroc)
roc = ROC()
roc.config["wait"] = False
roc.config["class_index"] = [0, 1, 2]
seqroc.actors.append(roc)
seqprc = Sequence()
seqprc.name = "prc"
branch.actors.append(seqprc)
prc = PRC()
prc.config["wait"] = True
prc.config["class_index"] = [0, 1, 2]
seqprc.actors.append(prc)
# run the flow
msg = flow.setup()
if msg is None:
    msg = flow.execute()
    if msg is not None:
        print("Error executing flow:\n" + msg)
else:
    print("Error setting up flow:\n" + msg)
flow.wrapup()
flow.cleanup()
With the following command you can output the built flow tree:
print(flow.tree)
The above example gets printed like this:
Flow 'cross-validate classifier'
|-FileSupplier [files: 1]
|-LoadDataset [incremental: False, custom: False, loader: weka.core.converters.ArffLoader]
|-ClassSelector [index: last]
|-CrossValidate [setup: weka.classifiers.trees.J48 -C 0.25 -M 2, folds: 10]
|-Branch
| |-Sequence 'summary'
| | |-EvaluationSummary [title: === J48/iris ===, complexity: False, matrix: True]
| | |-Console [prefix: '']
| |-Sequence 'errors'
| | |-ClassifierErrors [absolute: True, title: None, outfile: None, wait: False]
| |-Sequence 'roc'
| | |-ROC [classes: [0, 1, 2], title: None, outfile: None, wait: False]
| |-Sequence 'prc'
| | |-PRC [classes: [0, 1, 2], title: None, outfile: None, wait: True]
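To make the layout of this output easier to read, here is a small sketch that renders a nested structure in the same visual style. It is only an illustration of the indentation scheme: `render_tree` and the tuple-based toy structure are hypothetical, and this is not how the library produces its tree string.

```python
def render_tree(node, depth=0):
    """Return the lines for a (label, children) tuple and all nodes below it."""
    label, children = node
    # the root has no prefix; each deeper level adds one "| " before the "|-"
    prefix = "" if depth == 0 else "| " * (depth - 1) + "|-"
    lines = [prefix + label]
    for child in children:
        lines.extend(render_tree(child, depth + 1))
    return lines


toy = ("Flow 'demo'", [
    ("FileSupplier", []),
    ("Branch", [
        ("Sequence 'summary'", [
            ("Console", []),
        ]),
    ]),
])
output = "\n".join(render_tree(toy))
print(output)
```

Each `| ` segment corresponds to one enclosing actor, so the number of segments in front of an entry tells you how deeply it is nested.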
Adding additional flow components is quite easy. Start by choosing the superclass that matches the kind of actor:
- source – weka.flow.source.Source
- transformer – weka.flow.transformer.Transformer
- sink – weka.flow.sink.Sink
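The general pattern of deriving a new component from a superclass can be sketched as follows. Everything in this snippet is a stand-in: `ToyTransformerBase`, `default_config` and `transform` are hypothetical names invented for the illustration, and the methods that the real superclasses expect you to override need to be looked up in the library's API documentation.

```python
class ToyTransformerBase:
    """Hypothetical stand-in for a transformer superclass."""
    def __init__(self):
        self.config = self.default_config()

    def default_config(self):
        """Subclasses return their default options here."""
        return {}

    def transform(self, payload):
        """Subclasses implement the actual processing here."""
        raise NotImplementedError


class UpperCase(ToyTransformerBase):
    """Example component: upper-cases strings, optionally adding a prefix."""
    def default_config(self):
        return {"prefix": ""}

    def transform(self, payload):
        return self.config["prefix"] + str(payload).upper()


actor = UpperCase()
actor.config["prefix"] = ">> "
result = actor.transform("iris")
print(result)
```

The subclass only supplies its options and its processing step; everything else is inherited, which is what keeps new components small.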