Ch 2: Running CLD

The CLD command-line executable

Quick start

The CLD command-line executable gives access to top-level functionality. CLD can be invoked either as a desktop application or as a web application. To create an example corpus and run the desktop application, put $SEAL/bin on your PATH and do the following:

$ cld example.d create_test
$ cld example.d run

(The "run" is actually optional; it is assumed as the default action.)

The web application version is described in the next section. There are a couple of differences between the desktop application and the web application. When running on the desktop, one automatically has full permissions; the internal password authentication system is not used. One also has access to the corpus manager, which allows one to switch between corpora and create new corpora. The web version only accesses a single fixed corpus.

That being said, the differences have to do with configuration rather than code. The desktop application simply runs the web application inside an internal Python web server and uses a browser as its user interface.
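
The desktop arrangement can be sketched with the Python standard library alone. The following is a minimal illustration, not SEAL's actual code: it assumes the application is exposed as a WSGI callable, and the names demo_app and run_desktop are hypothetical. Port 8000 matches the address mentioned later in this chapter.

```python
import webbrowser
from wsgiref.simple_server import make_server, WSGIRequestHandler


def demo_app(environ, start_response):
    """Hypothetical stand-in for the web application (not SEAL's app)."""
    body = ("You requested " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    """Request handler that does not write an access log to stderr."""

    def log_message(self, format, *args):
        pass


def run_desktop(app, host="localhost", port=8000, open_browser=True):
    """Serve app from an internal web server and point a browser at it."""
    # make_server binds and listens immediately, so the browser request
    # is queued until serve_forever starts handling connections.
    server = make_server(host, port, app, handler_class=QuietHandler)
    if open_browser:
        webbrowser.open("http://%s:%d/" % (host, port))
    try:
        server.serve_forever()      # blocks until interrupted with ctrl-C
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling run_desktop(demo_app) starts the server and blocks until ctrl-C. The same callable could equally be served through CGI, which is the sense in which the two modes differ in configuration rather than code.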

Usage

For the sake of an integrated description, I give a full command listing here, though much of the material will only make sense after reading the rest of this volume.

The executable is a shell script located in $SEAL/bin. The following provides usage templates for each of the commands:

$ cld SPEC
$ cld SPEC auth ls|set|check|delete [USER]
$ cld SPEC call PATH [CKW*]
$ cld SPEC config
$ cld SPEC create|-c
$ cld SPEC create_cgi CGIFN [CKW*]
$ cld SPEC create_test
$ cld SPEC delete|del|-d [LANG] [ITEM*]
$ cld SPEC export|-e EFN [LANG] [ITEM*]
$ cld SPEC extract [DIR] [CKW*]
$ cld SPEC get ITEM [CKW*]
$ cld SPEC glab COM [USER*] [CKW*]
$ cld SPEC group [GRP] [COM] [USER*] [CKW*]
$ cld SPEC import|-i EFN
$ cld SPEC info [PATH]
$ cld SPEC list|-l [ITEM*]
$ cld SPEC ls [PATH]
$ cld SPEC perm [PATH] [add|remove USER ROLE]
$ cld SPEC rm [ITEM] [CKW*]
$ cld SPEC run [CKW*]
$ cld SPEC set [KV*]
$ cld SPEC tree [KW*]
$ cld SPEC unset [KEY*]
$ cld SPEC user [USER [add|remove GRP]] [CKW*]

SPEC and CKW. The usage is "object-oriented" in the sense that the SPEC is a specification for an application file, configuration, or export file, and the command is treated conceptually as a method of that file. In the general case, SPEC is a list of configuration keyword-value pairs (CKW). With some commands, additional keyword-value pairs may be added at the end of the command line, as indicated by "[CKW*]".

A keyword-value pair is a single word of form KEY=VALUE, containing a literal '=' character. The list of configuration keywords is given in the section Configuration keys in Chapter 5. In addition, some shorthands are permitted for convenience.
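
To make the KEY=VALUE convention concrete, here is a small sketch of how a list of command-line words might be separated into positional words and keyword-value pairs. The function split_args is hypothetical and is not part of SEAL; the keys logging and log_file are borrowed from the CGI example later in this chapter. The shorthands mentioned above are not modeled.

```python
def split_args(words):
    """Separate positional words from KEY=VALUE words.

    A word counts as a keyword-value pair if it contains a literal '='
    with a non-empty key in front of it. Only the first '=' splits the
    word, so the value itself may contain further '=' characters.
    """
    positional, pairs = [], []
    for word in words:
        key, sep, value = word.partition("=")
        if sep and key:
            pairs.append((key, value))
        else:
            positional.append(word)
    return positional, pairs


positional, pairs = split_args(["oji/1", "logging=all", "log_file=/tmp/cld.log"])
# positional == ['oji/1']
# pairs == [('logging', 'all'), ('log_file', '/tmp/cld.log')]
```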

Commands

The individual commands are as follows.

(no command)
Same as run
auth ls|set|check|delete [USER]
USER is actually optional only for ls; it is obligatory for all other subcommands.
call PATH [KV*]
Instantiate the app and run it. First, it launches an internal Python web server that calls back to the app to handle requests. Then it packages PATH [KV*] as an HTTP request and sends it to the server. Note that, in this case only, a given key may appear multiple times, provided that the key begins with *.
config
Print out the corpus configuration file.
create
Create the named corpus directory.
create_cgi CGIFN [KW*]
Create a CGI file that uses the named corpus. CGIFN is the filename to create.
create_test
Create a test corpus, prepopulated from seal/examples/corp1.ef. Also create a media directory. If not otherwise specified, the media directory will be called 'media' in the current directory.
delete [LANG] [ITEM*]
Delete the indicated items. See Chap 18, Export files.
export EFN [LANG] [ITEM*]
Export the indicated items to the file EFN. The value '-' may be used for stdout. LANG is a subcorpus identifier, which may be a language code, 'roms', or 'glab'. ITEM is a designator for a specific item in the corpus, such as oji/1. If LANG is provided, 'oji' may be omitted. If no items are named, the entire corpus is exported. See Chap 18, Export files.
glab ls [USER*]
List the notebooks belonging to each of the named USERs. If no USER is provided, list the users. Equivalent to cld SPEC ls glab/USER or, when no USER is given, cld SPEC ls glab.
glab add USER*
Add libraries for each of the given USERs.
glab rm USER*
Delete the libraries of each of the given USERs. Use with caution! Cannot be undone.
import EFN
Import from EFN. See Chap 18, Export files.
info [PATH]
Print out name, file type, and permissions for the given PATH. The PATH omits filename suffixes; e.g. 'langs/oji/texts'. Any leading slash is ignored. If PATH is omitted, use the root.
list [ITEM*]
List the named items. If none are specified, list the entire corpus.
ls [PATH]
List the children of a given directory. PATH is interpreted as for 'info'.
perm [PATH] [add|remove USER ROLE]
Show or modify permissions. PATH is interpreted as for 'info'. 'add' causes the USER to be added to the ROLE for this file, and 'remove' causes the USER to be removed from the ROLE. If neither 'add' nor 'remove' is specified, the permissions are displayed. The legal values for ROLE are 'owners', 'editors', or 'shared'.
run [KW*]
Run the CLD application. This is the default, if no command is provided. See: Running CLD as an application and Running as a web service.
set [KV*]
Set values of configuration keys. KV represents a keyword-value pair of form key=value. A given key is permitted to appear multiple times, provided that it starts with '*'.
tree [KW*]
Print out the contents of the corpus in tree format. KW is a keyword argument of form key=value that controls printing.
unset [KEY*]
Unset values of configuration keys. KEY is just a keyword, without an indication of a value.
user [USER [add|remove GRP]] [CKW*]
USER and GRP are both user names, but GRP is treated as a group (a user with members).
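
The entries for 'call' and 'set' allow a key to be repeated when it begins with '*'. The following sketch shows one way such words could be collected and then packaged as a request target. It rests on several assumptions that the text does not confirm: that repeated values are gathered into a list, that the leading '*' is kept as part of the key, and that the request carries the pairs in a query string. The function names and the keys *tag and n are hypothetical.

```python
from urllib.parse import urlencode


def collect_pairs(words):
    """Build a dict from KEY=VALUE words.

    Keys beginning with '*' may repeat and accumulate their values in a
    list (assumed behavior); any other repeated key is rejected.
    """
    result = {}
    for word in words:
        key, sep, value = word.partition("=")
        if not sep or not key:
            raise ValueError("not a KEY=VALUE word: %r" % word)
        if key.startswith("*"):
            result.setdefault(key, []).append(value)
        elif key in result:
            raise ValueError("key given more than once: %r" % key)
        else:
            result[key] = value
    return result


def as_request_target(path, words):
    """Package PATH and KEY=VALUE words as a path plus query string."""
    # doseq=True emits one key=value pair per list element.
    query = urlencode(collect_pairs(words), doseq=True)
    target = "/" + path.lstrip("/")
    return target + ("?" + query if query else "")


target = as_request_target("langs/oji", ["*tag=a", "*tag=b", "n=1"])
# target == '/langs/oji?%2Atag=a&%2Atag=b&n=1'
```

Note that urlencode percent-encodes the '*' as %2A; a server written along these lines would decode it again when parsing the query string.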

Invoking CLD from Python

The cld executable simply executes the module seal.script.cld, and that in turn does:

import sys
from seal.cld.toplevel import CLDManager
mgr = CLDManager()
mgr.main(sys.argv)

One can execute commands directly from Python by instantiating the CLDManager, passing it the corpus filename, and calling it as a function instead of using main. For example:

>>> mgr = CLDManager('/tmp/foo.cld')
>>> mgr('create_test')

See CLDManager in Chapter 17 for more information.

Running as a web service

Local testing

It is recommended to create a directory just for the CLD corpus and supporting materials, outside of the Apache document directory.

$ mkdir cld
$ cd cld

Create an empty corpus:

$ cld corpus.cld create

One can test locally before deploying. Run in webserver mode rather than desktop mode:

$ cld corpus.cld -w

This will start an internal Python webserver and open a browser window pointing at localhost:8000. You should get a Login page.

One cannot log in without defining users. For the sake of illustration, let us create a user named leo. Stop CLD (use ctrl-C), and do:

$ cld corpus.cld auth set leo

This creates an account for leo and prompts you for a password. Notice that the password and sessions files reside in a directory called 'auth' that is a sibling to the corpus directory. That location is a configuration setting, which you may change if you desire. To see the current configuration settings:

$ cld corpus.cld config

Now that we have created a user, let us restart the web server (cld corpus.cld -w) and log in using user name 'leo' and the password you chose.

When you do so, you get a new page, and it should indicate, in the upper right corner, that 'leo' is logged in. But it says the corpus is not readable. When a corpus is created, no permissions are automatically granted.

Stop the web server again, and make leo an owner of the corpus:

$ cld corpus.cld perm / add leo owners

The slash indicates the root directory; leo is being added to the list of owners. Permissions are inherited, so leo will be owner of any additional subdirectories that we create, unless we explicitly remove leo from the owners list of some subdirectory.
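
The inheritance rule can be illustrated with a toy model. This is only a sketch of the behavior described above, not the way CLD stores permissions: it assumes that each directory records the users explicitly added to or removed from a role, and that the effective membership is computed by walking from the root down to the directory in question. All names are hypothetical.

```python
def ancestors(path):
    """Yield '/', then each successively longer prefix of path."""
    parts = [p for p in path.split("/") if p]
    yield "/"
    for i in range(1, len(parts) + 1):
        yield "/" + "/".join(parts[:i])


def effective_members(path, role, added, removed):
    """Compute who holds role at path, inheriting from parent directories.

    added and removed map (directory, role) to a set of user names.
    Entries on a deeper directory override what was inherited.
    """
    members = set()
    for directory in ancestors(path):
        members |= added.get((directory, role), set())
        members -= removed.get((directory, role), set())
    return members


# leo is an owner at the root but has been removed from /langs/oji.
added = {("/", "owners"): {"leo"}}
removed = {("/langs/oji", "owners"): {"leo"}}

at_langs = effective_members("/langs", "owners", added, removed)
at_texts = effective_members("/langs/oji/texts", "owners", added, removed)
# at_langs == {'leo'}
# at_texts == set()
```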

Now restart the web server. Unless you clear your browser history, or wait long enough for the session to time out, leo will still be logged in, and you will now get a list of corpus contents.
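
The reason leo remains logged in is that the session outlives the server process. A sketch of the kind of check involved follows, using the session fields listed under Password and session files (user name, token, expiration, client address). The representation of the expiration as seconds since the epoch, the one-hour lifetime, and the function names are assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    token: str
    expiration: float       # seconds since the epoch (assumed representation)
    client_addr: str


def new_session(user, client_addr, lifetime=3600.0, now=None):
    """Create a session with a random token; lifetime is an assumed default."""
    now = time.time() if now is None else now
    return Session(user, secrets.token_hex(16), now + lifetime, client_addr)


def is_valid(session, token, client_addr, now=None):
    """A session is honored only if token, address, and expiry all check out."""
    now = time.time() if now is None else now
    same_token = secrets.compare_digest(session.token.encode("utf-8"),
                                        token.encode("utf-8"))
    return (same_token
            and session.client_addr == client_addr
            and now < session.expiration)
```

Under this model, restarting the server changes nothing: as long as the browser still presents the token and the expiration has not passed, the stored session is accepted.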

Creating a CGI script

One can create a CGI test script that just displays environment variables. The contents of the script:

#!/Users/abney/anaconda3/bin/python

import site
site.addsitedir('/Users/abney/git/seal/python')

from seal.app.toplevel import test_app, Manager
Manager(app=test_app).cgi()
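
If the test script above fails, it can be useful to check the CGI setup independently of SEAL. The following sketch does roughly the same job using only the Python standard library; it is not SEAL's test_app, and the interpreter path and function names are assumptions.

```python
#!/usr/bin/env python3
import html
import os
import sys


def render_environment(environ):
    """Return an HTML page listing the environment variables, sorted by name."""
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>\n" % (html.escape(key), html.escape(value))
        for key, value in sorted(environ.items())
    )
    return "<html><body><table>\n%s</table></body></html>\n" % rows


def main():
    # A CGI response is a header block, a blank line, and then the body.
    sys.stdout.write("Content-Type: text/html; charset=utf-8\r\n\r\n")
    sys.stdout.write(render_environment(os.environ))


if __name__ == "__main__":
    main()
```

If this script displays the environment but the SEAL test script does not, the problem most likely lies in the interpreter path or the site.addsitedir path rather than in the web server configuration.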

To run CLD, the CGI script should look something like the following. (The pathnames may need to be different in your environment.)

#!/usr/local/bin/python

import site
site.addsitedir('/usr/local/seal/python')

from seal.cld.toplevel import CLDManager
mgr = CLDManager('/usr/local/cld/corpus.cld',
                 auth_dir='/usr/local/cld/auth',
                 log_file='/usr/local/cld/log',
                 logging='all')
mgr.cgi()

For debugging, examine the log file. Its pathname is given in the CGI script.

Configuration

Configuration file

The configuration may be stored in a file, provided on the command line, or created as a dict in Python. It is passed to the App constructor.

A complete list of configuration variables is provided in the section Configuration keys. One may also wish to refer to the list of Logging conditions.

Password and session files

To enable password protection, one requires a password file and sessions file. These are plain-text files named users.txt and sessions.txt in the server_dir. They should be readable by httpd, but not world-readable. They should absolutely not be under htdocs.
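
The file permissions can be checked and tightened from Python. The sketch below assumes a Unix-like system and assumes that httpd reads the files through group membership, so that mode 0640 is appropriate; adjust to your own setup. A temporary file stands in for users.txt, and the function names are hypothetical.

```python
import os
import stat
import tempfile


def is_world_accessible(path):
    """True if users other than the owner and group have any access."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


def restrict(path):
    """Owner read/write, group read, no access for others (mode 0640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)


# Demonstration on a temporary file standing in for users.txt.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)                  # deliberately world-readable
before = is_world_accessible(demo_path)     # True
restrict(demo_path)
after = is_world_accessible(demo_path)      # False
os.remove(demo_path)
```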

The password file contains one line for each user. The fields are the user name, the salt, and the password hash value. The sessions file also contains one line for each user; its fields are user name, token, expiration, and client address.
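
To show how the three fields of a password line relate to one another, here is a sketch of creating and checking such a line. The hash algorithm actually used by CLD is not stated here; PBKDF2 with SHA-256, the iteration count, the tab separator, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000    # assumed work factor


def _digest(password, salt_hex):
    """Hash the password together with the salt; returns a hex string."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              bytes.fromhex(salt_hex), ITERATIONS)
    return raw.hex()


def make_record(user, password):
    """Return one password-file line: user name, salt, hash (tab-separated)."""
    salt = secrets.token_hex(16)        # fresh random salt for every record
    return "\t".join([user, salt, _digest(password, salt)])


def check_password(record, password):
    """Recompute the hash with the stored salt and compare in constant time."""
    user, salt, stored = record.split("\t")
    return hmac.compare_digest(_digest(password, salt), stored)
```

Because the salt is random, two records for the same password differ, which is why the salt must be stored alongside the hash.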

The auth script can be used to manage them. Here are examples of the commands:

$ auth ls
$ auth set user
$ auth check user
$ auth delete user

All of the commands print out the locations of the password file and the sessions file.