The information in a Request to a Seal application originally comes from an HTTP request. An HTTP request consists of:
Method | GET and POST are currently handled |
URL | |
Headers | E.g., content-type, content-length |
Body | For POST requests |
The body is consumed in the process of digesting the HTTP request, but it is preserved in digested form.
A URL has internal structure. Consider the following example:
https://abney@foo.com:8000/cgi-bin/app/foo.2.5/edit.1?x=0&y=42
It breaks into a number of pieces:
Scheme | https |
User | abney |
Host | foo.com |
Port | 8000 |
External Pathname | /cgi-bin/app/foo.2.5/edit.1 |
Query String | x=0&y=42 |
When the server processes the incoming HTTP request, it splits the external pathname into two pieces: the root prefix is the portion that addresses the CGI script, and the internal pathname is the remainder. The dividing slash is assigned to the internal pathname, with the result that the internal pathname has the form of an absolute pathname. Conceptually, the internal pathname is an address within the application's web space.
In our example URL, the external pathname subdivides into:
Root prefix | /cgi-bin/app |
Internal pathname | /foo.2.5/edit.1 |
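The decomposition above can be reproduced with the standard library. The following is only an illustrative sketch, not Seal code: the helper name split_external is hypothetical, and the root prefix is supplied by hand, whereas in practice the web server determines it.

```python
from urllib.parse import urlsplit

def split_external(external, rootprefix):
    # Hypothetical helper: divide an external pathname into the root prefix
    # and the internal pathname. The dividing slash stays with the internal
    # pathname, so the internal pathname looks like an absolute pathname.
    if external != rootprefix and not external.startswith(rootprefix + '/'):
        raise ValueError('pathname does not lie under the root prefix')
    return rootprefix, external[len(rootprefix):]

parts = urlsplit('https://abney@foo.com:8000/cgi-bin/app/foo.2.5/edit.1?x=0&y=42')
print(parts.scheme, parts.username, parts.hostname, parts.port)
print(parts.path)     # the external pathname
print(parts.query)    # the query string

rootprefix, internal = split_external(parts.path, '/cgi-bin/app')
print(rootprefix, internal)
```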
When an application runs under a web server, whether it is Apache or the Python web server, the application is invoked within a CGI script, and the server uses environment variables to pass the HTTP request to the CGI script. In particular, each URL piece just mentioned is assigned to a separate environment variable. Let us call the collection of environment variables containing the pieces of an HTTP request the CGI environment.
The convenience function make_environ() can be used to create a CGI environment. It is not used in normal processing but can be useful for testing or illustration.
>>> from seal.app.env import make_environ
>>> cgienv = make_environ(rootprefix='/cgi-bin/app',
...                       path='/foo.2.5/edit.1',
...                       qs='x=0&y=42',
...                       user='abney')
...
>>> for key in sorted(cgienv):
...     print(repr(key), repr(cgienv[key]))
...
'HTTPS' 'off'
'PATH_INFO' '/foo.2.5/edit.1'
'QUERY_STRING' 'x=0&y=42'
'REQUEST_METHOD' 'GET'
'SCRIPT_NAME' '/cgi-bin/app'
'USER' 'abney'
Within Python, the CGI environment is represented as a dict-like object. Seal makes use of the values for the following keys:
The Request constructor takes a CGI environment as argument, but it digests it into a more convenient internal form, which I call the digested environment. The conversion is done by the function digest_environ() of seal.app.env. The digested environment is a dict that contains the following keys:
Continuing our example:
>>> from seal.app.config import Config
>>> config = Config()
>>> from seal.app.env import digest_environ
>>> environ = digest_environ(cgienv, config)
>>> for key in sorted(environ):
...     if key != 'original':
...         print(repr(key), repr(environ[key]))
...
'client_addr' None
'cookie' {}
'form' {'x': '0', 'y': '42'}
'https_on' False
'pathname' '/foo.2.5/edit.1'
'rootprefix' '/cgi-bin/app'
'user' 'abney'
In the example, I skip the value for 'original' because it is the same as the value of cgienv.
Note that digest_environ() is called by the Request constructor; users generally have no need to call it directly.
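To make the digestion step concrete, here is a rough stand-in written against the standard library only. It is not Seal's digest_environ(): the function name sketch_digest is hypothetical, it handles only GET requests, and the use of the standard CGI variables HTTP_COOKIE and REMOTE_ADDR for the cookie and client address is an assumption rather than something taken from the Seal source.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl

def sketch_digest(cgienv):
    # Hypothetical approximation of the digested environment (GET only).
    # HTTP_COOKIE and REMOTE_ADDR are assumed sources, not confirmed ones.
    cookie = SimpleCookie(cgienv.get('HTTP_COOKIE', ''))
    return {
        'client_addr': cgienv.get('REMOTE_ADDR'),
        'cookie': {name: morsel.value for name, morsel in cookie.items()},
        'form': dict(parse_qsl(cgienv.get('QUERY_STRING', ''))),
        'https_on': cgienv.get('HTTPS', 'off') == 'on',
        'pathname': cgienv.get('PATH_INFO', ''),
        'rootprefix': cgienv.get('SCRIPT_NAME', ''),
        'user': cgienv.get('USER', ''),
        'original': cgienv,
    }

# A plain dict with the same keys and values as the CGI environment shown earlier.
example_env = {
    'HTTPS': 'off',
    'PATH_INFO': '/foo.2.5/edit.1',
    'QUERY_STRING': 'x=0&y=42',
    'REQUEST_METHOD': 'GET',
    'SCRIPT_NAME': '/cgi-bin/app',
    'USER': 'abney',
}
digested = sketch_digest(example_env)
for key in sorted(digested):
    if key != 'original':
        print(repr(key), repr(digested[key]))
```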
The sole argument to a Seal application function is a Request, and the return value is a Response. There are no side channels between browser and application, hence all required information must be packaged into Request and Response. In particular, cookies used to maintain state must be included in the Request and Response.
The Request constructor takes two arguments: a CGI environment and a Resources instance.
A Request has the following members:
resources | the Resources instance given to the Request constructor. |
config | a Config instance, taken from resources. |
log | a Logger instance, taken from resources. |
server | a Server instance, taken from resources. |
authenticator | an Authenticator instance, created when one calls authenticate(). |
webenv | the digested environment, returned by digest_environ(). |
path | a tuple of URLPathComponent instances, created from webenv. |
username | the authenticated user name, or '' if no username is provided or authentication fails. |
root | an HttpDirectory instance representing the root web directory. Initially it is None, but it is set by App. |
file | the application file. Initially it is None, but it is set by App. |
An application generally uses script-internal pathnames to represent locations, inasmuch as internal pathnames are not affected if the script is moved or renamed.
However, filenames that occur in URLs, particularly in URLs appearing in links on web pages, must be full external pathnames. As long as we use relative pathnames, no problem arises. However, if we use an absolute pathname like /.lib/default.css, it will cause the browser to request an invalid location: the browser must instead request /cgi-bin/app/.lib/default.css. That is, before including an absolute pathname in a web page, we must convert it to external form by prepending the script location.
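The conversion to external form amounts to a string concatenation. A minimal sketch, assuming the root prefix is known; the function name to_external is hypothetical and is not part of Seal:

```python
def to_external(rootprefix, pathname):
    # Hypothetical helper: absolute internal pathnames get the script
    # location prepended; relative pathnames are left for the browser
    # to resolve against the URL of the current page.
    if pathname.startswith('/'):
        return rootprefix + pathname
    return pathname

print(to_external('/cgi-bin/app', '/.lib/default.css'))
print(to_external('/cgi-bin/app', 'default.css'))
```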
Some detailed issues regarding slashes introduce further complexities. If a browser requests /foo/bar and the returned page contains a link to the relative path baz, the browser interprets it as /foo/baz, whereas if the browser requests /foo/bar/, then baz is interpreted as /foo/bar/baz. That is, the interpretation of a link depends on the presence or absence of a trailing slash in the URL that the browser used to request the page.
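The standard library's urljoin follows the same resolution rule as a browser, so it can be used to observe the effect of the trailing slash. The host name below is only a placeholder for illustration.

```python
from urllib.parse import urljoin

# Same relative link, two base URLs that differ only in the trailing slash.
without_slash = urljoin('http://foo.com/foo/bar', 'baz')
with_slash = urljoin('http://foo.com/foo/bar/', 'baz')

print(without_slash)   # baz replaces the last segment, bar
print(with_slash)      # baz is appended beneath bar/
```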
A Request is careful to preserve the ambiguity, to allow the application to deal with it appropriately. Leading and trailing slashes are never deleted. Rather, the URL path is split at slashes, yielding a list of path components. For example, the path /foo/bar is interpreted as ('', 'foo', 'bar'), whereas /foo/bar/ is interpreted as ('', 'foo', 'bar', '').
Strictly speaking, a Request should address a page, not a directory, since only a page can be returned as an HTTP response. The Request itself cannot determine whether the path addresses a page or a directory; that is the responsibility of the application. The App class deals with a request for a directory by sending the browser a redirect to the directory's home page, whose name is the empty string. That is, the redirect adds a trailing slash.
The empty-string component at the beginning of the path corresponds to the root directory. An empty-string path has a single empty-string component, which addresses the root directory itself. The path / corresponds to components ('', ''), which address, not the root directory, but the home page of the root directory.
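Plain str.split already behaves in the way just described, preserving leading and trailing empty components. The sketch below only illustrates the splitting rule; the helper name split_path is hypothetical, and the real Request produces URLPathComponent instances rather than bare strings.

```python
def split_path(pathname):
    # Split at slashes without discarding empty leading or trailing components.
    return tuple(pathname.split('/'))

for pathname in ['/foo/bar', '/foo/bar/', '', '/']:
    print(repr(pathname), split_path(pathname))
```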
Note that one should not use os.path.join with URL pathnames. Usually it introduces a slash between its arguments, but not if the leading argument is the empty string:
>>> import os
>>> os.path.join('foo', 'bar')
'foo/bar'
>>> os.path.join('foo', '')
'foo/'
>>> os.path.join('', 'foo')
'foo'
The result we desire is /foo, not foo.
A pathname component is represented by the class URLPathComponent. It is a specialization of str, but it also has a record of the full external pathname corresponding to the component. One may use a URLPathComponent's join method to extend the path, instead of using os.path.join.
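The idea of a string that also remembers its full pathname can be sketched as follows. This is not the actual URLPathComponent: the class name PathComponent is hypothetical, and the signature of join is an assumption made for illustration.

```python
class PathComponent(str):
    # Hypothetical stand-in for URLPathComponent: a str that also records
    # the full external pathname leading to this component.

    def __new__(cls, value, pathname):
        obj = super().__new__(cls, value)
        obj.pathname = pathname
        return obj

    def join(self, name):
        # Assumed signature. Note that in this sketch it shadows str.join.
        # A slash is always inserted, even when the parent pathname is empty.
        return PathComponent(name, self.pathname + '/' + name)

root = PathComponent('/cgi-bin/app', '/cgi-bin/app')
page = root.join('foo.2.5').join('edit.1')
print(repr(page), repr(page.pathname))

# Unlike os.path.join('', 'foo'), an empty parent still yields a leading slash.
print(PathComponent('', '').join('foo').pathname)
```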
In the example introduced above, the request's path consists of three components:
Index | Cpt | Pathname |
---|---|---|
path[0] | '/cgi-bin/app' | '/cgi-bin/app' |
path[1] | 'foo.2.5' | '/cgi-bin/app/foo.2.5' |
path[2] | 'edit.1' | '/cgi-bin/app/foo.2.5/edit.1' |
The first component represents the root; its pathname is the script location. Each subsequent pathname is obtained by adding a slash and the next component's string value.
The Request constructor calls path_from_env() to convert the digested environment into a path. To continue our previous example for the sake of illustration:
>>> from seal.app.request import path_from_env
>>> path = path_from_env(environ)
>>> for (i, cpt) in enumerate(path):
...     print('[%d]' % i, repr(cpt), repr(cpt.pathname))
...
[0] '/cgi-bin/app' '/cgi-bin/app'
[1] 'foo.2.5' '/cgi-bin/app/foo.2.5'
[2] 'edit.1' '/cgi-bin/app/foo.2.5/edit.1'
A form is a set of key-value assignments. Where it comes from depends on the HTTP request method. In the case of a GET request, the form comes from the query string in the URL, and in the case of a POST request, the form comes from the body of the HTTP request.
The form is translated to a dict of keyword arguments attached to the final URLPathComponent. For example, the final URLPathComponent generated from the URL '/foo.2/edit.1?x=hi&y=there' has the form dict:
{'x': 'hi', 'y': 'there'}
There is one nonstandard aspect to my treatment of form information. I permit variable names to be prefixed with an asterisk, making them list-valued. For example, the query string '*x=2&*x=5&*y=hi&z=lo' produces the form dict:
{'x': ['2', '5'], 'y': ['hi'], 'z': 'lo'}
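The asterisk convention can be sketched on top of the standard query-string parser. This is an illustration of the rule, not Seal's own parser; the function name parse_form is hypothetical, and the sketch does not handle a query that uses the same name both with and without an asterisk.

```python
from urllib.parse import parse_qsl

def parse_form(query_string):
    # Hypothetical helper: names prefixed with '*' become list-valued,
    # all other names keep a single string value.
    form = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name.startswith('*'):
            form.setdefault(name[1:], []).append(value)
        else:
            form[name] = value
    return form

print(parse_form('x=hi&y=there'))
print(parse_form('*x=2&*x=5&*y=hi&z=lo'))
```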
A path component is parsed into a call by splitting it at dots. The first element is the component name, and the remaining elements are positional arguments. The call also contains a keyword-arguments dict. For the last component, it consists of the form information, and for other components, it is an empty dict.
For example:
>>> for (i, cpt) in enumerate(path):
...     print('[%d]' % i, cpt.call)
...
[0] None
[1] ('foo', ('2', '5'), {})
[2] ('edit', ('1',), {'x': '0', 'y': '42'})
There is no call for the first path component. The first component is associated with the root directory, and each call addresses a child (subdirectory or page) of the previous component.
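The splitting rule for calls is simple enough to sketch directly. The function name parse_call is hypothetical; in Seal the call is available as the call member of a path component, as shown above.

```python
def parse_call(component, form=None):
    # Hypothetical helper: split at dots. The first element is the name,
    # the rest are positional arguments, and the form (if any) supplies
    # the keyword arguments.
    name, *args = component.split('.')
    return (name, tuple(args), dict(form or {}))

print(parse_call('foo.2.5'))
print(parse_call('edit.1', {'x': '0', 'y': '42'}))
```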
In addition to the path and form, the request extracts two further pieces of information from the URL:
Two further pieces of information are included in an HTTP request, but are not part of the URL:
A Response packages up the information needed to produce an HTTP response. There are two cases: regular responses and redirect responses.
A regular response is created by providing the contents and optionally a code, content_type, and authenticator.
A redirect is created by providing code=303, in which case one must also provide the keyword argument location to specify which URI to redirect to. No other arguments are permitted.
The following table lists the HTTP status codes that are currently used, along with the corresponding messages:
Code | Message |
---|---|
200 | OK |
303 | See Other |
400 | Bad Request |
404 | Not Found |
500 | Internal Server Error |
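The table can be held as a small dict, and the messages agree with the reason phrases that Python's http module supplies. The names STATUS_MESSAGES and status_line below are hypothetical and are not taken from Seal.

```python
from http import HTTPStatus

# The codes and messages from the table above.
STATUS_MESSAGES = {
    200: 'OK',
    303: 'See Other',
    400: 'Bad Request',
    404: 'Not Found',
    500: 'Internal Server Error',
}

def status_line(code):
    # Hypothetical helper: format a code and message as used in a status header.
    return '%d %s' % (code, STATUS_MESSAGES[code])

# The messages coincide with the standard library's reason phrases.
for code, message in STATUS_MESSAGES.items():
    assert HTTPStatus(code).phrase == message

print(status_line(303))
```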
The following table lists the currently recognized filename suffixes, along with the corresponding Mime type and character encoding. An encoding of None indicates binary data.
Suffix | Mime Type | Encoding |
---|---|---|
css | text/css | us-ascii |
gl | text/x-glab | utf-8 |
html | text/html | utf-8 |
js | application/javascript | us-ascii |
pdf | text/pdf | None |
txt | text/plain | utf-8 |
wav | audio/wave | None |
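A lookup from filename suffix to Mime type and encoding might be sketched as below, using a few rows of the table. The names MIME_TYPES and content_type_for are hypothetical, and returning None for an unrecognized suffix is an assumption, not documented Seal behavior.

```python
import os.path

# A subset of the table above: suffix -> (Mime type, character encoding).
MIME_TYPES = {
    'css': ('text/css', 'us-ascii'),
    'html': ('text/html', 'utf-8'),
    'txt': ('text/plain', 'utf-8'),
    'wav': ('audio/wave', None),   # encoding None indicates binary data
}

def content_type_for(filename):
    # Hypothetical helper: returns None when the suffix is not recognized.
    suffix = os.path.splitext(filename)[1].lstrip('.')
    return MIME_TYPES.get(suffix)

print(content_type_for('/.lib/default.css'))
print(content_type_for('sound.wav'))
```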
The locus of authentication is the class Authenticator (seal.app.auth). Authentication is done separately for each Request; an Authenticator is instantiated when one calls the request's authenticate method. Request.authenticate dispatches to Authenticator.authenticate, and the result is a username, which is stored both in the Authenticator and in the Request. On authentication failure, the username is the empty string.
The application function may interact with the authenticator by calling the following methods of Request:
Those methods of Request hand off to methods of Authenticator, listed below.
Changes in the username must be passed back to the client in the form of a cookie. That happens in the method Response.http_headers, which calls Authenticator.response_headers() and includes the resulting headers (if any) among the headers that are passed back to the client.
There is one last connection needed to close the loop. When creating the Response, one must pass the Authenticator to the Response constructor. If one defines the application function using the App framework described below (Chapters 13-16), a web page is a specialization of Item, and maintains an internal pointer to the Request in its context member. The Response constructor is called in the Item method to_response, which takes the Authenticator from the request and passes it to the Response constructor. The to_response method is called in App.__call__.
The members and methods of Authenticator are:
The auth script is used to manage authentication files. There are two files that the authenticator makes use of, users.txt and sessions.txt, both located in the directory config['auth_dir'].
The auth script assumes that the current working directory is the authentication directory, and it uses or modifies ./users.txt and ./sessions.txt. The following provide examples of usage:
$ auth ls            # lists the users
$ auth set uname     # prompts for password, saves it
$ auth check uname   # prompts for password, checks it
$ auth delete uname