In this section we describe the ways in which Jsonnet aids with the configuration of the Fractal application. These are the three main productivity benefits of using Jsonnet here: the configuration is split into logically separate files, a single Jsonnet run generates every output file that Packer and Terraform need, and template expansion keeps the hand-written configuration much smaller than what it generates.

The configuration is logically separated into several files via import constructs. This promotes abstraction and re-usability (some of our templates could be used by other applications, so they are separated into libraries), and it keeps credentials in a separate file (to avoid accidental check-in). Note that Jsonnet does not mandate this file structure; other separations (or a single giant file) would also have been possible.
When the imports are realized, the result is a single configuration that yields a JSON Packer configuration for each image (*.packer.json) and the JSON Terraform configuration (terraform.tf), using multiple file output. Those configurations in turn embed other configurations for the application software that will run on the instances. The top-level structure of the generated configuration is therefore as follows (with the content of each file elided for clarity).
{
  "appserv.packer.json": ...,
  "cassandra.packer.json": ...,
  "tilegen.packer.json": ...,
  "terraform.tf": ...
}
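The input side mirrors this structure. The following is only an illustrative sketch (the real service.jsonnet builds each field from the imported templates discussed below); each field of the top-level object becomes one output file when Jsonnet is run in multi-file mode, e.g. with jsonnet -m <output-dir> service.jsonnet.

{
  // Placeholder values; the real configuration composes these from the
  // Packer and Terraform templates in the library files.
  "appserv.packer.json": {},
  "cassandra.packer.json": {},
  "tilegen.packer.json": {},
  "terraform.tf": {},
}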
The configuration is built from several input files, including service.jsonnet itself, the re-usable template libraries lib/packer.jsonnet, lib/terraform.jsonnet, and lib/cassandra.jsonnet, and a separate credentials file.
Note that the Jsonnet template libraries also include some definitions not used in this example application, e.g., PostgreSQL and MySQL templates. Those can be ignored.
To integrate Jsonnet with Packer and Terraform, a Makefile is used. This first runs Jsonnet on the configuration and then runs Packer / Terraform on the resulting files (if they have changed). The choice of 'glue' tool is arbitrary; we could also have used a small Python script. We chose make because it is well understood, available everywhere, and does three things that we want: it invokes other programs, runs things in parallel, and avoids repeating work that is already complete. For example, Packer is only invoked if its configuration file has changed, and the three image builds proceed in parallel (they take a few minutes each).
Ignoring the re-usable templates, whitespace, and comments, the Jsonnet configuration is 217 lines (9.7kB). The generated Packer and Terraform files are 740 lines (25kB) in total. In other words, the hand-written Jsonnet is less than a third the size of the configuration it generates, which demonstrates the productivity benefit of template expansion when writing configurations.
The fractal example is a complete, realistic application and therefore its configuration has many technical details. In particular, it embeds configurations for various pieces of off-the-shelf software that we do not want to cover in depth. However, we would like to draw attention to some particular uses of Jsonnet within the configuration; we'll gladly field specific questions on the mailing list.
Each Packer configuration is, at its core, a list of imperative actions to be performed in sequence by a VM, after which the disk is frozen to create the desired image. The actions are called provisioners. Jsonnet is used to simplify the provisioner list by eliminating duplication and encouraging abstraction. Generic templates for specific off-the-shelf software are defined in re-usable libraries, which are then referenced from service.jsonnet and, as needed, overridden with some fractal application details. Generic provisioners are also provided for easy installation of packages via package managers, creation of specific files / dirs, etc.
In addition, the ImageMixin object in service.jsonnet is used to factor out common fractal-specific configuration from the three images. Note that it is a variable, so it will not appear in the output. The factored-out image configuration includes the Google Cloud Platform project id and the filename of the service account key. Since all the images are ultimately derived from the GcpDebian image (in lib/packer.jsonnet), and this image includes apt & pip provisioners (discussed shortly), this is also a good place to ensure some basic packages are installed on every image.
local ImageMixin = {
  project_id: credentials.project,
  account_file: "service_account_key.json",
  // For debugging:
  local network_debug = ["traceroute", "lsof", "iptraf", "tcpdump", "host", "dnsutils"],
  aptPackages +: ["vim", "git", "psmisc", "screen", "strace"] + network_debug,
},
Both the application server's image configuration appserv.packer.json and the tile generation service's image configuration tilegen.packer.json extend MyFlaskImage, which exists merely to add the aforementioned ImageMixin to the GcpDebianNginxUwsgiFlaskImage template from lib/packer.jsonnet. That template builds on the basic GcpDebianImage template from the same library, and adds all the packages (and default configuration) for both Nginx and uWSGI:
local MyFlaskImage = packer.GcpDebianNginxUwsgiFlaskImage + ImageMixin,
In Jsonnet we use JSON as the canonical data model and convert to other formats as needed. An example of this is the uWSGI configuration, an INI file, which is specified in Jsonnet under the uwsgiConf field in GcpDebianNginxUwsgiFlaskImage. The JSON version is converted to INI by the call to std.manifestIni (documented in the standard library reference) in the provisioner immediately below it. Representing the INI file with the JSON object model (instead of as a string) allows elements of the uWSGI configuration (such as the filename of the UNIX domain socket) to be easily kept in sync with other elements of the configuration (Nginx also needs to know it). If the application is configured with JSON, or even YAML, then no conversion is required. An example of that is the default Cassandra configuration file held in the conf field of lib/cassandra.jsonnet.
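As a minimal, self-contained sketch of this conversion (the field names and values below are illustrative; the actual uwsgiConf in lib/packer.jsonnet may differ), std.manifestIni takes an object with main and sections fields and renders it as INI text:

local uwsgiConf = {
  // JSON-modelled INI file: top-level keys go under `main`, and each INI
  // section is an object under `sections`.
  main: {},
  sections: {
    uwsgi: {
      module: "main",
      socket: "/var/www/uwsgi.sock",  // Kept in sync with the Nginx config.
      lazy: "true",
    },
  },
};
std.manifestIni(uwsgiConf)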
Looking back at appserv.packer.json and tilegen.packer.json in service.jsonnet, while both use Nginx/uWSGI/Flask there are some subtle differences. Since the HTTP request handlers are different in each case, the module (uWSGI entrypoint) and the required packages differ. Firstly, the tile generation image needs a provisioner to build the C++ code. Secondly, since the application server talks to the Cassandra database using cassandra-driver, which interferes with pre-forking, it is necessary to override the lazy field of the uWSGI configuration in that image. This is an example of how an abstract template can unify two similar parts of the configuration, while still allowing small details to be overridden as needed. Note also that such precise manipulation of configuration details would be much harder if the uWSGI configuration were represented as a single string instead of as a structure within the object model.
"appserv.packer.json": MyFlaskImage {
name: "appserv-v20141222-0300",
module: "main", // Entrypoint in the Python code.
pipPackages +: ["httplib2", "cassandra-driver", "blist"],
uwsgiConf +: { lazy: "true" }, // cassandra-driver does not survive fork()
...
},
Going up to the top of the template hierarchy we have GcpDebianImage and finally GcpImage in lib/packer.jsonnet. The latter gives the Packer builder configuration for Google Cloud Platform, bringing some fields out to the top level (essentially hiding the builder sub-object). We can hide the builder configuration because we only ever need one builder per image. We can support multiple cloud providers by deriving an entire new Packer configuration at the top level, overriding as necessary to specialize for that platform. The GcpDebianImage selects the base image (Backports) and adds provisioners for apt and pip packages. The configuration of those provisioners (the list of installed packages, and additional repositories / keys) is brought out to the top level of the image configuration. By default, the lists are empty, but sub-objects can override and extend them as we saw with appserv.packer.json.
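The builder-hiding pattern looks roughly like the following self-contained sketch (this is not the actual code of lib/packer.jsonnet; the names and values are placeholders):

local GcpImage = {
  local image = self,
  // Fields users care about are brought out to the top level...
  project_id: error "please override project_id",
  source_image: "debian-backports",
  // ...and the single Packer builder is assembled from them.
  builders: [{
    type: "googlecompute",
    project_id: image.project_id,
    source_image: image.source_image,
  }],
  provisioners: [],
};
GcpImage { project_id: "my-gcp-project" }

Because image is bound via self, the builder always sees the final, overridden values of the top-level fields.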
The actual provisioners Apt and Pip are defined further up lib/packer.jsonnet. In their definitions, one can see how the various shell commands are built up from the declarative lists of repositories, package names, etc. Installing packages in a non-interactive context requires a few extra switches and an environment variable, but the Apt provisioner handles all that. Also note how these provisioners both derive from RootShell (defined right at the top of the file) because those commands need to be run as root.
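A simplified, hypothetical sketch of that pattern follows (the real Apt provisioner in lib/packer.jsonnet also handles extra repositories and keys, and the exact commands may differ):

local RootShell = {
  type: "shell",
  // Run the generated script as root.
  execute_command: "{{ .Vars }} sudo -E /bin/sh '{{ .Path }}'",
};
local Apt = RootShell {
  packages: [],
  inline: [
    // Non-interactive installation needs an env var and extra switches.
    "DEBIAN_FRONTEND=noninteractive apt-get -qq -y install "
    + std.join(" ", self.packages),
  ],
};
Apt { packages: ["nginx", "uwsgi"] }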
Note that everything discussed has been plain Jsonnet code. It is possible to create your own provisioners (based on these, or from scratch) in order to declaratively control Packer in new ways.
The remainder of service.jsonnet generates the Terraform configuration that defines all the cloud resources that are required to run the fractal web application. That includes the instances that actually run the Packer images. It also includes the resources that configure network routing / firewalls, load balancers, etc.
Terraform accepts two basic syntaxes, JSON and HCL (a more concise form of JSON). Either way, this provides a data structure that specifies the required resources, but the model also has a few computational features: firstly, the structure has 'variables', which can be resolved by string interpolation within the resources; secondly, every resource is extended with a count parameter for creating n replicas of that resource; and finally there is some support for importing 'modules', i.e., another Terraform configuration of resources (perhaps written by a third party).
The interpolation feature is not just for variables; it is also used to reference attributes of resources that are not known until after deployment (i.e., that cannot be known at Jsonnet execution time). For example, the generated IP address of a static IP resource called foo can be referenced from a string in the definition of a resource bar using the syntax ${google_compute_address.foo.address}, which is resolved after the deployment of foo, in time for the deployment of bar.
We choose to emit JSON instead of HCL, as the latter would require conversion code. We also do not make use of any of the Terraform language features, as Jsonnet provides similar or greater capabilities in each of those domains, and doing it at the Jsonnet level allows integration with the rest of the configuration. We do, however, use Terraform interpolation syntax for resolving the "not known until deployment" attributes, e.g., in order to configure the application server with the host:port endpoint of the tile processing service. Such resolution cannot be performed by Jsonnet.
Going through service.jsonnet, the function zone is used to statically assign zones to instances on a round-robin basis. All the instances extend from FractalInstance, which is parameterized by the index zone_hash (it is actually just a function that takes zone_hash and returns an instance template). It is this index that is used to compute the zone, as can be seen in the body of FractalInstance. The zone is also a namespace for the instance name, so when we list the instances behind each load balancer in the google_compute_target_pool object, we compute the zone for each instance there as well.
local zone(hash) =
  local arr = [
    "us-central1-a",
    "us-central1-b",
    "us-central1-f",
  ];
  arr[hash % std.length(arr)],
FractalInstance also specifies some default API access scopes and tags, as well as the network over which the instances communicate. It extends GcpInstance from lib/terraform.jsonnet, which brings the default service account scopes, the network, and the startup script to the top level, and provides some defaults for other parameters.
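A hypothetical, heavily simplified sketch of this arrangement (the real FractalInstance and GcpInstance have more fields; the names and values here are illustrative):

local zone(hash) = ["us-central1-a", "us-central1-b", "us-central1-f"][hash % 3];
local GcpInstance = {
  machine_type: "n1-standard-1",
  tags: [],
  startup_script: [],
};
// A function from the zone index to an instance template.
local FractalInstance(zone_hash) = GcpInstance {
  zone: zone(zone_hash),
  tags +: ["fractal"],
};
// e.g. FractalInstance(4) picks the second zone ("us-central1-b").
FractalInstance(4)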
Back in service.jsonnet we now have the instance definitions themselves. These are arranged by image into clusters: application server, db (Cassandra), and tile generation. Terraform expects the instances in the form of a key/value map (i.e. a JSON object), since it identifies them internally with unique string names. Thus the three clusters are each expressed as an object, and they are joined with the object addition + operator.
google_compute_instance: {
  ["appserv" + k]: ...
  for k in [1, 2, 3]
} + {
  db1: ...,
  db2: ...,
  db3: ...,
} + {
  ["tilegen" + k]: ...
  for k in [1, 2, 3, 4]
}
The appserv and tilegen replicas are given using an object comprehension, in which the field name and value are computed with k set to each index in the given list. The variable k also ends up as an argument to FractalInstance and thus defines the zone of the instance. In both cases, we also place a file /var/www/conf.json. This is read on the instance by the application, at startup, and used to configure the service. In the tilegen replicas the configuration comes from ApplicationConf from the top of service.jsonnet. In the appserv instances, the same data is used, but extended with some extra fields.
resource.FractalInstance(k) {
  ...
  conf:: ApplicationConf {
    database_name: cassandraKeyspace,
    database_user: cassandraUser,
    database_pass: credentials.cassandraUserPass,
    tilegen: "${google_compute_address.tilegen.address}",
    db_endpoints: cassandraNodes,
  },
  startup_script +: [self.addFile(self.conf, "/var/www/conf.json")],
}
In both cases, an extra line is appended to the startup script, computed by self.addFile(), a method inherited from GcpInstance in lib/terraform.jsonnet. Examining its definition shows that it generates a line of bash to actually add the file:
addFile(v, dest)::
  "echo %s > %s" % [std.escapeStringBash(v), std.escapeStringBash(dest)],
Finally, the Cassandra cluster is deployed via an explicit list of three nodes (db1, db2, db3). We attend to them individually, firstly because bringing up a Cassandra cluster from scratch requires one node to have a special bootstrapping role, and secondly because database nodes are stateful and therefore less 'expendable' than application or tile server nodes. All three nodes extend from the CassandraInstance(i) mixin, which is where they get their common configuration. As with FractalInstance(i), the integer parameter is used to drive the zone. The bootstrap behavior of the first node is enabled by extending GcpStarterMixin instead of GcpTopUpMixin. The starter mixin has extra logic to initialize the database, which we pass in the form of a CQL script that creates the database and sets the replication factor of one of its internal tables. There is some fancy footwork required to get Cassandra into a stable state without exposing it to the network in a passwordless state. All of that is thankfully hidden behind the two re-usable mixins in lib/cassandra.jsonnet.
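A hypothetical, heavily simplified sketch of how the three node definitions fit together (the real CassandraInstance and the two mixins in service.jsonnet and lib/cassandra.jsonnet contain the actual bootstrap and security logic):

local CassandraInstance(i) = {
  zone: ["us-central1-a", "us-central1-b", "us-central1-f"][i % 3],
  tags: ["fractal-db", "cassandra"],
};
// Stand-ins for the real mixins; only db1 runs the one-off bootstrap logic.
local GcpStarterMixin = { bootstrap: true };
local GcpTopUpMixin = { bootstrap: false };
{
  db1: CassandraInstance(1) + GcpStarterMixin,
  db2: CassandraInstance(2) + GcpTopUpMixin,
  db3: CassandraInstance(3) + GcpTopUpMixin,
}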