Python utilities for DL_MONTE

Authors: Kevin Stratford, Tom Underwood

Introduction

Solving a problem with molecular simulation usually involves more than performing a single simulation and analysing its output. Typical workflows involve cycles of running one or more simulations, analysing their output, and then using the results of this analysis to choose input parameters for further simulations. A 'toolkit' of helper utilities for performing such tasks is thus desirable. With this in mind, we have developed a Python toolkit to support DL_MONTE (with the intention of eventually extending its scope beyond DL_MONTE).

This tutorial describes some utilities which help to read, manipulate, and write the inputs and outputs associated with DL_MONTE. Moreover it describes how to execute DL_MONTE from Python. The toolkit is named htk (Histogram reweighting toolkit - 'histogram reweighting' being one of the key functionalities of the toolkit). To elaborate, this tutorial covers:

  1. Reading inputs (CONFIG, CONTROL and FIELD files)
  2. Reading output files into Python (PTFILE and YAMLDATA files)
  3. Running the DL_MONTE executable from Python

Set-up

Python version

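Please note this notebook calls print() as a function, as in Python 3 (the Python version is usually displayed at the top right of the notebook). The outputs shown in this tutorial were generated with Python 2.7, which displays a print() call with several arguments as a tuple. You can check the Python version with: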

In [1]:
import sys
print(sys.version_info)
sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
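
A script that depends on Python 3 can also test the version itself. The following is a minimal sketch using only the standard library; the variable name is_python3 is introduced here purely for illustration.

```python
import sys

# sys.version_info compares like a tuple, so this test works on Python 2 and 3
is_python3 = sys.version_info >= (3,)

if not is_python3:
    # Warn rather than abort, so the remaining cells can still be tried
    print("Warning: this tutorial assumes Python 3; found %d.%d"
          % (sys.version_info[0], sys.version_info[1]))
```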

Adding the toolkit to PYTHONPATH

PYTHONPATH is the environment variable listing the directories in which Python looks for packages to import. To use the toolkit, the directory containing it should be included in PYTHONPATH.

Within Python, the equivalent is to append the directory containing the toolkit (which will depend on your local system!) to sys.path as follows:

In [2]:
# Import standard os module for file manipulations
import os

# You will need to set DL_MONTE_HOME appropriately for
# the local system
DL_MONTE_HOME = "/home/tom/Work/Code_workspace/DL_MONTE_2_dev/dlmonte2"

# Add the 'htk' directory within DL_MONTE_HOME to the module search path
sys.path.append( os.path.join(DL_MONTE_HOME,"htk") )

Alternatively one can modify PYTHONPATH directly in the shell via:

bash> cd $DL_MONTE_HOME/htk
bash> export PYTHONPATH=$PYTHONPATH:`pwd`

where $DL_MONTE_HOME is the main directory containing DL_MONTE. This is the preferred method because the script importing the toolkit then does not need to know where the toolkit is installed.
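
If the in-Python route is used in a notebook, re-running the cell appends the same directory to sys.path each time. The sketch below guards against that; add_to_sys_path is a hypothetical helper written for this tutorial, not part of the toolkit.

```python
import os
import sys

def add_to_sys_path(directory):
    """Append directory to sys.path unless it is already there.

    Returns the absolute path that was checked.
    """
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.append(path)
    return path

# Example use, with DL_MONTE_HOME set as in the cell above:
# add_to_sys_path(os.path.join(DL_MONTE_HOME, "htk"))
```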

Importing the dlmonte module

The functionality we will examine here is contained in the dlmonte module of the toolkit. This is imported as follows:

In [3]:
# Import the dlmonte module with alias "dlmonte"
import htk.sources.dlmonte as dlmonte

Setting the directory containing tutorial input files

In this tutorial we will use some pre-existing DL_MONTE input files in order to demonstrate the functionality of the dlmonte module. These files are distributed with the tutorial, and pertain to a short grand-canonical Monte Carlo simulation of a Lennard-Jones fluid near the critical point. The directory containing these files is util-dlmonte_files. We will store this directory in the variable input_dir, which may need to be set appropriately for your local system:

In [4]:
# You may need to set input_dir appropriately for the local system
input_dir = "util-dlmonte_files"

Reading and writing input files

We will now examine the dlmonte module's functionality regarding importing and exporting the three key DL_MONTE input files:

  1. FIELD
  2. CONFIG
  3. CONTROL

The input files we will use as an example are contained in input_dir.

Reading a FIELD file

The dlmonte.dlfield module provides a function from_file() which loads a FIELD file into an internal Python structure (an instance of class FIELD).

The function takes the path of the FIELD file as its argument.

In this case, as mentioned above, the input file corresponds to a small grand canonical simulation using Lennard-Jones particles (taken from the DL_MONTE test suite).

In [5]:
filename = os.path.join(input_dir, "FIELD")

field = dlmonte.dlfield.from_file(filename)

The python FIELD class and its attributes

We can now examine the contents of the field object.

The built-in repr() function shows the internal representation of the whole FIELD structure:

In [6]:
repr(field)
Out[6]:
"FIELD(description='Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV', cutoff=2.5, units='ev', nconfigs=1, atomtypes=[AtomType(name= 'LJ', type= 'core', mass= 1.0, charge= 0.0)], moltypes=[MolType(name='lj', nmaxatom=1, atoms=[Atom(name= 'LJ', type= 'core', rpos= [0.0, 0.0, 0.0])], bonds=[], exc_coul_ints=False, rigid= False)], vdw=[VDW(atom1=Atom(name= 'LJ', type= 'core'), atom2=Atom(name= 'LJ', type= 'core'), interaction=Interaction(key='lj', type='Lennard-Jones', epsilon=1.0, sigma=1.0))], bonds2body=[])"

The description, cutoff, and units attributes are of type string, float, and string, respectively.

In [7]:
print("Description: ", field.description)
print("Cutoff:      ", field.cutoff)
print("Units:       ", field.units)
('Description: ', 'Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV')
('Cutoff:      ', 2.5)
('Units:       ', 'ev')

Other attributes, such as atomtypes, are more complex; atomtypes is a list of entries of class AtomType. In this example, there is only one atom type present in the simulation:

In [8]:
print ("Atomtypes: ", field.atomtypes)
('Atomtypes: ', [AtomType(name= 'LJ', type= 'core', mass= 1.0, charge= 0.0)])

Non-bonded interactions are stored in the vdw attribute (again a list), which provides a full description of the interaction:

In [9]:
print (repr(field.vdw[0].atom1))
print (repr(field.vdw[0].atom2))
print (repr(field.vdw[0].interaction))
Atom(name= 'LJ', type= 'core')
Atom(name= 'LJ', type= 'core')
Interaction(key='lj', type='Lennard-Jones', epsilon=1.0, sigma=1.0)

Direct access to numerical values for computation is available, e.g.,

In [10]:
ljinteraction = field.vdw[0].interaction

print ("Twice epsilon is ", 2.0*ljinteraction.epsilon)
('Twice epsilon is ', 2.0)
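
As a slightly larger example of computing with these values, the sketch below evaluates a pair energy from epsilon, sigma and the cutoff. It assumes the standard 12-6 Lennard-Jones form with simple truncation at the cutoff; how DL_MONTE itself treats the cutoff (for example, whether the potential is shifted) is not stated here, so this is an illustration and not the code's actual energy routine. The default arguments are the values from the FIELD file above.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Truncated 12-6 Lennard-Jones energy at separation r.

    Assumed form: 4*epsilon*[(sigma/r)**12 - (sigma/r)**6] for r < cutoff,
    and zero otherwise.
    """
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2.0 ** (1.0 / 6.0)      # position of the minimum when sigma = 1
print(lennard_jones(1.0))       # energy at r = sigma
print(lennard_jones(r_min))     # energy at the minimum (the well depth)
```

With a FIELD object loaded, the same function could be called with ljinteraction.epsilon, ljinteraction.sigma and field.cutoff in place of the defaults.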

Writing FIELD file output

The dlfield module also allows you to write a FIELD file in a well-formed format for use by the main DL_MONTE executable.

This is done via the str() method of the FIELD object (i.e., the output of print, or of a format statement such as "{!s}".format(...)):

In [11]:
print(field)
Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV
CUTOFF 2.5
UNITS ev
NCONFIGS 1
ATOMS 1
LJ core 1.0 0.0
MOLTYPES 1
lj
ATOMS 1 1
LJ core 0.0 0.0 0.0
FINISH
VDW 1
LJ core LJ core lj 1.0 1.0
CLOSE

This allows us, if required, to manipulate and write a new FIELD file with updated parameters. For example, to adjust the potential interaction, we could write:

In [12]:
field.vdw[0].interaction.epsilon = 2.0

print(field)
Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV
CUTOFF 2.5
UNITS ev
NCONFIGS 1
ATOMS 1
LJ core 1.0 0.0
MOLTYPES 1
lj
ATOMS 1 1
LJ core 0.0 0.0 0.0
FINISH
VDW 1
LJ core LJ core lj 2.0 1.0
CLOSE

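Note that the 'epsilon' parameter for the van der Waals Lennard-Jones interaction has been modified. The free-text description on the first line is not updated automatically, and so still states epsilon = 1eV.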
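
Since str() yields the file contents, saving a modified FIELD file amounts to writing that string to disk. The following is a minimal sketch; write_input_file is a hypothetical helper introduced here, and it works for any object whose str() output is the desired file content.

```python
def write_input_file(obj, path):
    """Write str(obj) to path, making sure the file ends with a newline."""
    text = str(obj)
    if not text.endswith("\n"):
        text += "\n"
    with open(path, "w") as handle:
        handle.write(text)

# Example use with the FIELD object from above (the file name is illustrative):
# write_input_file(field, "FIELD.modified")
```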

The internal python representation is flexible enough that additional output formats are constructed relatively easily. For example, JSON output is available via:

In [13]:
print(field.to_json())
{
  "DESCRIPTION": "Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV", 
  "CUTOFF": 2.5, 
  "UNITS": "ev", 
  "NCONFIGS": 1, 
  "ATOMTYPES": [
    {
      "NAME": "LJ", 
      "TYPE": "core", 
      "MASS": 1.0, 
      "CHARGE": 0.0
    }
  ], 
  "MOLTYPES": [
    {
      "NAME": "lj", 
      "MAXATOM": 1, 
      "ATOMS": [
        {
          "NAME": "LJ", 
          "TYPE": "core", 
          "RELPOS": [
            0.0, 
            0.0, 
            0.0
          ]
        }
      ], 
      "BONDS": [], 
      "EXCLUDE": false, 
      "RIGID": false
    }
  ], 
  "VDW": [
    {
      "ATOM1": "LJ", 
      "TYPE1": "core", 
      "ATOM2": "LJ", 
      "TYPE2": "core", 
      "INTERACTION": {
        "KEY": "lj", 
        "EPSILON": 2.0, 
        "SIGMA": 1.0
      }
    }
  ], 
  "BONDS": []
}
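
JSON output is convenient because other tools can read it back with a standard parser. The sketch below parses an excerpt of the JSON shown above using the standard json module. In a live session one would presumably parse the output of field.to_json() directly, assuming it returns a string as the print call above suggests.

```python
import json

# Excerpt of the JSON printed above; only some of the keys are reproduced
text = """
{
  "CUTOFF": 2.5,
  "UNITS": "ev",
  "VDW": [
    {
      "ATOM1": "LJ",
      "TYPE1": "core",
      "ATOM2": "LJ",
      "TYPE2": "core",
      "INTERACTION": {"KEY": "lj", "EPSILON": 2.0, "SIGMA": 1.0}
    }
  ]
}
"""

data = json.loads(text)                       # nested dicts and lists
interaction = data["VDW"][0]["INTERACTION"]   # first (and only) VDW entry
print(interaction["KEY"], interaction["EPSILON"], interaction["SIGMA"])
```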

Reading and writing a CONFIG file

Likewise, a dlmonte.dlconfig.from_file() function is available which reads the contents of the CONFIG file into an internal representation.

As this is related to the FIELD description, the FIELD object can, optionally, be passed as an argument. If the FIELD reference is provided, the two files can be checked for consistency.

However, the CONFIG file can be read independently. E.g.,

In [14]:
# The CONFIG will be from `input_dir`

filename = os.path.join(input_dir, "CONFIG")
config = dlmonte.dlconfig.from_file(filename)

The internal representation of the CONFIG object is:

In [15]:
repr(config)
Out[15]:
"CONFIG(title= 'Lennard-Jones starting configuration rho = 0.125; particles are molecules, not atoms', level= 0, dlformat= 1, vcell= [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]], nummol= [8, 1000])"

Again, the config structure has a number of attributes. Of particular interest (especially for NVT and muVT ensembles) is the vcell attribute which specifies, indirectly, the volume of the system.

A utility method is provided to return the volume of the cell.

In [16]:
print ("Cell vectors: ", config.vcell)
print ("Cell volume:  ", config.volume())
('Cell vectors: ', [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
('Cell volume:  ', 1000.0)
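
For reference, the volume of a cell defined by three vectors a, b and c is the magnitude of the scalar triple product a . (b x c). The sketch below computes this in plain Python for the cell vectors shown above. It illustrates the geometry only and is not necessarily how config.volume() is implemented; cell_volume is a name introduced here.

```python
def cell_volume(vcell):
    """Volume of the parallelepiped spanned by three cell vectors."""
    a, b, c = vcell
    # Cross product b x c
    cross = [b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0]]
    # Dot product with a; abs() makes the result independent of handedness
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))

vcell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(cell_volume(vcell))
```

The same function also handles non-orthogonal cells, for which the volume is not simply the product of the three edge lengths.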

Note that the internal representation produced via repr() does not show all the molecule information, as this may be very long.

However, the string formatting option can be used to generate output which is a well-formed CONFIG file with full information.

In [17]:
print (config)
Lennard-Jones starting configuration rho = 0.125; particles are molecules, not atoms
0 1
10.0 0.0 0.0
0.0 10.0 0.0
0.0 0.0 10.0
NUMMOL 8 1000
MOLECULE lj 1 1
LJ core
-5.0000000000000000 -5.0000000000000000 -5.0000000000000000 0
MOLECULE lj 1 1
LJ core
0.0000000000000000 -5.0000000000000000 -5.0000000000000000 0
MOLECULE lj 1 1
LJ core
-5.0000000000000000 0.0000000000000000 -5.0000000000000000 0
MOLECULE lj 1 1
LJ core
0.0000000000000000 0.0000000000000000 -5.0000000000000000 0
MOLECULE lj 1 1
LJ core
-5.0000000000000000 -5.0000000000000000 0.0000000000000000 0
MOLECULE lj 1 1
LJ core
0.0000000000000000 -5.0000000000000000 0.0000000000000000 0
MOLECULE lj 1 1
LJ core
-5.0000000000000000 0.0000000000000000 0.0000000000000000 0
MOLECULE lj 1 1
LJ core
0.0000000000000000 0.0000000000000000 0.0000000000000000 0

Reading and writing a CONTROL file

Simulation execution and additional parameters are determined by the CONTROL file.

In [18]:
# Again using the input_dir location as defined above

filename = os.path.join(input_dir, "CONTROL")
ctrl = dlmonte.dlcontrol.from_file(filename)

The CONTROL file has a potentially complex structure which is split into three basic parts: a title, a 'use' block, and a 'main' block. The first is just the title string:

In [19]:
repr(ctrl.title)
Out[19]:
"'GCMC simulation at Lennard-Jones critical point - simulation should be identical to Fig. 1 in N. B. Wilding, PRE 52 602 (1995)'"

The second part is the use block, which contains relevant 'use' statements, and any FED block (see the manual for details). In this case, these are both empty:

In [20]:
repr(ctrl.use_block)
Out[20]:
'UseBlock(use_statements= OrderedDict(), fed_block= None)'

The third part is the main block, which contains a series of statements controlling various aspects of simulation behaviour:

In [21]:
repr(ctrl.main_block)
Out[21]:
"MainBlock(statements= OrderedDict([('seeds', OrderedDict([('seed0', 12), ('seed1', 34), ('seed2', 56), ('seed3', 78)])), ('temperature', 13754.88), ('steps', 10000), ('equilibration', 100000), ('print', 1000), ('stack', 1000)]), moves= [InsertMoleculeMove(pfreq= 100, rmin= 0.7, movers= [{'molpot': 0.06177, 'id': 'lj'}])], samples= OrderedDict())"

Again, the string output is designed to provide a well-formed CONTROL file suitable for use with the DL_MONTE executable:

In [22]:
print (ctrl)
GCMC simulation at Lennard-Jones critical point - simulation should be identical to Fig. 1 in N. B. Wilding, PRE 52 602 (1995)
finish use-block
seeds 12 34 56 78
temperature 13754.88
steps 10000
equilibration 100000
print 1000
stack 1000
move gcinsertmol 1 100 0.7
lj 0.06177
start simulation
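
The temperature in this CONTROL file looks large because the interaction energy is expressed in eV. As a rough check, the sketch below converts it to the reduced temperature kT/epsilon commonly used for Lennard-Jones systems. It assumes that the CONTROL temperature is in kelvin and that epsilon in the FIELD file on disk (1.0) is in eV; both are assumptions to be checked against the DL_MONTE manual.

```python
K_BOLTZMANN_EV = 8.617333262e-5   # Boltzmann constant in eV/K

temperature = 13754.88   # from the CONTROL file; assumed to be in kelvin
epsilon = 1.0            # from the FIELD file as read from disk; assumed eV

# Reduced (dimensionless) temperature
t_reduced = K_BOLTZMANN_EV * temperature / epsilon
print("Reduced temperature kT/epsilon = %.4f" % t_reduced)
```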

Manipulating FIELD, CONFIG and CONTROL in one go

It is convenient to read and store the three mandatory input files in a single step. A container class DLMonteInput is provided to do this:

In [23]:
# Read all input files in input_dir into a DLMonteInput object
inputs = dlmonte.DLMonteInput.from_directory(input_dir)

This reads the three files with standard filenames FIELD, CONFIG and CONTROL from the named directory.

The result contains field, config, and control objects as attributes.

In [24]:
print (repr(inputs.field))
print ()
print (repr(inputs.config))
print ()
print (repr(inputs.control))
FIELD(description='Lennard-Jones, 2.5*sigma cut-off, sigma = 1 angstrom, epsilon = 1eV', cutoff=2.5, units='ev', nconfigs=1, atomtypes=[AtomType(name= 'LJ', type= 'core', mass= 1.0, charge= 0.0)], moltypes=[MolType(name='lj', nmaxatom=1, atoms=[Atom(name= 'LJ', type= 'core', rpos= [0.0, 0.0, 0.0])], bonds=[], exc_coul_ints=False, rigid= False)], vdw=[VDW(atom1=Atom(name= 'LJ', type= 'core'), atom2=Atom(name= 'LJ', type= 'core'), interaction=Interaction(key='lj', type='Lennard-Jones', epsilon=1.0, sigma=1.0))], bonds2body=[])
()
CONFIG(title= 'Lennard-Jones starting configuration rho = 0.125; particles are molecules, not atoms', level= 0, dlformat= 1, vcell= [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]], nummol= [8, 1000])
()
CONTROL(title= 'GCMC simulation at Lennard-Jones critical point - simulation should be identical to Fig. 1 in N. B. Wilding, PRE 52 602 (1995)', use_block= UseBlock(use_statements= OrderedDict(), fed_block= None), main_block= MainBlock(statements= OrderedDict([('seeds', OrderedDict([('seed0', 12), ('seed1', 34), ('seed2', 56), ('seed3', 78)])), ('temperature', 13754.88), ('steps', 10000), ('equilibration', 100000), ('print', 1000), ('stack', 1000)]), moves= [InsertMoleculeMove(pfreq= 100, rmin= 0.7, movers= [{'molpot': 0.06177, 'id': 'lj'}])], samples= OrderedDict()))
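
Before reading a directory in this way, it can be useful to confirm that all three files are present. The following is a minimal sketch using only the standard library; missing_input_files is a hypothetical helper and not part of the toolkit.

```python
import os

REQUIRED_FILES = ("FIELD", "CONFIG", "CONTROL")

def missing_input_files(directory):
    """Return the names of the required input files not found in directory."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(directory, name))]

# Example use with input_dir as defined earlier:
# missing = missing_input_files(input_dir)
# if missing:
#     print("Missing input files:", missing)
```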




Running a DL_MONTE simulation from python

We now have python representations of the FIELD, CONFIG, and CONTROL files required for a DL_MONTE simulation.

This section discusses running DL_MONTE via python.

STEP 1: Locate executable and input files, and amend input files

In [25]:
# We need an executable. 
# We assume here it is in the DL_MONTE_HOME directory (see above), and is from a serial compilation

dlx = os.path.join(DL_MONTE_HOME, "bin", "DLMONTE-SRL.X")

# For input again use that from input_dir defined above

myinput = dlmonte.DLMonteInput.from_directory(input_dir)


# We also amend the CONTROL file here...
# Update the main block of the CONTROL file to include YAML output directive (see manual)

yamltag={"yamldata": 1000}
myinput.control.main_block.statements.update(yamltag)
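
The repr() of the main block shown earlier indicates that statements is an OrderedDict, so the usual dictionary operations apply. The sketch below uses a stand-in OrderedDict holding a subset of the entries shown earlier to illustrate how update() and item assignment behave. The new value for 'steps' is purely illustrative.

```python
from collections import OrderedDict

# Stand-in for myinput.control.main_block.statements (subset of entries)
statements = OrderedDict([("temperature", 13754.88),
                          ("steps", 10000),
                          ("print", 1000),
                          ("stack", 1000)])

statements.update({"yamldata": 1000})   # a new key is appended at the end
statements["steps"] = 20000             # an existing key keeps its position

print(list(statements.keys()))
```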

STEP 2: Copy the input to a working directory

We assume the input, as amended in Step 1, needs to be copied without further manipulation to a working directory where the run will take place (and where output will be produced).

In [26]:
# Set an appropriate working directory
# THIS MUST BE CREATED ON YOUR LOCAL SYSTEM: E.G. mkdir util-dlmonte_workspace

work_dir = "util-dlmonte_workspace"

# Copy the input to the working directory

myinput.to_directory(work_dir)
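
The working directory can also be created from Python instead of with mkdir in the shell. The following is a minimal sketch using the standard os module; ensure_directory is a name introduced here. It would need to run before the call to to_directory().

```python
import os

def ensure_directory(path):
    """Create path (including parent directories) if it does not exist."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

work_dir = ensure_directory("util-dlmonte_workspace")
```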

STEP 3: Create a DLMonteRunner object, and execute DL_MONTE

This is a utility to help run the DL_MONTE executable in the working directory via a sub-process.

In [27]:
# Set up a DLMonteRunner object linked to the directory work_dir
# and the executable dlx
myrun = dlmonte.DLMonteRunner(dlx, work_dir)

# Execute the runner - the output files from the simulation will be in work_dir
myrun.execute()
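
To illustrate what running an executable in a working directory via a sub-process involves, here is a minimal sketch using the standard subprocess module. It is not the DLMonteRunner implementation; run_in_directory is a hypothetical helper, and the Python interpreter stands in for the DL_MONTE executable so that the example runs anywhere.

```python
import subprocess
import sys

def run_in_directory(command, work_dir):
    """Run command (a list of strings) with work_dir as current directory.

    Returns the exit code of the process; zero conventionally means success.
    """
    return subprocess.call(command, cwd=work_dir)

# Demonstration: the Python interpreter stands in for the DL_MONTE executable
exit_code = run_in_directory([sys.executable, "-c", "print('ran')"], ".")
print("Exit code:", exit_code)
```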

STEP 4: Examine results

On successful execution, the DLMonteRunner creates a DLMonteOutput object which contains any PTFILE and/or YAMLDATA output.

In [28]:
# We will look at the YAML-format data output by the simulation, which is stored in the
# output file YAMLDATA, and within the DLMonteRunner object as follows:
data = myrun.output.yamldata.data

# Print each frame of YAML-format data, where each frame corresponds to a certain timestep 
# ('timestamp') in the simulation. Note that data was only output every 1000 moves
for step in data:
    print (step)
{'energyvdw': -59.6825112314058, 'timestamp': 1000, 'energy': -59.6825112314058, 'nmol': [93]}
{'energyvdw': -94.0929380015937, 'timestamp': 2000, 'energy': -94.0929380015937, 'nmol': [113]}
{'energyvdw': -109.12311614983, 'timestamp': 3000, 'energy': -109.12311614983, 'nmol': [125]}
{'energyvdw': -117.449863365087, 'timestamp': 4000, 'energy': -117.449863365087, 'nmol': [123]}
{'energyvdw': -115.518003788093, 'timestamp': 5000, 'energy': -115.518003788093, 'nmol': [111]}
{'energyvdw': -121.533867800447, 'timestamp': 6000, 'energy': -121.533867800447, 'nmol': [125]}
{'energyvdw': -157.967074375851, 'timestamp': 7000, 'energy': -157.967074375851, 'nmol': [148]}
{'energyvdw': -136.35351690988, 'timestamp': 8000, 'energy': -136.35351690988, 'nmol': [139]}
{'energyvdw': -97.6479191153069, 'timestamp': 9000, 'energy': -97.6479191153069, 'nmol': [114]}
{'energyvdw': -91.0264487309646, 'timestamp': 10000, 'energy': -91.0264487309646, 'nmol': [108]}
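
Because each frame is a plain dictionary, simple analysis needs no further tools. The sketch below copies the frames printed above (omitting energyvdw, which equals energy in this output) and computes the mean number of molecules and the mean energy. The cell volume of 1000.0 is taken from the CONFIG section. Ten frames from such a short run are far too few for meaningful statistics, so this only illustrates the data handling.

```python
# Frames copied from the output above
data = [
    {'timestamp': 1000, 'energy': -59.6825112314058, 'nmol': [93]},
    {'timestamp': 2000, 'energy': -94.0929380015937, 'nmol': [113]},
    {'timestamp': 3000, 'energy': -109.12311614983, 'nmol': [125]},
    {'timestamp': 4000, 'energy': -117.449863365087, 'nmol': [123]},
    {'timestamp': 5000, 'energy': -115.518003788093, 'nmol': [111]},
    {'timestamp': 6000, 'energy': -121.533867800447, 'nmol': [125]},
    {'timestamp': 7000, 'energy': -157.967074375851, 'nmol': [148]},
    {'timestamp': 8000, 'energy': -136.35351690988, 'nmol': [139]},
    {'timestamp': 9000, 'energy': -97.6479191153069, 'nmol': [114]},
    {'timestamp': 10000, 'energy': -91.0264487309646, 'nmol': [108]},
]
volume = 1000.0   # cell volume from the CONFIG file

# 'nmol' holds one entry per molecule type; there is a single type here
nmol = [frame['nmol'][0] for frame in data]
mean_nmol = sum(nmol) / float(len(nmol))
mean_energy = sum(frame['energy'] for frame in data) / len(data)

print("Mean number of molecules:", mean_nmol)
print("Mean number density:     ", mean_nmol / volume)
print("Mean energy:             ", mean_energy)
```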

STEP 5: Clean up

To remove input, output, or both, methods on the DLMonteRunner class are provided:

In [29]:
# Remove input or output
myrun.remove_input()
myrun.remove_output()

# Or, remove both input and output
myrun.cleanup()