gridmap Package

The most useful parts of our API are available at the package level in addition to the module level. They are documented in both places for convenience.

From job Module

class gridmap.Job(f, args, kwlist=None, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, queue='all.q')

Bases: builtins.object

Central entity that wraps a function and its data. A job consists of a function, its argument list, its keyword list, and a field “ret” that is filled in when the execute method is called.

Note

This can only be used to wrap picklable functions (i.e., those that are defined at the module or class level).
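
As a quick check before wrapping a function, the standard pickle module can tell you whether a callable is picklable. This is only a sketch: the helper name is_picklable is hypothetical and is not part of gridmap.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps (hypothetical helper, not part of gridmap)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A function that lives at module level is pickled by reference to its name.
print(is_picklable(math.sqrt))

# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda x: x * x))
```

Functions defined inside other functions fail for the same reason as the lambda, so they should be moved to module level before being used in a job.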

args
cause_of_death
cleanup
environment
exception
execute(self)

Executes function f with given arguments and writes return value to field ret. If an exception is encountered during execution, ret will contain a pickled version of it. Input data is removed after execution to save space.

function

Function this job will execute.

home_address
jobid
kwlist
log_stderr_fn
log_stdout_fn
mem_free
name
native_specification

Python-style getter (property) for the job's native specification.

num_resubmits
num_slots
path
queue
ret
timestamp
uniq_id
white_list
working_dir
exception gridmap.JobException

New exception type for when one of the jobs crashed.
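
The execute behavior documented for Job above can be illustrated with a small stand-in class. MiniJob is a hypothetical name and this is not gridmap's implementation. It assumes kwlist is a dict of keyword arguments, and it stores the exception object itself in ret rather than a pickled copy.

```python
class MiniJob:
    """Hypothetical stand-in for illustration only; not gridmap's Job class."""

    def __init__(self, f, args, kwlist=None):
        self.function = f
        self.args = args
        # Assumption: kwlist holds keyword arguments as a dict.
        self.kwlist = kwlist if kwlist is not None else {}
        self.ret = None

    def execute(self):
        """Run the function and store its result (or the exception) in ret."""
        try:
            self.ret = self.function(*self.args, **self.kwlist)
        except Exception as exc:
            # Keep the exception as the result instead of letting it propagate.
            self.ret = exc
        # Drop the input data once the function has run.
        self.args = None
        self.kwlist = None


def divide(a, b):
    return a / b


ok_job = MiniJob(divide, [6, 3])
ok_job.execute()

failed_job = MiniJob(divide, [1, 0])
failed_job.execute()

print(ok_job.ret)
print(type(failed_job.ret).__name__)
```

Because failures end up in ret, code that collects results has to inspect each value to see whether it is an exception.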

gridmap.process_jobs(jobs, temp_dir='/scratch/', white_list=None, quiet=True, max_processes=1, local=False)

Take a list of jobs and process them on the cluster.

Parameters:
  • jobs (list of Job) – Jobs to run.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
  • max_processes (int) – The maximum number of concurrent processes to use if processing jobs locally.
  • local (bool) – Should we execute the jobs locally in separate processes instead of on the cluster?
Returns:

List of Job results
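
A minimal usage sketch for Job and process_jobs, based only on the signatures above. The function names compute_factorial and run_factorials are made up for this example, and the gridmap import is placed inside the function so the snippet can be loaded on a machine without gridmap installed.

```python
def compute_factorial(n):
    """Module-level function, so it can be pickled and wrapped in a Job."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def run_factorials(numbers, local=True):
    """Wrap each input in a Job and process the jobs (requires gridmap)."""
    # Imported here so that this sketch loads even where gridmap is missing.
    from gridmap import Job, process_jobs

    jobs = [Job(compute_factorial, [n], name='factorial_{}'.format(n))
            for n in numbers]
    # local=True runs the jobs in separate local processes, not on the cluster.
    return process_jobs(jobs, max_processes=2, local=local)
```

Calling run_factorials([5, 10, 15]) should return one result per job. Running with local=True first is a convenient way to test a function before sending it to the cluster.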

gridmap.grid_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)

Maps a function onto the cluster.

Note

This can only be used with picklable functions (i.e., those that are defined at the module or class level).

Parameters:
  • f (function) – The function to map on args_list
  • args_list (list) – List of arguments to pass to f
  • cleanup (bool) – Should we remove the stdout and stderr temporary files for each job when we’re done? (They are left in place if there’s an error.)
  • mem_free (str) – Estimate of how much memory each job will need (for scheduling). (Not currently used, because our cluster does not have that setting enabled.)
  • name (str) – Base name to give each job (a number will be added to the end).
  • num_slots (int) – Number of slots each job should use.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • queue (str) – The SGE queue to use for scheduling.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
Returns:

List of Job results
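
A short usage sketch for grid_map, based on the signature above. It assumes that each element of args_list is passed to f as a single argument. The names word_count and count_all are hypothetical, and the gridmap import sits inside the function so the snippet loads without gridmap installed.

```python
def word_count(text):
    """Module-level function that counts whitespace-separated words."""
    return len(text.split())


def count_all(texts):
    """Map word_count over texts on the cluster (requires gridmap and a cluster)."""
    # Imported here so that this sketch loads even where gridmap is missing.
    from gridmap import grid_map

    # Assumption: each element of texts becomes the single argument of one job.
    return grid_map(word_count, texts, name='word_count', quiet=False)
```

Compared with building Job objects by hand, grid_map is the shorter route when every job runs the same function with the same settings.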

gridmap.pg_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)

Deprecated since version 0.9: This function has been renamed grid_map.

Parameters:
  • f (function) – The function to map on args_list
  • args_list (list) – List of arguments to pass to f
  • cleanup (bool) – Should we remove the stdout and stderr temporary files for each job when we’re done? (They are left in place if there’s an error.)
  • mem_free (str) – Estimate of how much memory each job will need (for scheduling). (Not currently used, because our cluster does not have that setting enabled.)
  • name (str) – Base name to give each job (a number will be added to the end).
  • num_slots (int) – Number of slots each job should use.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • queue (str) – The SGE queue to use for scheduling.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
Returns:

List of Job results

conf Module

Global settings for GridMap. All of these settings can be overridden by specifying environment variables with the same name.

author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
var USE_MEM_FREE:
 Does your cluster support specifying how much memory a job will use via mem_free? (Default: False)
var DEFAULT_QUEUE:
 The default job scheduling queue to use. (Default: all.q)
var CREATE_PLOTS:
 Should we plot cpu and mem usage and send via email? (Default: True)
var USE_CHERRYPY:
 Should we start web monitoring interface? (Default: True)
var SEND_ERROR_MAILS:
 Should we send error emails? (Default: False)
var SMTP_SERVER:
 SMTP server for sending error emails.
var ERROR_MAIL_SENDER:
 Sender address to use for error emails. (Default: error@gridmap.py)
var ERROR_MAIL_RECIPIENT:
 Recipient address for error emails. (Default: $USER@$HOST, where $USER is the current user’s username, and $HOST is the last two sections of the server’s fully qualified domain name, or just the host’s name if it does not contain periods.)
var MAX_MSG_LENGTH:
 Maximum length of any error email message. (Default: 5000)
var MAX_TIME_BETWEEN_HEARTBEATS:
 How long should we wait (in seconds) for a heartbeat before we consider a job dead? (Default: 45)
var NUM_RESUBMITS:
 How many times a particular job can die before we give up. (Default: 3)
var CHECK_FREQUENCY:
 How many seconds pass before we check on the status of a particular job. (Default: 15)
var HEARTBEAT_FREQUENCY:
 How many seconds pass before jobs on the cluster send back heart beats to the submission host. (Default: 10)
var WEB_PORT:
 Port to use for CherryPy server when using web monitor. (Default: 8076)
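
The environment-variable override described above can be sketched with the standard library. The helpers env_str, env_int and env_bool are hypothetical names, and the rule that only the string "true" (in any case) counts as true is an assumption, not something this page specifies.

```python
import os


def env_str(name, default):
    """Return the environment value for name, or default if it is unset."""
    return os.environ.get(name, default)


def env_int(name, default):
    """Return the environment value for name as an int, or default if unset."""
    return int(os.environ.get(name, default))


def env_bool(name, default):
    """Return the environment value for name as a bool, or default if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumption: only "true" (case-insensitive) is treated as true.
    return raw.strip().upper() == 'TRUE'


os.environ['DEFAULT_QUEUE'] = 'short.q'  # hypothetical queue name
os.environ['USE_CHERRYPY'] = 'False'

settings = {
    'DEFAULT_QUEUE': env_str('DEFAULT_QUEUE', 'all.q'),
    'USE_CHERRYPY': env_bool('USE_CHERRYPY', True),
    'NUM_RESUBMITS': env_int('NUM_RESUBMITS', 3),
}
print(settings)
```

With gridmap itself, the variables would presumably need to be set before gridmap is imported. That ordering is an assumption and is not stated above.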

data Module

This module provides all of the data-related functions for gridmap.

author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
gridmap.data.clean_path(path)

Replace all weird SAN paths with normal paths. This is really ETS-specific, but shouldn’t harm anyone else.

gridmap.data.zdumps(obj)

Dumps a picklable object into a bz2-compressed string.

Parameters: obj (object or function) – The object/function to store.

gridmap.data.zloads(pickled_data)

Loads a picklable object from a bz2-compressed string.

Parameters: pickled_data (bytes) – BZ2-compressed byte sequence.
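
The round trip that zdumps and zloads describe can be sketched with the standard pickle and bz2 modules. The names compress_pickle and decompress_pickle are hypothetical stand-ins. The choice of pickle protocol and the default compression level are assumptions, not gridmap's actual settings.

```python
import bz2
import pickle


def compress_pickle(obj):
    """Pickle obj and compress the resulting bytes with bz2."""
    return bz2.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


def decompress_pickle(data):
    """Decompress bz2 data and unpickle the object it contains."""
    return pickle.loads(bz2.decompress(data))


payload = {'args': list(range(1000)), 'name': 'gridmap_job'}
blob = compress_pickle(payload)
restored = decompress_pickle(blob)

print(type(blob).__name__, restored == payload)
```

As with any pickle-based format, data should only be unpickled when it comes from a trusted source.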

job Module

This module provides wrappers that simplify submission and collection of jobs, in a more ‘pythonic’ fashion.

We use pyZMQ to provide a heartbeat feature that allows close monitoring of submitted jobs and lets us take appropriate action in case of failure.

author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
class gridmap.job.Job(f, args, kwlist=None, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, queue='all.q')

Bases: builtins.object

Central entity that wraps a function and its data. A job consists of a function, its argument list, its keyword list, and a field “ret” that is filled in when the execute method is called.

Note

This can only be used to wrap picklable functions (i.e., those that are defined at the module or class level).

args
cause_of_death
cleanup
environment
exception
execute(self)

Executes function f with given arguments and writes return value to field ret. If an exception is encountered during execution, ret will contain a pickled version of it. Input data is removed after execution to save space.

function

Function this job will execute.

home_address
jobid
kwlist
log_stderr_fn
log_stdout_fn
mem_free
name
native_specification

Python-style getter (property) for the job's native specification.

num_resubmits
num_slots
path
queue
ret
timestamp
uniq_id
white_list
working_dir
exception gridmap.job.JobException

Bases: builtins.Exception

New exception type for when one of the jobs crashed.

class gridmap.job.JobMonitor

Bases: builtins.object

Job monitor that communicates with other nodes via 0MQ.

all_jobs_done(self)

Checks whether all jobs are done.

check(self, session_id, jobs)

Serves input and output data.

check_if_alive(self)

Checks whether jobs are alive and, if not, determines the cause of death.
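
The liveness check can be pictured as a comparison between the time of each job's last heartbeat and the MAX_TIME_BETWEEN_HEARTBEATS setting from the conf module. The function find_silent_jobs and the dictionary layout are hypothetical. This is a sketch of the idea, not the JobMonitor code.

```python
import time

MAX_TIME_BETWEEN_HEARTBEATS = 45  # documented default, in seconds


def find_silent_jobs(last_heartbeats, now=None,
                     max_gap=MAX_TIME_BETWEEN_HEARTBEATS):
    """Return the names of jobs whose last heartbeat is older than max_gap.

    last_heartbeats maps a job name to the timestamp (in seconds) of the
    most recent heartbeat received from that job.
    """
    if now is None:
        now = time.time()
    return sorted(name for name, seen in last_heartbeats.items()
                  if now - seen > max_gap)


heartbeats = {'gridmap_job0': 100.0, 'gridmap_job1': 60.0}
print(find_silent_jobs(heartbeats, now=110.0))
```

With the documented defaults, a job sends a heartbeat every 10 seconds, so a 45 second limit tolerates a few missed heartbeats before the job is considered dead.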

gridmap.job.grid_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)

Maps a function onto the cluster.

Note

This can only be used with picklable functions (i.e., those that are defined at the module or class level).

Parameters:
  • f (function) – The function to map on args_list
  • args_list (list) – List of arguments to pass to f
  • cleanup (bool) – Should we remove the stdout and stderr temporary files for each job when we’re done? (They are left in place if there’s an error.)
  • mem_free (str) – Estimate of how much memory each job will need (for scheduling). (Not currently used, because our cluster does not have that setting enabled.)
  • name (str) – Base name to give each job (a number will be added to the end).
  • num_slots (int) – Number of slots each job should use.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • queue (str) – The SGE queue to use for scheduling.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
Returns:

List of Job results

gridmap.job.handle_resubmit(session_id, job)

Heuristic to determine whether the job should be resubmitted.

Side effect: job.num_resubmits is incremented.
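
The actual heuristic is not spelled out here. The sketch below only illustrates the documented side effect together with the NUM_RESUBMITS cap from the conf module. The function should_resubmit is hypothetical, and a SimpleNamespace stands in for a real Job.

```python
from types import SimpleNamespace

NUM_RESUBMITS = 3  # documented default


def should_resubmit(job, max_resubmits=NUM_RESUBMITS):
    """Decide whether a dead job gets another attempt (illustrative only)."""
    if job.num_resubmits >= max_resubmits:
        # The job has already used up its attempts, so we give up.
        return False
    # Side effect: count this attempt on the job itself.
    job.num_resubmits += 1
    return True


job = SimpleNamespace(name='gridmap_job0', num_resubmits=0)
decisions = [should_resubmit(job) for _ in range(5)]
print(decisions, job.num_resubmits)
```

Keeping the counter on the job object means the limit applies per job, so one repeatedly failing job does not use up attempts that belong to the others.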

gridmap.job.pg_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)

Deprecated since version 0.9: This function has been renamed grid_map.

Parameters:
  • f (function) – The function to map on args_list
  • args_list (list) – List of arguments to pass to f
  • cleanup (bool) – Should we remove the stdout and stderr temporary files for each job when we’re done? (They are left in place if there’s an error.)
  • mem_free (str) – Estimate of how much memory each job will need (for scheduling). (Not currently used, because our cluster does not have that setting enabled.)
  • name (str) – Base name to give each job (a number will be added to the end).
  • num_slots (int) – Number of slots each job should use.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • queue (str) – The SGE queue to use for scheduling.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
Returns:

List of Job results

gridmap.job.process_jobs(jobs, temp_dir='/scratch/', white_list=None, quiet=True, max_processes=1, local=False)

Take a list of jobs and process them on the cluster.

Parameters:
  • jobs (list of Job) – Jobs to run.
  • temp_dir (str) – Local temporary directory for storing output for an individual job.
  • white_list (list of str) – If specified, limit nodes used to only those in list.
  • quiet (bool) – When true, do not output information about the jobs that have been submitted.
  • max_processes (int) – The maximum number of concurrent processes to use if processing jobs locally.
  • local (bool) – Should we execute the jobs locally in separate processes instead of on the cluster?
Returns:

List of Job results

gridmap.job.send_error_mail(job)

Sends out a diagnostic email for the given job.

runner Module

This module executes pickled jobs on the cluster.

author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
gridmap.runner.get_cpu_load(pid)
Parameters: pid (int) – Process ID for job whose CPU load we’d like to check.
Returns: CPU usage of process
gridmap.runner.get_job_status(parent_pid)

Determines the status of the current worker and its machine (currently not cross-platform).

Parameters: parent_pid (int) – Process ID for job whose status we’d like to check.
Returns: Memory and CPU load information for given PID.
Return type: dict
gridmap.runner.get_memory_usage(pid)
Parameters: pid (int) – Process ID for job whose memory usage we’d like to check.
Returns: Memory usage of process in Mb.
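
One way to obtain a per-process memory figure on Linux is to read the resident set size from /proc. This is a sketch under stated assumptions and not necessarily how gridmap.runner does it. The function memory_usage_mb is a hypothetical name, it returns None on systems without /proc, and it assumes that the Mb above means megabytes.

```python
import os


def memory_usage_mb(pid):
    """Return the resident memory of pid in megabytes, or None if unavailable."""
    status_path = '/proc/{}/status'.format(pid)
    try:
        with open(status_path) as status_file:
            for line in status_file:
                # The VmRSS line looks like "VmRSS:     12345 kB".
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024.0
    except (IOError, OSError):
        # No /proc filesystem, or the process does not exist.
        return None
    return None


print(memory_usage_mb(os.getpid()))
```

The dependence on /proc is an example of why this kind of status check is hard to make cross-platform.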

web Module

Simple web front-end for pythongrid

author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
class gridmap.web.WebMonitor

Bases: builtins.object

index(self)
job_to_html(job)

Displays a job as HTML.

list_jobs(self, address)

Displays the list of jobs.

view_job(self, address, job_id)

Displays the details of an individual job.
