gridmap Package
The most useful parts of our API are available at the package level in addition
to the module level. They are documented in both places for convenience.
From job Module
-
class gridmap.Job(f, args, kwlist=None, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, queue='all.q')
Bases: builtins.object
Central entity that wraps a function and its data. A job consists of a
function, its argument list, its keyword list, and a field “ret” that is
filled in when the execute method is called.
Note
This can only be used to wrap picklable functions (i.e., those that
are defined at the module or class level).
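One quick way to check whether a callable meets this requirement is to try pickling it with the standard library. The helper name is_picklable below is hypothetical and is not part of gridmap; this is only a minimal sketch, assuming the requirement follows standard pickle semantics.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A function that lives at module level is pickled by reference.
print(is_picklable(math.sqrt))
# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda x: x * 2))
```

Running such a check before building jobs can surface an unpicklable function early, on the submission host.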
-
args
-
cause_of_death
-
cleanup
-
environment
-
exception
-
execute(self)
Executes function f with given arguments
and writes return value to field ret.
If an exception is encountered during execution, ret will
contain a pickled version of it.
Input data is removed after execution to save space.
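The behavior described above can be sketched with a small stand-in class. SketchJob is a hypothetical name and is not gridmap's implementation; the sketch assumes that kwlist holds keyword arguments and that "input data" means the args and kwlist fields.

```python
import pickle


class SketchJob:
    """Hypothetical stand-in that mimics the documented execute() behavior."""

    def __init__(self, f, args, kwlist=None):
        self.f = f
        self.args = args
        # Assumption: kwlist is a mapping of keyword arguments.
        self.kwlist = kwlist if kwlist is not None else {}
        self.ret = None

    def execute(self):
        try:
            # Normal case: store the function's return value in ret.
            self.ret = self.f(*self.args, **self.kwlist)
        except Exception as exc:
            # Failure case: store a pickled copy of the exception in ret.
            self.ret = pickle.dumps(exc)
        # Drop the input data once the call has finished.
        self.args = None
        self.kwlist = None


ok = SketchJob(divmod, [7, 2])
ok.execute()

failed = SketchJob(divmod, [7, 0])
failed.execute()
error = pickle.loads(failed.ret)
```

In this sketch a caller has to inspect ret to tell a result from a failure, which is one reason a separate exception type for crashed jobs is useful.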
-
function
Function this job will execute.
-
home_address
-
jobid
-
kwlist
-
log_stderr_fn
-
log_stdout_fn
-
mem_free
-
name
-
native_specification
define python-style getter
-
num_resubmits
-
num_slots
-
path
-
queue
-
ret
-
timestamp
-
uniq_id
-
white_list
-
working_dir
-
exception gridmap.JobException
New exception type for when one of the jobs crashed.
-
gridmap.process_jobs(jobs, temp_dir='/scratch/', white_list=None, quiet=True, max_processes=1, local=False)
Take a list of jobs and process them on the cluster.
Parameters:
- jobs (list of Job) – Jobs to run.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
- max_processes (int) – The maximum number of concurrent processes to use if
processing jobs locally.
- local (bool) – Should we execute the jobs locally in separate processes
instead of on the cluster?
Returns: List of Job results
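A minimal sketch of how Job and process_jobs might be combined, using only the signatures documented here. The function add and the helper run_locally are illustrative names, and the sketch assumes the gridmap package is installed; the import is deferred so that the module itself loads without it.

```python
def add(a, b):
    """Toy function defined at module level so that it can be pickled."""
    return a + b


def run_locally(pairs):
    """Wrap each (a, b) pair in a Job and process the jobs on this machine."""
    # Imported lazily because this needs the gridmap package installed.
    from gridmap import Job, process_jobs

    jobs = [Job(add, [a, b], name='add_job') for a, b in pairs]
    # local=True runs the jobs in separate local processes instead of on
    # the cluster; max_processes caps how many run concurrently.
    return process_jobs(jobs, local=True, max_processes=2)


# Example call (requires gridmap):
# results = run_locally([(1, 2), (3, 4)])
```

Running with local=True first is a convenient way to debug a job function before submitting it to the cluster.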
-
gridmap.grid_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)
Maps a function onto the cluster.
Note
This can only be used with picklable functions (i.e., those that are
defined at the module or class level).
Parameters:
- f (function) – The function to map on args_list
- args_list (list) – List of arguments to pass to f
- cleanup (bool) – Should we remove the stdout and stderr temporary files for
each job when we’re done? (They are left in place if there’s
an error.)
- mem_free (str) – Estimate of how much memory each job will need (for
scheduling). (Not currently used, because our cluster does
not have that setting enabled.)
- name (str) – Base name to give each job (will have a number added to the end)
- num_slots (int) – Number of slots each job should use.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- queue (str) – The SGE queue to use for scheduling.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
Returns: List of Job results
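A minimal sketch of a grid_map call based on the signature above. The names word_count and count_on_cluster are illustrative, and the sketch assumes that gridmap is installed, that an SGE cluster is reachable, and that each element of args_list is passed to f as a single argument.

```python
def word_count(text):
    """Toy function defined at module level so that it can be pickled."""
    return len(text.split())


def count_on_cluster(texts):
    """Map word_count over texts, one cluster job per element."""
    # Imported lazily because this needs gridmap and a reachable cluster.
    from gridmap import grid_map

    return grid_map(word_count, texts, name='wc_job', queue='all.q',
                    quiet=False)


# Example call (requires gridmap and a cluster):
# counts = count_on_cluster(['a b c', 'd e'])
```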
-
gridmap.pg_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)
Deprecated since version 0.9: This function has been renamed grid_map.
Parameters:
- f (function) – The function to map on args_list
- args_list (list) – List of arguments to pass to f
- cleanup (bool) – Should we remove the stdout and stderr temporary files for
each job when we’re done? (They are left in place if there’s
an error.)
- mem_free (str) – Estimate of how much memory each job will need (for
scheduling). (Not currently used, because our cluster does
not have that setting enabled.)
- name (str) – Base name to give each job (will have a number added to the end)
- num_slots (int) – Number of slots each job should use.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- queue (str) – The SGE queue to use for scheduling.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
Returns: List of Job results
conf Module
Global settings for GridMap. All of these settings can be overridden by
specifying environment variables with the same name.
author: Christian Widmer
author: Cheng Soon Ong
author: Dan Blanchard (dblanchard@ets.org)
var USE_MEM_FREE: Does your cluster support specifying how much memory a job
will use via mem_free? (Default: False)
var DEFAULT_QUEUE: The default job scheduling queue to use.
(Default: all.q)
var CREATE_PLOTS: Should we plot CPU and memory usage and send the plots via email?
(Default: True)
var USE_CHERRYPY: Should we start the web monitoring interface?
(Default: True)
var SEND_ERROR_MAILS: Should we send error emails?
(Default: False)
var SMTP_SERVER: SMTP server for sending error emails.
var ERROR_MAIL_SENDER: Sender address to use for error emails.
(Default: error@gridmap.py)
var ERROR_MAIL_RECIPIENT: Recipient address for error emails.
(Default: $USER@$HOST, where $USER is the current
user’s username, and $HOST is the last two sections
of the server’s fully qualified domain name, or just
the host’s name if it does not contain periods.)
var MAX_MSG_LENGTH: Maximum length of any error email message.
(Default: 5000)
var MAX_TIME_BETWEEN_HEARTBEATS: How long should we wait (in seconds) for a
heartbeat before we consider a job dead?
(Default: 45)
var NUM_RESUBMITS: How many times can a particular job die before we give
up? (Default: 3)
var CHECK_FREQUENCY: How many seconds pass before we check on the status of a
particular job. (Default: 15)
var HEARTBEAT_FREQUENCY: How many seconds pass before jobs on the cluster send
back heartbeats to the submission host.
(Default: 10)
var WEB_PORT: Port to use for the CherryPy server when using the web monitor.
(Default: 8076)
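The environment-variable override described above can be sketched as follows. The helpers read_setting and parse_bool are hypothetical and are not gridmap's own code; in particular, the accepted spellings of boolean values are an assumption. The defaults are the ones listed above.

```python
import os


def read_setting(name, default, cast=str, environ=os.environ):
    """Return environ[name] converted with cast, or default if it is unset."""
    raw = environ.get(name)
    if raw is None:
        return default
    return cast(raw)


def parse_bool(raw):
    """Assumed convention: 'true', '1' and 'yes' mean True (any case)."""
    return raw.strip().lower() in ('true', '1', 'yes')


# Simulated environment so the example does not depend on the real one.
fake_env = {'MAX_TIME_BETWEEN_HEARTBEATS': '90', 'SEND_ERROR_MAILS': 'true'}

max_gap = read_setting('MAX_TIME_BETWEEN_HEARTBEATS', 45, int, fake_env)
send_mails = read_setting('SEND_ERROR_MAILS', False, parse_bool, fake_env)
web_port = read_setting('WEB_PORT', 8076, int, fake_env)
```

Here the two variables present in the simulated environment replace their defaults, while WEB_PORT falls back to 8076.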
data Module
This module provides all of the data-related functions for gridmap.
-
gridmap.data.clean_path(path)
Replace all weird SAN paths with normal paths. This is really
ETS-specific, but shouldn’t harm anyone else.
-
gridmap.data.zdumps(obj)
Dumps a picklable object into a bz2-compressed string.
Parameters: obj (object or function) – The object/function to store.
-
gridmap.data.zloads(pickled_data)
Loads a picklable object from a bz2-compressed string.
Parameters: pickled_data (bytes) – BZ2-compressed byte sequence.
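The round trip these two functions describe can be sketched with the standard pickle and bz2 modules. The names sketch_zdumps and sketch_zloads are hypothetical stand-ins; gridmap's own implementation may differ, for example in the pickle protocol it uses.

```python
import bz2
import pickle


def sketch_zdumps(obj):
    """Pickle obj, then compress the pickled bytes with bz2."""
    return bz2.compress(pickle.dumps(obj))


def sketch_zloads(pickled_data):
    """Decompress the bz2 bytes, then unpickle the original object."""
    return pickle.loads(bz2.decompress(pickled_data))


payload = {'name': 'gridmap_job', 'args': [1, 2, 3]}
blob = sketch_zdumps(payload)
restored = sketch_zloads(blob)
```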
job Module
This module provides wrappers that simplify submission and collection of jobs,
in a more ‘pythonic’ fashion.
We use pyZMQ to provide a heartbeat feature that allows close monitoring
of submitted jobs and lets us take appropriate action in case of failure.
-
class gridmap.job.Job(f, args, kwlist=None, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, queue='all.q')
Bases: builtins.object
Central entity that wraps a function and its data. A job consists of a
function, its argument list, its keyword list, and a field “ret” that is
filled in when the execute method is called.
Note
This can only be used to wrap picklable functions (i.e., those that
are defined at the module or class level).
-
args
-
cause_of_death
-
cleanup
-
environment
-
exception
-
execute(self)
Executes function f with given arguments
and writes return value to field ret.
If an exception is encountered during execution, ret will
contain a pickled version of it.
Input data is removed after execution to save space.
-
function
Function this job will execute.
-
home_address
-
jobid
-
kwlist
-
log_stderr_fn
-
log_stdout_fn
-
mem_free
-
name
-
native_specification
define python-style getter
-
num_resubmits
-
num_slots
-
path
-
queue
-
ret
-
timestamp
-
uniq_id
-
white_list
-
working_dir
-
exception gridmap.job.JobException
Bases: builtins.Exception
New exception type for when one of the jobs crashed.
-
class gridmap.job.JobMonitor
Bases: builtins.object
Job monitor that communicates with other nodes via 0MQ.
-
all_jobs_done(self)
Checks whether all jobs are done.
-
check(self, session_id, jobs)
Serves input and output data.
-
check_if_alive(self)
Checks whether jobs are alive and, if not, determines the cause of death.
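The liveness test can be pictured as a comparison between the time since the last heartbeat and the MAX_TIME_BETWEEN_HEARTBEATS setting from the conf module. The function seems_dead is a hypothetical illustration, not the monitor's actual code, and it does not attempt to determine a cause of death.

```python
from datetime import datetime, timedelta

# Default taken from the conf module documentation (seconds).
MAX_TIME_BETWEEN_HEARTBEATS = 45


def seems_dead(last_heartbeat, now, max_gap=MAX_TIME_BETWEEN_HEARTBEATS):
    """Return True if more than max_gap seconds passed since last_heartbeat."""
    return (now - last_heartbeat).total_seconds() > max_gap


now = datetime(2024, 1, 1, 12, 0, 0)
recent = now - timedelta(seconds=10)   # heartbeat 10 seconds ago
stale = now - timedelta(seconds=60)    # heartbeat 60 seconds ago
```

With the default of 45 seconds, a job whose last heartbeat is 10 seconds old is still considered alive, while one that has been silent for 60 seconds is not.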
-
gridmap.job.grid_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)
Maps a function onto the cluster.
Note
This can only be used with picklable functions (i.e., those that are
defined at the module or class level).
Parameters:
- f (function) – The function to map on args_list
- args_list (list) – List of arguments to pass to f
- cleanup (bool) – Should we remove the stdout and stderr temporary files for
each job when we’re done? (They are left in place if there’s
an error.)
- mem_free (str) – Estimate of how much memory each job will need (for
scheduling). (Not currently used, because our cluster does
not have that setting enabled.)
- name (str) – Base name to give each job (will have a number added to the end)
- num_slots (int) – Number of slots each job should use.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- queue (str) – The SGE queue to use for scheduling.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
Returns: List of Job results
-
gridmap.job.handle_resubmit(session_id, job)
Heuristic to determine whether the job should be resubmitted.
Side effect: job.num_resubmits is incremented.
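The actual heuristic is not described here. As a rough illustration only, the simplest rule consistent with the NUM_RESUBMITS setting is to resubmit until the limit is reached. The function should_resubmit and the use of SimpleNamespace as a stand-in job are assumptions, not gridmap's logic.

```python
from types import SimpleNamespace

# Default taken from the conf module documentation.
NUM_RESUBMITS = 3


def should_resubmit(job, max_resubmits=NUM_RESUBMITS):
    """Return True and count the attempt while the job is under the limit."""
    if job.num_resubmits >= max_resubmits:
        return False
    # Side effect: record that another resubmission is being attempted.
    job.num_resubmits += 1
    return True


job = SimpleNamespace(num_resubmits=0)
decisions = [should_resubmit(job) for _ in range(4)]
```

In this sketch the first three calls allow a resubmission and the fourth refuses, leaving num_resubmits at the limit.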
-
gridmap.job.pg_map(f, args_list, cleanup=True, mem_free='1G', name='gridmap_job', num_slots=1, temp_dir='/scratch/', white_list=None, queue='all.q', quiet=True)
Deprecated since version 0.9: This function has been renamed grid_map.
Parameters:
- f (function) – The function to map on args_list
- args_list (list) – List of arguments to pass to f
- cleanup (bool) – Should we remove the stdout and stderr temporary files for
each job when we’re done? (They are left in place if there’s
an error.)
- mem_free (str) – Estimate of how much memory each job will need (for
scheduling). (Not currently used, because our cluster does
not have that setting enabled.)
- name (str) – Base name to give each job (will have a number added to the end)
- num_slots (int) – Number of slots each job should use.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- queue (str) – The SGE queue to use for scheduling.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
Returns: List of Job results
-
gridmap.job.process_jobs(jobs, temp_dir='/scratch/', white_list=None, quiet=True, max_processes=1, local=False)
Take a list of jobs and process them on the cluster.
Parameters:
- jobs (list of Job) – Jobs to run.
- temp_dir (str) – Local temporary directory for storing output for an
individual job.
- white_list (list of str) – If specified, limit nodes used to only those in the list.
- quiet (bool) – When true, do not output information about the jobs that have
been submitted.
- max_processes (int) – The maximum number of concurrent processes to use if
processing jobs locally.
- local (bool) – Should we execute the jobs locally in separate processes
instead of on the cluster?
Returns: List of Job results
-
gridmap.job.send_error_mail(job)
Sends out a diagnostic email.
runner Module
This module executes pickled jobs on the cluster.
-
gridmap.runner.get_cpu_load(pid)
Parameters: pid (int) – Process ID for job whose CPU load we’d like to check.
Returns: CPU usage of process
-
gridmap.runner.get_job_status(parent_pid)
Determines the status of the current worker and its machine (currently not
cross-platform).
Parameters: parent_pid (int) – Process ID for job whose status we’d like to check.
Returns: Memory and CPU load information for given PID.
Return type: dict
-
gridmap.runner.get_memory_usage(pid)
Parameters: pid (int) – Process ID for job whose memory usage we’d like to check.
Returns: Memory usage of process in MB.
web Module
Simple web front-end for pythongrid
-
class gridmap.web.WebMonitor
Bases: builtins.object
-
index(self)
-
job_to_html(job)
Displays a job as HTML.
-
list_jobs(self, address)
Displays the list of jobs.
-
view_job(self, address, job_id)
Displays the details of an individual job.