10. parallel

Functions for parallel computations on a single multicore machine using the standard library module multiprocessing.

The focus is not on programming details but on how to speed up some computations.

  • If your computation is already fast (e.g. <1s), go on without parallelisation. In the optimal case the speedup equals the number of CPU cores.
  • If you want to use a cluster with all its CPUs, this is not the way (you need MPI).

Parallelisation is no magic, and this module is a convenience for non-specialists in parallel computing. The main task is to pass additional parameters to the processes (a pool of workers) and to loop over only one parameter, given as a list. Opening and closing of the pool is hidden in the function. In this way we can use all CPUs of a multicore machine.
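
The idea of fixing the additional parameters and looping over a single one can be sketched with the standard library alone. This is a minimal illustration and not the implementation used in this module: the names do_for_list and _call are hypothetical, and unlike doForList the sketch uses the blocking Pool.map, which returns results in input order.

```python
import multiprocessing as mp
from functools import partial


def _call(func, loopover, args, kwargs, value):
    """Evaluate func for one loop value (module level so it can be pickled)."""
    if loopover is None:
        # the loop value is passed as the first positional argument
        return func(value, *args, **kwargs)
    # the loop value replaces the keyword argument named by loopover
    return func(*args, **{**kwargs, loopover: value})


def do_for_list(func, looplist, *args, loopover=None, ncpu=None, debug=0, **kwargs):
    """Hypothetical sketch: evaluate func for each value in looplist."""
    worker = partial(_call, func, loopover, args, kwargs)
    if debug > 0:
        # serial evaluation, so a debugger stops at the real error position
        return [worker(value) for value in looplist]
    # the pool is opened and closed here, hidden from the caller;
    # ncpu=None lets multiprocessing use all available cpus
    with mp.Pool(processes=ncpu) as pool:
        return pool.map(worker, list(looplist))
```

Called as do_for_list(f, range(100), a=1, b=2), each worker evaluates f(x, a=1, b=2) for one x. The function f has to be defined at module level so that it can be pickled, and on platforms that start workers with spawn the call belongs under an if __name__ == "__main__": guard.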

During testing I found that shared memory does not really speed things up if we just want to calculate a function, e.g. for a list of different Q values dependent on model parameters. Here the pickling of numpy arrays is efficient enough compared to the computation we do. The amount of pickled data should not be too large, as each process gets a copy and pickling takes time.
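
To judge whether pickling is cheap compared to the computation, the cost of one round trip can be measured directly. The following is a rough sketch with the standard library; pickle_cost and the params dictionary are hypothetical stand-ins for the arguments that would be sent to each worker.

```python
import pickle
import time


def pickle_cost(obj):
    """Return (size in bytes, seconds) for one pickle dump/load round trip."""
    start = time.perf_counter()
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    pickle.loads(data)  # the worker side has to unpickle its own copy
    elapsed = time.perf_counter() - start
    return len(data), elapsed


# stand-in for model parameters that are passed along with every task
params = {"q": [0.01 * i for i in range(10000)], "radius": 2.5}
size, seconds = pickle_cost(params)
print(f"{size} bytes, {seconds:.6f} s per round trip")
```

If the measured time is small compared to the runtime of a single function call, passing copies to the workers is unlikely to be the bottleneck.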

If speed is an issue and shared memory becomes important, I advise using Fortran with OpenMP, as used for ff.cloudScattering with parallel computation and shared memory. For me this was easier than the different solutions around.

Here we use only unmodified input data and return a new dataset, so we don't need to care about what happens if one process changes data needed in another process (race conditions, …); the data are not shared anyway. Please keep this in mind and don't complain if you find a way to modify input data.

For easier debugging (to find the position of an error in the pdb debugger) use the option debug. In this case multiprocessing is not used and the debugger finds the error correctly.

See example in doForList.


Parallel functions

doForList(funktion, looplist, *args, **kwargs) Calculates the function for the values in looplist in a pool of workers in parallel using multiprocessing.
doForQlist(funktion, qList, *args, **kwargs) Calculates the function for the values in qList in a pool of workers using multiprocessing.
psphereAverage(funktion[, relError]) Parallel evaluation of the spherical average of a function.

Helper functions

randomPointsN(NN[, r, skip]) NN quasi random points on a sphere of radius r based on a low-discrepancy sequence.
rphitheta2xyz(RPT) Transformation spherical coordinates [r,phi,theta] to cartesian coordinates [x,y,z]
fibonacciLatticePointsOnSphere(NN[, r]) Fibonacci lattice points on a sphere with radius r (default r=1)
haltonSequence(size, dim) Quasi random numbers from the Halton sequence

jscatter.parallel.doForList(funktion, looplist, *args, **kwargs)[source]

Calculates the function for the values in looplist in a pool of workers in parallel using multiprocessing.

Like multiprocessing map_async, but automatically distributes all given arguments.

Parameters:
funktion : function

Function to process with arguments (looplist[i],args,kwargs). The return value of the function should contain the parameters, or at least the loopover value, to allow a check if desired.

looplist : list

List of values to loop over.

loopover : string, int, default = not given

Name of the argument to loop over with the values in looplist. If not given, the first argument is used, which then should not be included among the given arguments.

ncpu : int, optional
Number of cpus in the pool.
  • not given or 0 -> all cpus are used
  • int>0 -> min(ncpu, mp.cpu_count) cpus are used
  • int<0 -> number of cpus not to use
cb : None, function

Callback after each calculation.

debug : int

debug > 0 switches to serial (non-parallel) evaluation for testing

Returns:
list : list of function return values as [result1,result2,…]

The order of return values is not explicitly synced to looplist.
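
The rule for ncpu described in the parameters above can be written down as a small helper. This is only a sketch: resolve_ncpu is a hypothetical name, and the lower bound of one worker for large negative values is an assumption that the documentation does not state.

```python
import multiprocessing as mp


def resolve_ncpu(ncpu=0, available=None):
    """Translate the ncpu option into a number of worker processes."""
    if available is None:
        available = mp.cpu_count()
    if ncpu > 0:
        # never ask for more workers than there are cpus
        return min(ncpu, available)
    if ncpu < 0:
        # leave abs(ncpu) cpus unused, but keep at least one worker (assumption)
        return max(available + ncpu, 1)
    # not given or 0: use all cpus
    return available
```

On a machine with 8 cpus, ncpu=-2 would give a pool of 6 workers under these assumptions.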

Notes

The return array of function may be prepended with the value looplist[i] as reference. E.g.:

def f(x,a,b,c,d):
    return [x,x+a+b+c+d]
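
Because the order of the results is not guaranteed, the prepended loop value can be used to restore it. A small sketch with plain Python lists follows; the unordered results are simulated here instead of being produced by a pool.

```python
def f(x, a, b, c, d):
    # prepend the loop value x as reference
    return [x, x + a + b + c + d]


# simulate results that came back in arbitrary order
results = [f(x, a=1, b=2, c=3, d=11) for x in (3, 0, 2, 1)]

# sort by the prepended loop value to recover the looplist order
ordered = sorted(results, key=lambda res: res[0])
```

The same key also allows a check that every value of looplist was actually processed.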

Examples

import jscatter as js

def f(x,a,b,c,d):
    res=x+a+b+c+d
    return [x,res]

# loop over the first argument, here x
res = js.parallel.doForList(f,looplist=range(100),a=1,b=2,c=3,d=11)
# loop over 'd', ignoring the given d=11 (which can be omitted here)
res = js.parallel.doForList(f,looplist=range(100),loopover='d',x=0,a=1,b=2,c=3,d=11)
jscatter.parallel.doForQlist(funktion, qList, *args, **kwargs)[source]

Calculates the function for the values in qList in a pool of workers using multiprocessing.

Calculates [function(Qi, *args, **kwargs) for Qi in qList] in parallel. The return value of the function will contain the value Qi as reference.

Parameters:
funktion : function

Function to process with arguments (qList[i],args,kwargs)

qList : list

List of values for the first argument of the function. The qList value is prepended to the arguments args.

ncpu : int, optional
Number of cpus in the pool.
  • not given or 0 -> all cpus are used
  • int>0 -> min(ncpu, mp.cpu_count) cpus are used
  • int<0 -> number of cpus not to use
cb : function, optional

Callback after each calculation.

debug : int

debug > 0 switches to serial (non-parallel) evaluation for testing

Returns:
list : result with dimension function_return.ndim+1

The list elements will be prepended with the value qList[i] as reference.

Examples

import jscatter as js
def f(x,a,b,c,d):
    return [x+a+b+c+d]
# loop over the first argument, here x
js.parallel.doForQlist(f,qList=range(100),a=1,b=2,c=3,d=11)
jscatter.parallel.fibonacciLatticePointsOnSphere(NN, r=1)[source]

Fibonacci lattice points on a sphere with radius r (default r=1)

This can be used to integrate efficiently over a sphere with well distributed points.

Parameters:
NN : integer

number of points = 2*NN+1

r : float, default 1

radius of sphere

Returns:
list of [r,phi,theta] triples with angles in radians

phi azimuth -pi<phi<pi; theta polar angle 0<theta<pi
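
How such a lattice can be constructed is sketched below with the latitude and longitude formulas commonly given for the Fibonacci lattice. The name fibonacci_lattice is hypothetical, and whether the function documented here follows exactly this variant is an assumption.

```python
import math


def fibonacci_lattice(n, r=1.0):
    """Return 2*n+1 points [r, phi, theta] spread evenly over a sphere."""
    golden = (1 + math.sqrt(5)) / 2
    points = []
    for i in range(-n, n + 1):
        # latitudes are equally spaced in sin(lat), which gives equal areas
        lat = math.asin(2 * i / (2 * n + 1))
        # longitudes advance by the golden ratio to avoid alignment
        lon = 2 * math.pi * i / golden
        phi = (lon + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        theta = math.pi / 2 - lat                         # polar angle
        points.append([r, phi, theta])
    return points
```

With n=1000 this yields 2001 points, each representing roughly the same area of the sphere.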

References

[1] Measurement of Areas on a Sphere Using Fibonacci and Latitude–Longitude Lattices, Á. González, Mathematical Geosciences 42, 49-64 (2009)

Examples

import jscatter as js
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
points=js.formel.fibonacciLatticePointsOnSphere(1000)
pp=list(filter(lambda a:(a[1]>0) & (a[1]<np.pi/2) & (a[2]>0) & (a[2]<np.pi/2),points))
pxyz=js.formel.rphitheta2xyz(pp)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(pxyz[:,0],pxyz[:,1],pxyz[:,2],color="k",s=20)
ax.set_xlim([-1,1])
ax.set_ylim([-1,1])
ax.set_zlim([-1,1])
ax.set_aspect("equal")
plt.tight_layout()
plt.show(block=False)

points=js.formel.fibonacciLatticePointsOnSphere(1000)
pp=list(filter(lambda a:(a[2]>0.3) & (a[2]<1) ,points))
v=js.formel.rphitheta2xyz(pp)
R=js.formel.rotationMatrix([1,0,0],np.deg2rad(-30))
pxyz=np.dot(R,v.T).T
#points in polar coordinates
prpt=js.formel.xyz2rphitheta(np.dot(R,pxyz.T).T)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(pxyz[:,0],pxyz[:,1],pxyz[:,2],color="k",s=20)
ax.set_xlim([-1,1])
ax.set_ylim([-1,1])
ax.set_zlim([-1,1])
ax.set_aspect("equal")
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.tight_layout()
plt.show(block=False)
jscatter.parallel.haltonSequence(size, dim)[source]

Quasi random numbers from the Halton sequence

Parameters:
size : int

Samples from the sequence

dim : int

Dimensions

Returns:
array
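
For orientation, the Halton sequence can be generated with the radical inverse in one prime base per dimension. The sketch below uses only the standard library; the function names and the output layout (size rows with dim columns) are assumptions and may differ from the array returned by haltonSequence.

```python
def first_primes(count):
    """Return the first count prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Mirror the digits of index in the given base at the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(size, dim):
    """Return size points in [0, 1)^dim, one prime base per dimension."""
    bases = first_primes(dim)
    return [[radical_inverse(i, b) for b in bases] for i in range(1, size + 1)]
```

In base 2 the sequence starts with 1/2, 1/4, 3/4 and in base 3 with 1/3, 2/3, 1/9, so new points keep filling the largest remaining gaps.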

References

[1] https://mail.python.org/pipermail/scipy-user/2013-June/034741.html Author: Sebastien Paris, Josef Perktold; translation from C
jscatter.parallel.psphereAverage(funktion, relError=300, *args, **kwargs)[source]

Parallel evaluation of spherical average of function.

Either a Fibonacci lattice or Monte Carlo integration with a quasi random grid is used.

Parameters:
funktion : function

Function to evaluate. The first argument of the function gets the cartesian coordinates [x,y,z] of a point on the unit sphere.

relError : float, default 300
Determines how points on the sphere are selected.
  • >1 Fibonacci lattice with relError*2+1 points
  • 0<relError<1 Quasi random points on the sphere (see randomPointsN).
    Stops if the relative improvement in the mean is less than relError (uses steps of 40 new points). The final error is (stddev of N points)/sqrt(N) as for Monte Carlo methods, even if it is not a correct 1-sigma error in this case.
args, kwargs :

forwarded to function

Returns:
array-like with the values from the function and the error appended

Notes

  • Works also on single core machines.
  • For integration over a continuous function, such as a form factor in scattering, the random points are not statistically independent. Think of neighbouring points on an isosurface, which are correlated, so that the standard deviation is biased. In this case the Fibonacci lattice is the better choice, as the standard deviation in a random sample is not a measure of error but rather a measure of the differences on the isosurface.
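
The Fibonacci lattice variant of the average can be sketched serially in a few lines. This is an illustration under assumptions, not the implementation of psphereAverage: the names are hypothetical, the evaluation is not parallel, the function is assumed to return a single float, and the lattice formula is the commonly used one.

```python
import math


def fibonacci_xyz(n):
    """Return 2*n+1 cartesian points on the unit sphere (Fibonacci lattice)."""
    golden = (1 + math.sqrt(5)) / 2
    points = []
    for i in range(-n, n + 1):
        z = 2 * i / (2 * n + 1)
        rho = math.sqrt(1 - z * z)  # distance from the z axis
        lon = 2 * math.pi * i / golden
        points.append((rho * math.cos(lon), rho * math.sin(lon), z))
    return points


def sphere_average(func, n=300, *args, **kwargs):
    """Return (mean, stddev/sqrt(N)) of func over the lattice points."""
    values = [func(p, *args, **kwargs) for p in fibonacci_xyz(n)]
    count = len(values)
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / (count - 1)
    return mean, math.sqrt(variance / count)
```

For f(p) = z**2 the exact spherical average is 1/3, which the lattice with n=300 reproduces closely. The returned error mainly reflects how much the function varies over the sphere, in line with the note above.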

Examples

import jscatter as js
def f(x,r):
    return [js.formel.xyz2rphitheta(x)[1:].sum()*r]
js.parallel.psphereAverage(f,relError=500,r=1)
jscatter.parallel.randomPointsN(NN, r=1, skip=0)[source]

NN quasi random points on a sphere of radius r based on a low-discrepancy sequence.

For numerical integration quasi random numbers are better than random samples, as the error drops faster [1]. Here we use the Halton sequence to generate the points. Skipping points makes successive calls additive, so that no points are repeated.

Parameters:
NN : int

Number of points to generate.

r : float

Radius of sphere

skip : int

Number of points to skip in the Halton sequence.

Returns:
array of [r,phi,theta] triples with angles in radians
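
Why skipping makes the point sets additive can be seen in a small sketch. Here a two-dimensional Halton sequence (bases 2 and 3) is mapped to the sphere with an area-preserving transformation; the name random_points and this particular mapping are assumptions, not necessarily what randomPointsN does internally.

```python
import math


def radical_inverse(index, base):
    """Mirror the digits of index in the given base at the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def random_points(nn, r=1.0, skip=0):
    """Return nn points [r, phi, theta], starting after skip sequence elements."""
    points = []
    for i in range(skip + 1, skip + nn + 1):
        u = radical_inverse(i, 2)
        v = radical_inverse(i, 3)
        phi = 2 * math.pi * u - math.pi  # azimuth in (-pi, pi)
        theta = math.acos(1 - 2 * v)     # polar angle, uniform in cos(theta)
        points.append([r, phi, theta])
    return points
```

Two blocks generated with skip=0 and skip=400 together give the same points as one block of 800, so an integration can be refined step by step without reusing points.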

References

[1] https://en.wikipedia.org/wiki/Low-discrepancy_sequence

Examples

import jscatter as js
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for i,color in enumerate(['b','g','r','y']):
   points=js.parallel.randomPointsN(400,skip=400*i)
   points=list(filter(lambda a:(a[1]>0) ,points))
   pxyz=js.formel.rphitheta2xyz(points)
   ax.scatter(pxyz[:,0],pxyz[:,1],pxyz[:,2],color=color,s=20)
ax.set_xlim([-1,1])
ax.set_ylim([-1,1])
ax.set_zlim([-1,1])
ax.set_aspect("equal")
plt.tight_layout()
plt.show(block=False)
jscatter.parallel.rphitheta2xyz(RPT)[source]

Transformation spherical coordinates [r,phi,theta] to cartesian coordinates [x,y,z]

Parameters:
RPT : array Nx3
Array of dimension Nx3 with [r,phi,theta] coordinates
  • r : float, length
  • phi : float, azimuth -pi < phi < pi
  • theta : float, polar angle 0 < theta < pi
Returns:
Array with the same dimension as RPT.
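
With phi as azimuth and theta as polar angle, the usual transformation reads x = r sin(theta) cos(phi), y = r sin(theta) sin(phi), z = r cos(theta). A plain Python sketch of this convention follows; the name rphitheta_to_xyz is hypothetical and a list of lists stands in for the Nx3 array.

```python
import math


def rphitheta_to_xyz(rpt):
    """Convert rows [r, phi, theta] to rows [x, y, z]."""
    xyz = []
    for r, phi, theta in rpt:
        xyz.append([
            r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta),
        ])
    return xyz
```

For example, the point [2, pi/2, pi/2] lies on the positive y axis at distance 2 from the origin.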