Distributed Python

From Vision Wiki
Revision as of 01:20, 6 July 2012 by Eyrun (talk | contribs) (rvv)

This tutorial will get you started with IPython's parallel features for running parallel and distributed tasks on the cluster.

(I couldn't get it to work across different machines yet. Please edit this page if you do.)

MPI

Make sure you have access to the mpiexec command on the cluster. If not, run the mpi-selector-menu command and choose a default MPI distribution.

Create an IPython MPI profile

To do this, run the following command on a cluster machine

 $ ipython profile create --parallel --profile=mpi

This creates an IPython profile called 'mpi'; its configuration files are placed in ~/.ipython/profile_mpi/

In the file ~/.ipython/profile_mpi/ipcluster_config.py, find the line that sets the c.IPClusterEngines.engine_launcher_class parameter, uncomment it and set it to MPIExecEngineSetLauncher:

 c.IPClusterEngines.engine_launcher_class = 'MPIExecEngineSetLauncher'
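Put together, the relevant fragment of ipcluster_config.py looks roughly like this (a sketch; all other settings in the generated file are omitted):

```python
# ~/.ipython/profile_mpi/ipcluster_config.py (fragment)
c = get_config()  # provided by IPython's configuration machinery

# Launch engines with mpiexec instead of the default local launcher
c.IPClusterEngines.engine_launcher_class = 'MPIExecEngineSetLauncher'
```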

Launch the engines

To run the workers using the mpi profile you just created:

 ipcluster start --n=12 --profile=mpi

(the --n option sets the number of engines to start)

More info is in the IPython documentation on using MPI with IPython.

Using DirectView to run code on the engines

Example:

 from IPython.parallel import Client
 rc = Client(profile='mpi')
 dview = rc[:]  # a DirectView on all the engines
 # execute code on each engine
 dview.execute('import random; a = random.random()')
 # run a script on each engine
 dview.run('my_script.py')
 # get a variable from the engines (one value per engine)
 dview['a']
 # set a variable on all engines
 dview['b'] = 10
 # map a function over a sequence, using all engines concurrently
 powers = dview.map_sync(lambda x: x**10, range(32))
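map_sync farms the calls out to the engines but gathers the results back in the original order, so the values match a plain serial map. A local sketch of the equivalent computation:

```python
# Serial equivalent of dview.map_sync(lambda x: x**10, range(32)):
# each element is raised to the 10th power; map_sync just distributes
# these calls across the engines and preserves the input order.
powers = list(map(lambda x: x**10, range(32)))
print(powers[:4])  # [0, 1, 1024, 59049]
```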

Note that IPython uses pickle to serialize objects in order to get and set remote variables (NumPy arrays are handled specially for efficiency).
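In practice this means a value can only be transferred if it round-trips through pickle. A minimal sketch with just the standard library (the payload dictionary is made up for illustration):

```python
import pickle

# 'wire' is (roughly) what travels between the client and an engine
# when a remote variable is set or fetched.
payload = {'a': 0.42, 'items': [1, 2, 3]}
wire = pickle.dumps(payload)
assert pickle.loads(wire) == payload  # the engine reconstructs an equal object
```

Objects that pickle cannot handle (open files, sockets, some lambdas) cannot be pushed or pulled this way.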

More on DirectView is in the IPython parallel documentation.

See the documentation for more.