OpenMPI / mpi4py

Open MPI is an implementation of the Message Passing Interface (MPI) standard, used in relax for parallelised calculations.

mpi4py OpenMPI

This package provides Python bindings for the Message Passing Interface (MPI) standard. It is implemented on top of the MPI-1/2/3 specification and exposes an API based on the standard MPI-2 C++ bindings.

See the relax manual on multi-processor usage: http://www.nmr-relax.com/manual/Usage_multi_processor.html

If you have OpenMPI and mpi4py installed, then you have access to Gary Thompson's multi-processor framework for MPI parallelisation.

Gary has achieved near perfect scaling efficiency:

https://www.nmr-relax.com/mail.gna.org/public/relax-devel/2007-05/msg00000.html

Dependencies

  1. Python 2.4 to 2.7 or 3.0 to 3.4, or a recent PyPy release.
  2. A functional MPI 1.x/2.x/3.x implementation like MPICH or Open MPI built with shared/dynamic libraries.
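
A quick way to check whether these dependencies are already present (assuming both python and mpirun are on the PATH) could be:

# Check the MPI implementation and the Python version
mpirun --version
python --version

# Check whether mpi4py is installed, and which version it is
python -c "import mpi4py; print(mpi4py.__version__)"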

How does it work?

If mpirun is started with "-np 4", relax will get 4 "rank" processes.
relax will organise 1 process as the "organiser" (master), which sends jobs to 3 slaves, and receives and organises the returned results.

For the main part of the code, relax will only use 1 processor.
Only the computationally heavy parts of relax are prepared for multi-processing.
And the computationally heavy part is minimising to find optimal parameters.
All other parts of relax are "light-weight": they only read/write text files or organise data.

For minimisation, relax collects all data and functions, organised into classes, and packs this as an independent job to be sent to a slave for minimising.
To pack a job, the job needs to have sufficient information to "live an independent life": the data to minimise, the functions to use, how to treat the result, and how to send it back to the master.
The master asks: "Slave, find the minimisation of this spin cluster, and return the results to me when you are done, then you get a new job".

With, for example, 10 slaves, this makes it possible to (see the sketch just after this list):

  1. Simultaneously minimise 10 free spins.
  2. Simultaneously run 10 Monte Carlo simulations.
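
As a sketch of the arithmetic, using the mpirun call shown later on this page, starting 11 processes gives 1 master plus 10 slaves:

# 1 master + 10 slaves: up to 10 spin clusters or 10 Monte Carlo simulations minimised at the same time
mpirun -np 11 relax --multi='mpi4py'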

Install OpenMPI on Linux and set up the environment

See https://www10.informatik.uni-erlangen.de/Cluster/

# Install openmpi-devel, to get 'mpicc'
sudo yum install openmpi-devel

# Check for mpicc
which mpicc

# If not found set environments by loading module
# See avail
module avail

# Show what loading does
module show openmpi-x86_64
# or
module show openmpi-1.10-x86_64

# See if anything is loaded
module list

# Load
module load openmpi-x86_64
# Or
module load openmpi-1.10-x86_64

# See list
module list

# Check for mpicc, mpirun or mpiexec
which mpicc
which mpirun
which mpiexec

# Unload
module unload openmpi-x86_64

In the .cshrc file, one could put:

# Open MPI: Open Source High Performance Computing
foreach x (tomat bax minima elvis)
    if ( $HOST == $x) then 
        module load openmpi-x86_64 
    endif
end
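
For bash users, a rough equivalent for the ~/.bashrc file could be the following sketch (same example host names as above; this assumes the module command is also available in bash on your system):

# Open MPI: load the module only on the hosts that have it installed
case "$(hostname -s)" in
    tomat|bax|minima|elvis)
        module load openmpi-x86_64
        ;;
esac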

If mpicc is still not found, try this fix (ref: http://forums.fedoraforum.org/showthread.php?t=194688):

# For 32 bit computer.
sudo ln -s /usr/lib/openmpi/bin/mpicc /usr/bin/mpicc
# For 64 bit computer.
sudo ln -s /usr/lib64/openmpi/bin/mpicc /usr/bin/mpicc
# or
sudo ln -s  /usr/lib64/openmpi-1.10/bin/mpicc /usr/bin/mpicc

Install mpi4py

Linux and Mac

Remember to check if there are newer versions of mpi4py.
The mpi4py library can be installed on all UNIX systems by typing:

# Change to bash, if in tcsh shell
#bash
v=2.0.0 
#tcsh
set v=2.0.0 

pip install https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz
pip install https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz --upgrade

# Or, to use a python interpreter other than the default one
wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz
tar -xzf mpi4py-$v.tar.gz
rm mpi4py-$v.tar.gz
cd mpi4py-$v
# Use the path to the python to build with
python setup.py build
python setup.py install
cd ..
rm -rf mpi4py-$v

Then test

python
import mpi4py
mpi4py.__file__
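
To also check that MPI communication itself works, a small test along these lines can be run (the python one-liner is only an illustration, not part of relax):

mpirun -np 4 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('rank %d of %d' % (c.Get_rank(), c.Get_size()))"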

Relax in multi-processor mode

How many processors should I start?
You should start as many processes as you have physical cores, but do not count hyper-threads.
In this example, you can start 12 (6 cores per socket * 2 sockets), where relax will take 1 process for distributing jobs and receiving results, and 11 for calculations.

lscpu
----
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
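
One possible way to compute the number of physical cores (cores per socket times sockets) directly from lscpu:

# Here this prints 12 (6 cores per socket * 2 sockets), i.e. the value to give to mpirun -np
lscpu | awk -F: '/Core\(s\) per socket/{c=$2} /^Socket\(s\)/{s=$2} END{print c*s}'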

You can keep trying this until you get a good result:

# With shell
mpirun -np 4 echo "hello world"

# With python
mpirun -np 4 python -m mpi4py helloworld

# With newer versions of mpirun, --report-bindings works
mpirun --report-bindings -np 4 echo "hello world"
mpirun --report-bindings -np 12 echo "hello world"
# This is too much
mpirun --report-bindings -np 13 echo "hello world"

Example start script:

tcsh 
set RELAX=`which relax`

# Normal
mpirun -np 12 $RELAX --multi='mpi4py'

# In gui
mpirun -np 12 $RELAX --multi='mpi4py' -g

where the number given to -np is the total number of processes, i.e. the master plus the slaves. See the mpirun documentation for details - this is not part of relax.
This code runs in the GUI, the script UI and the prompt UI, i.e. everywhere.

Helper start scripts

If you have several versions or development branches of relax installed, you could probably use some of these scripts, and put them in your PATH.
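
For example, a helper script could be installed along these lines (assuming ~/bin is already on your PATH; the rehash step is only needed in tcsh):

chmod +x relax_trunk
cp relax_trunk ~/bin/
rehash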

Script to force running relax on a server computer

This script exemplifies a setup where the above installation requirements are met on one server computer, haddock, and where jobs started from satellite computers are forced to run on that computer.

The file relax_trunk is made executable (chmod +x relax_trunk), and put in a PATH known by all satellite computers.

#!/bin/tcsh -f

# Set the relax version used for this script.
set RELAX=/network_drive/software_user/software/NMR-relax/relax_trunk/relax

# Check the machine, since only the machine haddock has the correct packages installed.
if ( $HOST != "haddock") then
        echo "You have to run on haddock. I do it for you"
        ssh haddock -Y -t "cd $PWD; $RELAX $argv; /bin/tcsh"
else
        $RELAX $argv
endif

Script for running relax with the maximum number of processors available

This script exemplifies a setup to test running relax with the maximum number of processors available.

The file relax_test is made executable, and put in a PATH known by all satellite computers.

#!/bin/tcsh -fe

# Set the relax version used for this script.
set RELAX=/sbinlab2/tlinnet/software/NMR-relax/relax_trunk/relax

# Set number of available CPUs.
set NPROC=`nproc`
set NP=`echo $NPROC + 0 | bc `

echo "Running relax with NP=$NP in multi-processor mode"

# Run relax in multi processor mode.
mpirun -np $NP $RELAX --multi='mpi4py' $argv

Script to force running relax on a server computer with OpenMPI

#!/bin/tcsh

# Set the relax version used for this script.
set RELAX=/sbinlab2/software/NMR-relax/relax_trunk/relax

# Set number of available CPUs.
#set NPROC=`nproc`
set NPROC=10
set NP=`echo $NPROC + 0 | bc `

# Run relax in multi processor mode.
set RELAXRUN="mpirun -np $NP $RELAX --multi='mpi4py' $argv"

# Check the machine, since only the machine haddock has openmpi-devel installed
if ( $HOST != "haddock") then
    echo "You have to run on haddock. I do it for you"
    ssh haddock -Y -t "cd $PWD; $RELAXRUN; /bin/tcsh"
else
    mpirun -np $NP $RELAX --multi='mpi4py' $argv
endif

Setting up relax on the supercomputer Beagle2

Please see the post from Lora Picton: relax_on_Beagle2

Message: http://thread.gmane.org/gmane.science.nmr.relax.user/1821

Commands and FAQ about mpirun

See Oracle's page on mpirun and the Open MPI manual:

  1. https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingPrograms.html
  2. http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php

For a simple SPMD (Single Program, Multiple Data) job, the typical syntax is:

mpirun -np x program-name
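
For relax, a concrete example of this syntax would be:

mpirun -np 4 relax --multi='mpi4py'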

Find the number of Sockets, Cores and Threads

See http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options

lscpu | egrep -e "CPU|Thread|Core|Socket"
--- tomat
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
CPU family:            6
CPU MHz:               1600.000
NUMA node0 CPU(s):     0-3
--- Machine haddock
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
CPU family:            6
CPU MHz:               2394.135
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

Test binding to socket

module load openmpi-x86_64

Output from a machine with: Thread(s) per core: 1, Core(s) per socket: 4, Socket(s): 1

mpirun --report-bindings -np 4 relax --multi='mpi4py'
[tomat:28223] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/.]
[tomat:28223] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B]
[tomat:28223] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[tomat:28223] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./.]

Output when running with too many processes, from a machine with: Thread(s) per core: 1, Core(s) per socket: 4, Socket(s): 1

mpirun --report-bindings -np 5 relax --multi='mpi4py'
[tomat:31434] MCW rank 0 is not bound (or bound to all available processors)
[tomat:31434] MCW rank 1 is not bound (or bound to all available processors)
[tomat:31434] MCW rank 2 is not bound (or bound to all available processors)
[tomat:31434] MCW rank 3 is not bound (or bound to all available processors)
[tomat:31434] MCW rank 4 is not bound (or bound to all available processors)

Output from a machine with: Thread(s) per core: 2, Core(s) per socket: 6, Socket(s): 2

mpirun --report-bindings -np 11 relax --multi='mpi4py'
[haddock:31110] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../..][../../../../../..]
[haddock:31110] MCW rank 7 bound to socket 1[core 9[hwt 0-1]]: [../../../../../..][../../../BB/../..]
[haddock:31110] MCW rank 8 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/..][../../../../../..]
[haddock:31110] MCW rank 9 bound to socket 1[core 10[hwt 0-1]]: [../../../../../..][../../../../BB/..]
[haddock:31110] MCW rank 10 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB][../../../../../..]
[haddock:31110] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..]
[haddock:31110] MCW rank 1 bound to socket 1[core 6[hwt 0-1]]: [../../../../../..][BB/../../../../..]
[haddock:31110] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..]
[haddock:31110] MCW rank 3 bound to socket 1[core 7[hwt 0-1]]: [../../../../../..][../BB/../../../..]
[haddock:31110] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../..][../../../../../..]
[haddock:31110] MCW rank 5 bound to socket 1[core 8[hwt 0-1]]: [../../../../../..][../../BB/../../..]

Use mpirun with ssh hostfile

Caution: This is a test only. It appears not to function well!

See

  1. https://www.open-mpi.org/faq/?category=running#mpirun-hostfile
  2. http://mirror.its.dal.ca/openmpi/faq/?category=running#simple-spmd-run
  3. https://www.open-mpi.org/faq/?category=rsh
  4. https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingBatchPrograms.html

We have the 3 machines bax, minima and elvis.

Let's try to make a hostfile and use them all at the same time.

set MPIHARR = (bax minima elvis)
foreach MPIH ($MPIHARR)
ssh $MPIH 'echo $HOST; lscpu | egrep -e "Thread|Core|Socket"; module list'
echo ""
end

Output

bax
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Currently Loaded Modulefiles:
  1) openmpi-x86_64

minima
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Currently Loaded Modulefiles:
  1) openmpi-x86_64

elvis
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Currently Loaded Modulefiles:
  1) openmpi-x86_64

Each node machine is a quad-core machine, and we want to reserve 1 CPU for the user at the machine.

Make a host file. Currently, I cannot get more than 2 ssh connections to work at the same time.

cat << EOF > relax_hosts
localhost slots=3 max-slots=4
bax slots=3 max-slots=4
minima slots=3 max-slots=4
#elvis slots=3 max-slots=4
EOF

cat relax_hosts

Then try to run some tests

# Check first environments
ssh localhost env | grep -i path
ssh bax env | grep -i path

mpirun --host localhost hostname
mpirun --mca plm_base_verbose 10 --host localhost hostname

# On another machine, this will not work because of the firewall
mpirun --host bax hostname

# Verbose for bax
mpirun --mca plm_base_verbose 10 --host bax hostname
mpirun --mca rml_base_verbose 10  --host bax hostname

# This shows that TCP is having problems: tcp_peer_complete_connect: connection failed with error 113
mpirun --mca oob_base_verbose 10 --host bax hostname
# Shutdown firewall
sudo iptables -L -n
sudo service iptables stop
sudo iptables -L -n
# Try again
mpirun --mca oob_base_verbose 10 --host bax hostname
mpirun --host bax hostname

# Now try
mpirun --host localhost,bax hostname
mpirun --host localhost,bax,minima hostname
mpirun --host localhost,bax,elvis hostname

# Test why 4 machines are not working
mpirun --mca plm_base_verbose 10 --host localhost,bax,elvis,minima hostname
mpirun --mca rml_base_verbose 10 --host localhost,bax,elvis,minima hostname
mpirun --mca oob_base_verbose 10 --host localhost,bax,elvis,minima hostname

# Try just 3 machines
mpirun --report-bindings --hostfile relax_hosts hostname
mpirun --report-bindings -np 9 --hostfile relax_hosts hostname
mpirun --report-bindings -np 9 --hostfile relax_hosts uptime

# Now try relax
mpirun --report-bindings --hostfile relax_hosts relax --multi='mpi4py'

Running Parallel Jobs with a Queue System

See:

  1. https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingBatchPrograms.html

Running Parallel Jobs in the Sun Grid Engine Environment

See

  1. https://www.open-mpi.org/faq/?category=building#build-rte-sge
  2. http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslSge

Test if you have it

ompi_info | grep gridengine
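
If gridengine support is present, a submission script might look roughly like the sketch below. The parallel environment name (orte) and the relax script name (dispersion_script.py) are placeholders that depend on the local setup:

#!/bin/bash
#$ -N relax_disp
#$ -cwd
#$ -pe orte 12              # hypothetical parallel environment name - ask your cluster admin
#$ -l h_rt=24:00:00

# With SGE support built into Open MPI, mpirun picks up the slot count from the allocation
mpirun relax --multi='mpi4py' dispersion_script.py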

Running Parallel Jobs in PBS/Torque

See

  1. https://www.open-mpi.org/faq/?category=building#build-rte-tm
  2. http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaai.hpcrh/buildtorque.htm
  3. http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaai.hpcrh/installmn.htm%23installingthemanagementnode?lang=en

Test if you have it

ompi_info | grep tm
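
If tm support is present, a Torque submission script could look roughly like this sketch (dispersion_script.py is again a placeholder for your own relax script):

#!/bin/bash
#PBS -N relax_disp
#PBS -l nodes=1:ppn=12
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
# With tm support, mpirun launches directly on the nodes Torque allocated - no hostfile needed
mpirun relax --multi='mpi4py' dispersion_script.py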

Running Parallel Jobs in SLURM

See

  1. https://www.open-mpi.org/faq/?category=building#build-rte-sge
  2. https://www.open-mpi.org/faq/?category=slurm

Test if you have it

ompi_info | grep slurm
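
If SLURM support is present, a minimal sbatch script could be sketched as follows (dispersion_script.py is a placeholder):

#!/bin/bash
#SBATCH --job-name=relax_disp
#SBATCH --ntasks=12
#SBATCH --time=24:00:00

# Open MPI built with SLURM support reads the allocation, so -np can be omitted
mpirun relax --multi='mpi4py' dispersion_script.py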

Updates

Update 2013/09/11

See the commit message: http://thread.gmane.org/gmane.science.nmr.relax.scm

Huge speed win for the relaxation dispersion analysis - optimisation now uses the multi-processor.

The relaxation dispersion optimisation has been parallelised at the level of the spin clustering.
It uses Gary Thompson's multi-processor framework. This allows the code to run on multi-core, multi-processor systems, clusters, grids, and anywhere the OpenMPI protocol is available.

Because the parallelisation is at the cluster level, there are some situations where, instead of optimisation being faster when running on multiple slaves, the optimisation will be slower.
This is the case when all spins being studied are clustered into a small number of clusters, for example 100 spins in 1 cluster.
It is also likely to be slower for the minimise user function when no clustering is defined, due to the overhead costs of data transfer (but for the numeric models, in this case there will be a clear win).

The two situations where there will be a huge performance win are the grid_search user function when no clustering is defined, and the Monte Carlo simulations for error analysis.

Test of speed

Performed tests

A - Relax_disp system test

Relax_disp_systemtest

#!/bin/tcsh
set LOG=single.log ;
relax_single --time -s Relax_disp -t $LOG ;
set RUNTIME=`cat $LOG | awk '$1 ~ /^\./{print $0}' | awk '{ sum+=$2} END {print sum}'` ;
echo $RUNTIME >> $LOG ;
echo $RUNTIME ;

set LOG=multi.log ;
relax_multi --time -s Relax_disp -t $LOG ;
set RUNTIME=`cat $LOG | awk '$1 ~ /^\./{print $0}' | awk '{ sum+=$2} END {print sum}'` ;
echo $RUNTIME >> $LOG ;
echo $RUNTIME

B - Relax full analysis performed on dataset

First initialize data

set CPU1=tomat ;
set CPU2=haddock ;
set MODE1=single ;
set MODE2=multi ;
set DATA=$HOME/software/NMR-relax/relax_disp/test_suite/shared_data/dispersion/KTeilum_FMPoulsen_MAkke_2006/acbp_cpmg_disp_048MGuHCl_40C_041223/ ;
set TDATA=$HOME/relax_results
mkdir -p $TDATA/$CPU1 $TDATA/$CPU2 ;

cp -r $DATA $TDATA/$CPU1/$MODE1 ;
cp -r $DATA $TDATA/$CPU1/$MODE2 ;
cp -r $DATA $TDATA/$CPU2/$MODE1 ;
cp -r $DATA $TDATA/$CPU2/$MODE2 ;

relax_single $TDATA/$CPU1/$MODE1/relax_1_ini.py ;
relax_single $TDATA/$CPU1/$MODE2/relax_1_ini.py ;
relax_single $TDATA/$CPU2/$MODE1/relax_1_ini.py ;
relax_single $TDATA/$CPU2/$MODE2/relax_1_ini.py ;

Relax_full_analysis_performed_on_dataset

#!/bin/tcsh -e
set CPU=$HOST ;
set MODE1=single ;
set MODE2=multi ;
set TDATA=$HOME/relax_results

set LOG=timing.log ;
set TLOG=log.tmp ;
cd $TDATA

set MODE=$MODE1 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_4_model_sel.py -t ${CPU}_${MODE}.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ; 
cat $LOG ;

set MODE=$MODE2 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_4_model_sel.py -t ${CPU}_${MODE}.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ; 
cat $LOG ;

C - Relax full analysis performed on dataset with clustering

Relax_full_analysis_performed_on_dataset_cluster

#!/bin/tcsh -e
set CPU=$HOST ;
set MODE1=single ;
set MODE2=multi ;
set TDATA=$HOME/relax_results

set LOG=timing.log ;
set TLOG=log.tmp ;
cd $TDATA

set MODE=$MODE1 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_5_cluster.py -t ${CPU}_${MODE}_cluster.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ; 
cat $LOG ;

set MODE=$MODE2 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_5_cluster.py -t ${CPU}_${MODE}_cluster.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ; 
cat $LOG ;

Setup of test

List of computers - the 'lscpu' command

CPU 1

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Stepping:              6
CPU MHz:               2659.893
BogoMIPS:              5319.78
L1d cache:             32K
L1i cache:             32K
L2 cache:              3072K
NUMA node0 CPU(s):     0,1

CPU 2

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               2394.136
BogoMIPS:              4787.82
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

Execution scripts

relax_single

#!/bin/tcsh -fe
# Set the relax version used for this script.
set RELAX=/sbinlab2/tlinnet/software/NMR-relax/relax_disp/relax
# Remove env set to wrong library files.
unsetenv LD_LIBRARY_PATH

# Run relax in single processor mode.
$RELAX $argv

relax_multi

#!/bin/tcsh -fe
# Set the relax version used for this script.
set RELAX=/sbinlab2/tlinnet/software/NMR-relax/relax_disp/relax
# Remove env set to wrong library files.
unsetenv LD_LIBRARY_PATH

# Set number of available CPUs.
set NPROC=`nproc`
set NP=`echo $NPROC + 1 | bc `

# Run relax in multi processor mode.
mpirun -np $NP $RELAX --multi='mpi4py' $argv

Results

Computer | Nr of CPUs | Test type | Nr of spins | Nr exp. | GRID_INC | MC_NUM | MODELS                                   | Time
CPU 1    | 1          | A         | -           | -       | -        | -      | -                                        | 95, 105 s
CPU 1    | 2          | A         | -           | -       | -        | -      | -                                        | 96, 120 s
CPU 2    | 1          | A         | -           | -       | -        | -      | -                                        | 85, 78 s
CPU 2    | 24         | A         | -           | -       | -        | -      | -                                        | 133, 143 s
CPU 1    | 1          | B         | 82          | 16      | 11       | 50     | MODEL_ALL, single res                    | 9:16:33
CPU 1    | 2          | B         | 82          | 16      | 11       | 50     | MODEL_ALL, single res                    | 8:06:44
CPU 2    | 1          | B         | 82          | 16      | 11       | 50     | MODEL_ALL, single res                    | 8:18:21
CPU 2    | 24         | B         | 82          | 16      | 11       | 50     | MODEL_ALL, single res                    | 2:17:02
CPU 1    | 1          | C         | 78          | 16      | 11       | 50     | 'R2eff', 'No Rex', 'TSMFK01', clustering | 71:32:18
CPU 1    | 2          | C         | 78          | 16      | 11       | 50     | 'R2eff', 'No Rex', 'TSMFK01', clustering | 82:27:13
CPU 2    | 1          | C         | 78          | 16      | 11       | 50     | 'R2eff', 'No Rex', 'TSMFK01', clustering | 58:45:47
CPU 2    | 24         | C         | 78          | 16      | 11       | 50     | 'R2eff', 'No Rex', 'TSMFK01', clustering | 145:01:33

Notes:

  1. Nr exp. = Nr of experiments = Nr of CPMG frequencies, after subtracting repetitions and reference spectra.
  2. MODEL_ALL = ['R2eff', 'No Rex', 'TSMFK01', 'LM63', 'LM63 3-site', 'CR72', 'CR72 full', 'IT99', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded', 'NS CPMG 2-site star']
  3. Times for test A are in seconds; times for tests B and C are in h:mm:ss.

See also