Difference between revisions of "OpenMPI"
Revision as of 10:40, 31 July 2014
mpi4py OpenMPI
Manual on Multi processor usage
If you have OpenMPI and mpi4py installed, then you have access to Gary Thompson's multi-processor framework for MPI parallelisation.
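Both prerequisites can be checked from Python before trying a parallel run. The following is a minimal sketch, not part of relax: the helper name `mpi_available` is hypothetical, and it only tests whether the mpi4py package can be located and whether an `mpirun` executable is on the PATH, not whether the two actually work together.

```python
import importlib.util
import shutil


def mpi_available(module="mpi4py", launcher="mpirun"):
    """Return (has_module, has_launcher) without importing or starting MPI."""
    # find_spec() returns None when the top-level package cannot be located.
    has_module = importlib.util.find_spec(module) is not None
    # which() returns None when no executable of that name is on the PATH.
    has_launcher = shutil.which(launcher) is not None
    return has_module, has_launcher


if __name__ == "__main__":
    has_module, has_launcher = mpi_available()
    print("mpi4py importable:", has_module)
    print("mpirun on PATH:", has_launcher)
```

If either value is False, the installation steps need to be completed before the multi-processor mode can be used.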
Install
Linux/Mac
About
The code in relax must be written to support this. This is the case for the model-free analysis, for which Gary has achieved near-perfect scaling efficiency:
https://mail.gna.org/public/relax-devel/2007-05/msg00000.html
Update 2013/09/11
See Commit
Huge speed win for the relaxation dispersion analysis: optimisation now uses the multi-processor framework.
The relaxation dispersion optimisation has been parallelised at the level of the spin clustering.
It uses Gary Thompson's multi-processor framework. This allows the code to run on multi-core, multi-processor systems, clusters, grids, and anywhere OpenMPI is available.
Because the parallelisation is at the cluster level, there are some situations where optimisation is slower, not faster, when running on multiple slaves.
This is the case when all the spins being studied are grouped into a small number of clusters, for example 100 spins in 1 cluster.
It is also likely to be slower for the minimise user function when no clustering is defined, due to the overhead costs of data transfer (although for the numeric models there will be a clear win in this case).
The two situations where there will be a huge performance win are the grid_search user function when no clustering is defined and the Monte Carlo simulations for error analysis.
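The effect of the cluster count on the achievable gain can be illustrated with a simple bound. This is an idealised sketch, not relax code: the function name `ideal_speedup` is hypothetical, every work unit is assumed to cost the same, each unclustered spin is assumed to be its own work unit, and the data-transfer overhead mentioned above is ignored entirely, so real speed-ups will be lower.

```python
import math


def ideal_speedup(n_work_units, n_slaves):
    """Upper bound on speed-up when equal-cost work units are spread over slaves.

    Ignores all data-transfer overhead, so measured speed-ups will be lower.
    """
    if n_work_units < 1 or n_slaves < 1:
        raise ValueError("need at least one work unit and one slave")
    # Sequential rounds needed when each slave handles one unit per round.
    rounds = math.ceil(n_work_units / n_slaves)
    return n_work_units / rounds


if __name__ == "__main__":
    # 100 spins in a single cluster: one work unit, so 24 slaves give no gain.
    print(ideal_speedup(1, 24))    # 1.0
    # 100 unclustered spins, each treated as its own work unit.
    print(ideal_speedup(100, 24))  # 20.0
```

With a single cluster the bound is 1.0, so any communication overhead makes the parallel run slower than the single-processor run.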
Test of speed
Performed tests
A - Relax_disp systemtest
Relax_disp_systemtest
#!/bin/tcsh
set LOG=single.log ;
relax_single --time -s Relax_disp -t $LOG ;
set RUNTIME=`cat $LOG | awk '$1 ~ /^\./{print $0}' | awk '{ sum+=$2} END {print sum}'` ;
echo $RUNTIME >> $LOG ;
echo $RUNTIME ;
set LOG=multi.log ;
relax_multi --time -s Relax_disp -t $LOG ;
set RUNTIME=`cat $LOG | awk '$1 ~ /^\./{print $0}' | awk '{ sum+=$2} END {print sum}'` ;
echo $RUNTIME >> $LOG ;
echo $RUNTIME
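The awk pipeline in the script above keeps the lines whose first field starts with a dot and sums their second field. A rough Python equivalent is sketched below. The function name `sum_test_times` and the sample log lines are hypothetical and are not taken from a real relax log. Unlike awk, this sketch skips second fields that are not plain numbers.

```python
def sum_test_times(log_text):
    """Sum the second field of every line whose first field starts with '.'."""
    total = 0.0
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("."):
            try:
                total += float(fields[1])
            except ValueError:
                # Not a plain number; skip it.
                pass
    return total


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = ". 1.5 test_a\n.. 2.25 test_b\nOK 99 ignored\n"
    print(sum_test_times(sample))  # 3.75
```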
B - Relax full analysis performed on dataset
First initialize data
set CPU1=tomat ;
set CPU2=haddock ;
set MODE1=single ;
set MODE2=multi ;
set DATA=$HOME/software/NMR-relax/relax_disp/test_suite/shared_data/dispersion/KTeilum_FMPoulsen_MAkke_2006/acbp_cpmg_disp_048MGuHCl_40C_041223/ ;
set TDATA=$HOME/relax_results
mkdir -p $TDATA/$CPU1 $TDATA/$CPU2 ;
cp -r $DATA $TDATA/$CPU1/$MODE1 ;
cp -r $DATA $TDATA/$CPU1/$MODE2 ;
cp -r $DATA $TDATA/$CPU2/$MODE1 ;
cp -r $DATA $TDATA/$CPU2/$MODE2 ;
relax_single $TDATA/$CPU1/$MODE1/relax_1_ini.py ;
relax_single $TDATA/$CPU1/$MODE2/relax_1_ini.py ;
relax_single $TDATA/$CPU2/$MODE1/relax_1_ini.py ;
relax_single $TDATA/$CPU2/$MODE2/relax_1_ini.py ;
Relax_full_analysis_performed_on_dataset
#!/bin/tcsh -e
set CPU=$HOST ;
set MODE1=single ;
set MODE2=multi ;
set TDATA=$HOME/relax_results
set LOG=timing.log ;
set TLOG=log.tmp ;
cd $TDATA
set MODE=$MODE1 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_4_model_sel.py -t ${CPU}_${MODE}.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ;
cat $LOG ;
set MODE=$MODE2 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_4_model_sel.py -t ${CPU}_${MODE}.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ;
cat $LOG ;
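The pattern used in the script above (run a command, measure it, append the result to a log) can also be sketched in Python. This is an illustration under stated assumptions rather than a replacement for `/usr/bin/time`: the helper name `time_command` and the log line format are hypothetical, only wall-clock time is recorded, and a trivial stand-in command is used in place of the relax call.

```python
import os
import subprocess
import sys
import tempfile
import time


def time_command(cmd, log_path):
    """Run cmd, then append the command line and its wall-clock time to log_path."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # Raises CalledProcessError on failure.
    elapsed = time.perf_counter() - start
    with open(log_path, "a") as log:
        log.write("---\n{}\n{:.2f} s elapsed\n".format(" ".join(cmd), elapsed))
    return elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "timing.log")
        # Stand-in command; in practice this would be the relax_single or relax_multi call.
        time_command([sys.executable, "-c", "pass"], log_path)
        with open(log_path) as log:
            print(log.read())
```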
C - Relax full analysis performed on dataset with clustering
Relax_full_analysis_performed_on_dataset_cluster
#!/bin/tcsh -e
set CPU=$HOST ;
set MODE1=single ;
set MODE2=multi ;
set TDATA=$HOME/relax_results
set LOG=timing.log ;
set TLOG=log.tmp ;
cd $TDATA
set MODE=$MODE1 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_5_cluster.py -t ${CPU}_${MODE}_cluster.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ;
cat $LOG ;
set MODE=$MODE2 ;
set RUNPROG="relax_${MODE} $TDATA/$CPU/$MODE/relax_5_cluster.py -t ${CPU}_${MODE}_cluster.log" ;
echo "---\n$RUNPROG" >> $LOG ;
/usr/bin/time -o $TLOG $RUNPROG ;
cat $TLOG >> $LOG ;
cat $LOG ;
Setup of test
List of computers - the 'lscpu' command
CPU 1
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 6
CPU MHz: 2659.893
BogoMIPS: 5319.78
L1d cache: 32K
L1i cache: 32K
L2 cache: 3072K
NUMA node0 CPU(s): 0,1
CPU 2
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Stepping: 2
CPU MHz: 2394.136
BogoMIPS: 4787.82
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
Execution scripts
relax_single
#!/bin/tcsh -fe
# Set the relax version used for this script.
set RELAX=/sbinlab2/tlinnet/software/NMR-relax/relax_disp/relax
# Remove env set to wrong library files.
unsetenv LD_LIBRARY_PATH
# Run relax in single processor mode.
$RELAX $argv
relax_multi
#!/bin/tcsh -fe
# Set the relax version used for this script.
set RELAX=/sbinlab2/tlinnet/software/NMR-relax/relax_disp/relax
# Remove env set to wrong library files.
unsetenv LD_LIBRARY_PATH
# Set number of available CPUs.
set NPROC=`nproc`
set NP=`echo $NPROC + 1 | bc `
# Run relax in multi processor mode.
/usr/lib64/openmpi/bin/mpirun -np $NP $RELAX --multi='mpi4py' $argv
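The relax_multi script starts one more process than the number of available CPUs, presumably so that one process can act as the master while the others work as slaves. The same command line can be assembled in Python as sketched below. The helper name `build_mpirun_command` is hypothetical. Only the `-np` and `--multi` options already shown in the script are used, and `os.cpu_count()` stands in for `nproc`, which may report a different number when CPU affinity is restricted.

```python
import os


def build_mpirun_command(relax_path, relax_args, n_slaves=None, mpirun="mpirun"):
    """Return the argument list for an mpirun launch with n_slaves + 1 processes."""
    if n_slaves is None:
        # Fall back to 1 if the CPU count cannot be determined.
        n_slaves = os.cpu_count() or 1
    n_processes = n_slaves + 1  # Same arithmetic as NP = NPROC + 1 in the script.
    return [mpirun, "-np", str(n_processes), relax_path, "--multi=mpi4py"] + list(relax_args)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_mpirun_command("/path/to/relax", ["script.py"], n_slaves=24)))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues.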
Installation
OpenMPI and the mpi4py Python package were installed according to the wiki notes.
Results
Computer | Nr of CPUs | Test type | Nr of spins | Nr exp. | GRID_INC | MC_NUM | MODELS | Time (test A: s; tests B, C: h:mm:ss) |
---|---|---|---|---|---|---|---|---|
CPU 1 | 1 | A | - | - | - | - | - | 95, 105 |
CPU 1 | 2 | A | - | - | - | - | - | 96, 120 |
CPU 2 | 1 | A | - | - | - | - | - | 85, 78 |
CPU 2 | 24 | A | - | - | - | - | - | 133, 143 |
CPU 1 | 1 | B | 82 | 16 | 11 | 50 | MODEL_ALL, single res | 9:16:33 |
CPU 1 | 2 | B | 82 | 16 | 11 | 50 | MODEL_ALL, single res | 8:06:44 |
CPU 2 | 1 | B | 82 | 16 | 11 | 50 | MODEL_ALL, single res | 8:18:21 |
CPU 2 | 24 | B | 82 | 16 | 11 | 50 | MODEL_ALL, single res | 2:17:02 |
CPU 1 | 1 | C | 78 | 16 | 11 | 50 | 'R2eff', 'No Rex', 'TSMFK01', clustering | 71:32:18 |
CPU 1 | 2 | C | 78 | 16 | 11 | 50 | 'R2eff', 'No Rex', 'TSMFK01', clustering | 82:27:13 |
CPU 2 | 1 | C | 78 | 16 | 11 | 50 | 'R2eff', 'No Rex', 'TSMFK01', clustering | 58:45:47 |
CPU 2 | 24 | C | 78 | 16 | 11 | 50 | 'R2eff', 'No Rex', 'TSMFK01', clustering | 145:01:33 |
Notes:
- Nr exp. = Nr of experiments = number of CPMG frequencies, excluding repetitions and reference spectra.
- MODEL_ALL = ['R2eff', 'No Rex', 'TSMFK01', 'LM63', 'LM63 3-site', 'CR72', 'CR72 full', 'IT99', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded', 'NS CPMG 2-site star']
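The speed-up for each pair of rows in the table can be computed as the single-processor time divided by the multi-processor time. The sketch below does this for tests B and C. The function names are hypothetical, and the times are assumed to be in h:mm:ss form, although the ratio comes out the same for any consistent base-60 reading.

```python
def to_seconds(clock):
    """Convert 'h:mm:ss' (or 'mm:ss', or plain seconds) to a number of seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def speedup(single, multi):
    """Ratio of single-processor to multi-processor time; above 1 means faster."""
    return to_seconds(single) / to_seconds(multi)


if __name__ == "__main__":
    # Times copied from the results table.
    rows = [
        ("CPU 1, test B, 2 CPUs", "9:16:33", "8:06:44"),
        ("CPU 2, test B, 24 CPUs", "8:18:21", "2:17:02"),
        ("CPU 1, test C, 2 CPUs", "71:32:18", "82:27:13"),
        ("CPU 2, test C, 24 CPUs", "58:45:47", "145:01:33"),
    ]
    for label, single, multi in rows:
        print("{}: {:.2f}x".format(label, speedup(single, multi)))
```

For CPU 2 this gives a speed-up of about 3.6 for the unclustered test B and about 0.4 for the clustered test C, in line with the note that clustering into few clusters can make the multi-processor run slower.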