OpenMPI

6,914 bytes added, 08:45, 6 October 2020
Open MPI is an implementation of the Message Passing Interface (MPI) standard, used in relax for parallelised calculations.

== mpi4py OpenMPI ==
This package provides Python bindings for the Message Passing Interface (MPI) standard. It is implemented on top of the MPI-1/2/3 specification and exposes an API which is based on the standard MPI-2 C++ bindings.
Gary has achieved near-perfect scaling efficiency:
https://mail.{{gna.org/public/mailing list url|relax-devel/2007-05/msg00000.html}}
=== Dependencies ===
# Python 2.4 to 2.7 or 3.0 to 3.4, or a recent PyPy release.
# A functional MPI 1.x/2.x/3.x implementation like MPICH or Open MPI built with shared/dynamic libraries.
=== How does it work? ===
If mpirun is started with "-np 4", relax will get 4 "rank" processors. <br>
relax will use 1 processor as the "organiser", which sends jobs to the 3 slaves and receives and organises the returned results. For most of the code, relax will '''only use 1 processor'''. <br>
Only the computationally heavy parts of relax are prepared for multi-processing, <br>
and the computationally heavy part is '''minimisation''' to find the optimal parameters. <br>
All other parts of relax are "light-weight", as they only read/write text files or organise data.

For minimisation, relax collects all the data and functions, organised into classes, and packs them as an independent job to be sent to a slave for minimisation.<br>
To pack a job, the job needs to have sufficient information to "live an independent life": the data to minimise, the functions to use, how to treat the result, and how to send it back to the master. <br>
The master server asks: "Slave, find the minimum for this spin cluster and return the results to me when you are done; then you get a new job".

For example, with 10 slaves, this makes it possible to:
# Simultaneously minimise 10 free spins.
# Simultaneously run 10 Monte Carlo simulations.

=== Install OpenMPI on linux and set environments ===
See https://www10.informatik.uni-erlangen.de/Cluster/
<source lang="bash">
# Show what loading does
module show openmpi-x86_64
# or
module show openmpi-1.10-x86_64
# See if anything is loaded
# Load
module load openmpi-x86_64
# Or
module load openmpi-1.10-x86_64
 
# See list
module list
# For a 64-bit computer, link mpicc into /usr/bin:
sudo ln -s /usr/lib64/openmpi/bin/mpicc /usr/bin/mpicc
# or
sudo ln -s /usr/lib64/openmpi-1.10/bin/mpicc /usr/bin/mpicc
 
</source>
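Before building mpi4py, it can help to confirm that the <code>mpicc</code> compiler wrapper (which the mpi4py build uses to locate MPI) and <code>mpirun</code> are actually found on your PATH after loading the module and creating the symlink. The following is a small sketch, assuming Python 3.3 or newer (for <code>shutil.which</code>); the function name <code>find_mpi_tools</code> is hypothetical and not part of relax or mpi4py.

<source lang="python">
import shutil  # shutil.which requires Python 3.3+


def find_mpi_tools(names=("mpicc", "mpirun")):
    """Map each MPI tool name to its resolved path, or None if not on PATH."""
    return dict((name, shutil.which(name)) for name in names)


if __name__ == "__main__":
    for name, path in sorted(find_mpi_tools().items()):
        # A None path usually means the openmpi module is not loaded.
        print("%-7s %s" % (name, path or "NOT FOUND (load the openmpi module?)"))
</source>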
== Install mpi4py ==
=== Linux and Mac ===
Remember to check whether there are newer versions of [https://bitbucket.org/mpi4py/mpi4py/downloads/ mpi4py]. <br>
The [https://bitbucket.org/mpi4py/mpi4py mpi4py] library can be installed on all UNIX systems by typing:
{{#tag:source|
# Set the mpi4py version in your shell
# bash
v={{current version mpi4py}}
# tcsh
set v={{current version mpi4py}}
pip install https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz
pip install https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz --upgrade
# Or, to build with a Python interpreter other than the default one:
wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-$v.tar.gz
tar -xzf mpi4py-$v.tar.gz
rm mpi4py-$v.tar.gz
cd mpi4py-$v
# Replace 'python' below with the path to the interpreter you want to build with
python setup.py build
python setup.py install
cd ..
rm -rf mpi4py-$v
|lang="bash"
}}
 
Then test that Python can find mpi4py:
<source lang="python">
# Start the interpreter by typing 'python' in a terminal, then:
import mpi4py
print(mpi4py.__file__)
</source>
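A slightly more detailed check can also report the mpi4py version and the MPI standard version of the library it was built against. This is a sketch only: <code>mpi4py_report</code> is a hypothetical helper name, and it assumes you run it with the same interpreter mpi4py was installed for. Note that importing <code>mpi4py.MPI</code> initialises the MPI library.

<source lang="python">
def mpi4py_report():
    """Return a dict describing the mpi4py install, or None if it is missing."""
    try:
        import mpi4py
    except ImportError:
        return None
    from mpi4py import MPI  # importing MPI initialises the MPI library

    major, minor = MPI.Get_version()  # MPI standard version supported
    return {
        "mpi4py_version": mpi4py.__version__,
        "mpi4py_file": mpi4py.__file__,
        "mpi_standard": "%d.%d" % (major, minor),
        "processor": MPI.Get_processor_name(),
    }


if __name__ == "__main__":
    report = mpi4py_report()
    if report is None:
        print("mpi4py is not importable from this interpreter")
    else:
        for key in sorted(report):
            print("%s: %s" % (key, report[key]))
</source>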
 
== Relax In multiprocessor mode ==
 
'''How many processors should I start?''' <br>
You should start as many processes as you have physical cores, '''not counting hardware threads'''. <br>
In this example, you can start 12 (6 cores per socket * 2 sockets), where relax will use 1 for receiving results and 11 for calculations.
<source lang="bash">
lscpu
----
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
</source>
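The arithmetic above (cores per socket * sockets, ignoring threads per core) can be sketched in a few lines of Python. This assumes the English field names shown in the <code>lscpu</code> output above; <code>physical_cores</code> is a hypothetical helper, not part of relax.

<source lang="python">
def physical_cores(lscpu_text):
    """Return sockets * cores per socket from `lscpu` output.

    Hardware threads ("Thread(s) per core") are deliberately not counted.
    """
    fields = {}
    for line in lscpu_text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])


example = """CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2"""

n = physical_cores(example)
print(n)      # 12 -> mpirun -np 12
print(n - 1)  # 11 slaves; 1 process collects the results
</source>

On a live machine, the text could be obtained with <code>subprocess.check_output(["lscpu"]).decode()</code>.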
 
You can keep trying these commands until you get good results:
<source lang="bash">
# With shell
mpirun -np 4 echo "hello world"
 
# With python
mpirun -np 4 python -m mpi4py helloworld
 
# With newer versions of mpirun, --report-bindings works
mpirun --report-bindings -np 4 echo "hello world"
mpirun --report-bindings -np 12 echo "hello world"
# This is too many: 13 exceeds the 12 physical cores
mpirun --report-bindings -np 13 echo "hello world"
</source>
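If you want a hello-world test of your own, in the spirit of the mpi4py one above, a minimal script could print each process's rank, the total number of processes and the host. This is a sketch: the file name <code>hello_mpi.py</code> and the function <code>hello_message</code> are hypothetical, and the script falls back to behaving like a single process if mpi4py cannot be imported.

<source lang="python">
# hello_mpi.py - run with e.g.:  mpirun -np 4 python hello_mpi.py
import socket


def hello_message():
    try:
        from mpi4py import MPI
    except ImportError:
        # Without mpi4py, behave like a single process.
        rank, size, host = 0, 1, socket.gethostname()
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        host = MPI.Get_processor_name()
    return "Hello, World! I am process %d of %d on %s." % (rank, size, host)


if __name__ == "__main__":
    print(hello_message())
</source>

With <code>-np 4</code>, you should see four lines, one per rank; if every line says "process 0 of 1", mpi4py is not picking up the MPI launched by mpirun.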
=== Start relax in multiprocessor mode ===
For example:
<source lang="bash">
tcsh
# Normal: N slaves + 1 master, here 11 + 1 = 12 processes
mpirun -np 12 $RELAX --multi='mpi4py'
# In the GUI
mpirun -np 12 $RELAX --multi='mpi4py' -g
</source>
This code runs in the GUI, the script UI and the prompt UI, i.e. everywhere.
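The N+1 layout and the job hand-out described under "How does it work?" can be illustrated with a small, purely sequential simulation. This is not relax's multi-processor code: it is a hypothetical sketch that assumes each job has a known cost, ignores communication time, and reserves rank 0 for the master, as is common in MPI master/slave programs.

<source lang="python">
import heapq


def simulate_master_slave(job_times, n_procs):
    """Simulate 1 master handing jobs to (n_procs - 1) slaves.

    A slave gets its next job as soon as it returns the previous result.
    Returns (total_time, assignments), where assignments maps each slave
    rank (1..n_procs-1) to the list of job indices it ran.
    """
    n_slaves = n_procs - 1  # rank 0 is reserved for the master
    if n_slaves < 1:
        raise ValueError("need at least 2 processes: 1 master + 1 slave")
    # Min-heap of (time the slave becomes free, slave rank).
    free_at = [(0.0, rank) for rank in range(1, n_slaves + 1)]
    heapq.heapify(free_at)
    assignments = dict((rank, []) for rank in range(1, n_slaves + 1))
    total_time = 0.0
    for job, cost in enumerate(job_times):
        t, rank = heapq.heappop(free_at)  # first slave to report back
        assignments[rank].append(job)
        done = t + cost
        total_time = max(total_time, done)
        heapq.heappush(free_at, (done, rank))
    return total_time, assignments


# 10 free spins on 10 slaves (mpirun -np 11): one round of minimisation.
print(simulate_master_slave([1.0] * 10, 11)[0])  # 1.0
# 20 Monte Carlo simulations on 10 slaves: two rounds.
print(simulate_master_slave([1.0] * 20, 11)[0])  # 2.0
</source>

Handing out a new job only when a slave reports back keeps all slaves busy even when jobs take different amounts of time, which is why idle slaves are rare as long as there are more jobs than slaves.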
=== Helper start scripts ===
If you have several versions or development branches of relax installed, you may find some of these scripts useful; put them in your PATH.
==== Script for force running relax on server computer ====
This script exemplifies a setup where the above installation requirements are met on one server computer, ''haddock'', and where satellite computers are forced to run relax on this computer.
==== Script for running relax with maximum number of processors available ====
This script exemplifies a setup for testing relax with the maximum number of processors available.
<source lang="bash">
# Set number of available CPUs.
set NPROC=`nproc`
set NP=`echo $NPROC + 1 | bc`
echo "Running relax with NP=$NP in multi-processor mode"
</source>
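The same N+1 calculation can be done from Python if you prefer to build the command programmatically. This is a hypothetical sketch: <code>relax_mpirun_command</code> is not part of relax, the default executable name "relax" on the PATH is an assumption (the shell examples use <code>$RELAX</code>), and <code>multiprocessing.cpu_count()</code>, like <code>nproc</code>, counts hardware threads as well as cores.

<source lang="python">
import multiprocessing


def relax_mpirun_command(n_slaves=None, relax="relax", gui=False):
    """Return the mpirun argument list for relax with N slaves + 1 master."""
    if n_slaves is None:
        # Like `nproc`, this counts logical CPUs, i.e. hardware threads too.
        n_slaves = multiprocessing.cpu_count()
    cmd = ["mpirun", "-np", str(n_slaves + 1), relax, "--multi=mpi4py"]
    if gui:
        cmd.append("-g")  # start the relax GUI
    return cmd


print(" ".join(relax_mpirun_command(n_slaves=11)))
# mpirun -np 12 relax --multi=mpi4py
</source>

To launch relax, the list can be passed to <code>subprocess.call()</code>; the list form avoids the shell quoting needed for <code>--multi='mpi4py'</code>.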
==== Script for force running relax on server computer with openmpi ====
<source lang="bash">
#!/bin/tcsh
#set NPROC=`nproc`
set NPROC=10
set NP=`echo $NPROC + 1 | bc`
# Run relax in multi processor mode.
</source>
== Setting up relax on super computer Beagle2 ==
Please see the post from Lora Picton: [[relax_on_Beagle2]]

Message: http://thread.gmane.org/gmane.science.nmr.relax.user/1821

== Commands and FAQ about mpirun ==
See Oracle's page on mpirun and the Open MPI manual:
# https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingPrograms.html
=== Find number of Socket, Cores and Threads ===
See http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options
=== Test binding to socket ===
<source lang="bash">
module load openmpi-x86_64
</source>
== Use mpirun with ssh hostfile ==
{{caution|This is a test only. This does not appear to function well!}}

See
# https://www.open-mpi.org/faq/?category=running#mpirun-hostfile
# http://mirror.its.dal.ca/openmpi/faq/?category=running#simple-spmd-run
# https://www.open-mpi.org/faq/?category=rsh
# https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingBatchPrograms.html

We have the 3 machines '''bax minima elvis'''.<br>
Let's try to make a hostfile and use them at the same time:
<source lang="bash">
set MPIHARR = (bax minima elvis)
foreach MPIH ($MPIHARR)
ssh $MPIH 'echo $HOST; lscpu | egrep -e "Thread|Core|Socket"; module list'
echo ""
end
</source>
Output
<source lang="text">
bax
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Currently Loaded Modulefiles:
  1) openmpi-x86_64

minima
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Currently Loaded Modulefiles:
  1) openmpi-x86_64

elvis
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Currently Loaded Modulefiles:
  1) openmpi-x86_64
</source>

The node machines are quad-core machines, and we want to reserve 1 CPU for the user at each machine.
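The slot arithmetic (4 cores per node, 1 reserved for the user) can be turned into Open MPI hostfile lines of the form <code>host slots=N max-slots=M</code>. This is a hypothetical helper, assuming every node has the same number of cores; <code>slots</code> is the number of processes Open MPI places on a host by default, and <code>max-slots</code> is the upper limit it may use there.

<source lang="python">
def make_hostfile(hosts, cores_per_host=4, reserved=1):
    """Return Open MPI hostfile text, one 'host slots=N max-slots=M' line per host.

    slots leaves `reserved` cores free for the person using the machine;
    max-slots caps the number of processes on that host at its core count.
    """
    slots = cores_per_host - reserved
    if slots < 1:
        raise ValueError("no cores left after reserving %d" % reserved)
    lines = ["%s slots=%d max-slots=%d" % (h, slots, cores_per_host) for h in hosts]
    return "\n".join(lines) + "\n"


text = make_hostfile(["localhost", "bax", "minima"])
print(text.rstrip())
# localhost slots=3 max-slots=4
# bax slots=3 max-slots=4
# minima slots=3 max-slots=4

# To write it out:
# with open("relax_hosts", "w") as f:
#     f.write(text)
</source>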
Make a host file. Currently, I cannot get more than 2 ssh connections to work at the same time.
<source lang="bash">
cat << EOF > relax_hosts
localhost slots=3 max-slots=4
bax slots=3 max-slots=4
minima slots=3 max-slots=4
#elvis slots=3 max-slots=4
EOF

cat relax_hosts
</source>

Then try to run some tests:
<source lang="bash">
# First check the environments
ssh localhost env | grep -i path
ssh bax env | grep -i path

mpirun --host localhost hostname
mpirun --mca plm_base_verbose 10 --host localhost hostname

# On another machine, this will not work because of the firewall
mpirun --host bax hostname

# Verbose for bax
mpirun --mca plm_base_verbose 10 --host bax hostname
mpirun --mca rml_base_verbose 10 --host bax hostname

# This shows that TCP is having problems: tcp_peer_complete_connect: connection failed with error 113
mpirun --mca oob_base_verbose 10 --host bax hostname
# Shut down the firewall
sudo iptables -L -n
sudo service iptables stop
sudo iptables -L -n
# Try again
mpirun --mca oob_base_verbose 10 --host bax hostname
mpirun --host bax hostname

# Now try
mpirun --host localhost,bax hostname
mpirun --host localhost,bax,minima hostname
mpirun --host localhost,bax,elvis hostname

# Test why 4 machines do not work
mpirun --mca plm_base_verbose 10 --host localhost,bax,elvis,minima hostname
mpirun --mca rml_base_verbose 10 --host localhost,bax,elvis,minima hostname
mpirun --mca oob_base_verbose 10 --host localhost,bax,elvis,minima hostname

# Try just 3 machines
mpirun --report-bindings --hostfile relax_hosts hostname
mpirun --report-bindings -np 9 --hostfile relax_hosts hostname
mpirun --report-bindings -np 9 --hostfile relax_hosts uptime

# Now try relax
mpirun --report-bindings --hostfile relax_hosts relax --multi='mpi4py'
</source>

== Running Parallel Jobs with queue system ==
See:
# https://docs.oracle.com/cd/E19923-01/820-6793-10/ExecutingBatchPrograms.html

=== Running Parallel Jobs in the Sun Grid Engine Environment ===
See
# https://www.open-mpi.org/faq/?category=building#build-rte-sge
# http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslSge

Test if you have it:
<source lang="bash">
ompi_info | grep gridengine
</source>

=== Running Parallel Jobs in PBS/Torque ===
See
# https://www.open-mpi.org/faq/?category=building#build-rte-tm
# http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaai.hpcrh/buildtorque.htm
# http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaai.hpcrh/installmn.htm%23installingthemanagementnode?lang=en

Test if you have it:
<source lang="bash">
ompi_info | grep tm
</source>

=== Running Parallel Jobs in SLURM ===
See
# https://www.open-mpi.org/faq/?category=building#build-rte-sge
# https://www.open-mpi.org/faq/?category=slurm

Test if you have it:
<source lang="bash">
ompi_info | grep slurm
</source>

== Updates ==
=== Update 2013/09/11 ===
See [http://thread.gmane.org/gmane.science.nmr.relax.scm Commit]
'''no clustering''' is defined and the Monte Carlo simulations for error analysis.
=== Test of speed ===
==== Performed tests ====
===== A - Relax_disp systemtest =====
'''Relax_disp_systemtest'''
<source lang="bash">
</source>
===== B - Relax full analysis performed on dataset =====
====== First initialize data ======
<source lang="bash">
set CPU1=tomat ;
relax_single $TDATA/$CPU2/$MODE2/relax_1_ini.py ;
</source>
====== Relax_full_analysis_performed_on_dataset ======
<source lang="bash">
#!/bin/tcsh -e
cat $LOG ;
</source>
===== C - Relax full analysis performed on dataset with clustering =====
'''Relax_full_analysis_performed_on_dataset_cluster'''
==== Setup of test ====
===== List of computers - the 'lscpu' command =====
CPU 1
<source lang="text">
</source>
===== Execution scripts =====
'''relax_single'''
<source lang="bash">
</source>
=== Results ===
{| class="wikitable sortable" border="1"
# MODEL_ALL = ['R2eff', 'No Rex', 'TSMFK01', 'LM63', 'LM63 3-site', 'CR72', 'CR72 full', 'IT99', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded', 'NS CPMG 2-site star']
== See also ==
[[Category:Installation]]
[[Category:DevelDevelopment]]