MICCoM School 2017 Ex#4 : Parallelization
We are going to explain the parallelization levels present in WEST.
4.1 Download the material
In this exercise we will focus on the wstat.x input.
[1]:
# pseudopotentials
!wget -N -q http://www.quantum-simulation.org/potentials/sg15_oncv/upf/Si_ONCV_PBE-1.2.upf
!wget -N -q http://www.quantum-simulation.org/potentials/sg15_oncv/upf/H_ONCV_PBE-1.2.upf
# input files
!wget -N -q http://www.west-code.org/doc/training/silane/pw.in
!wget -N -q http://www.west-code.org/doc/training/silane/wstat.in
We need to read the output of a DFT calculation; therefore, as a first step, we run the DFT calculation by invoking the executable pw.x on 8 cores.
[ ]:
!mpirun -n 8 pw.x -i pw.in > pw.out
4.2 Parallelization schemes in WEST
WEST uses up to four layers of parallelism on CPU-based computers:
Plane-waves (FFT)
Bands
Spin channels
Eigenpotentials
The following command uses N CPU cores, NI images, NK pools, NB band groups, and N/(NI×NK×NB) cores per FFT:
mpirun -n N wstat.x -nimage NI -npool NK -nbgrp NB -i wstat.in > wstat.out
This is how we achieved good scaling on CPU-based supercomputers such as the BG/Q Mira at Argonne National Laboratory, where WEST makes efficient use of \(512\) cores per FFT and \(1024\) images, for a total of \(N = 512\times 1024 = 524288\) cores. Details about the implementation are described in J. Chem. Theory Comput. 11, 2680 (2015).
On computers equipped with GPU accelerators, WEST is capable of harnessing the data parallelism provided by GPUs. Again, we achieved good scaling on GPU-accelerated supercomputers such as Summit at Oak Ridge National Laboratory, where WEST makes efficient use of over \(25,000\) NVIDIA V100 GPUs. Details about the implementation are described in J. Chem. Theory Comput. 18, 4690-4707 (2022).
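The core-count relation above, N divided by the product of images, pools, and band groups, can be checked with a few lines of Python. This is only a minimal sketch: the helper name `cores_per_fft` is hypothetical, and it assumes that any flag left off the command line counts as 1 and that N must divide evenly among the groups.

```python
def cores_per_fft(n_cores, n_image=1, n_pool=1, n_bgrp=1):
    """Return the number of cores left for FFT (plane-wave) parallelism.

    Assumption: omitted flags count as 1, and the total core count
    must be an exact multiple of n_image * n_pool * n_bgrp.
    """
    groups = n_image * n_pool * n_bgrp
    if n_cores % groups != 0:
        raise ValueError(
            f"{n_cores} cores cannot be split evenly into {groups} groups"
        )
    return n_cores // groups


# The Mira example from the text: 524288 cores and 1024 images
print(cores_per_fft(524288, n_image=1024))  # 512

# One of the small runs in this exercise: 8 cores and 4 images
print(cores_per_fft(8, n_image=4))  # 2
```

The same helper can be used to sanity-check a combination of -nimage, -npool, and -nbgrp before submitting a job.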
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 2 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores2_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores4_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 2 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores4_image2.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 2 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image2.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 4 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image4.json
[ ]:
!ls -lrt wstat_*json
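The six runs above differ only in the number of cores and images, so they can also be driven from a single loop. The following is a hedged sketch rather than part of the original exercise: the helper names (`wstat_command`, `output_name`, `run_one`) are hypothetical, while the command line, the OMP_NUM_THREADS setting, and the file names are copied from the cells above. Running the block only prints the commands; `run_one` is defined but not called.

```python
import os
import shutil
import subprocess

# (cores, images) pairs used in the cells above
RUNS = [(2, 1), (4, 1), (4, 2), (8, 1), (8, 2), (8, 4)]


def wstat_command(n_cores, n_image):
    """Build the mpirun command line as a list of arguments."""
    return ["mpirun", "-n", str(n_cores), "wstat.x",
            "-nimage", str(n_image), "-i", "wstat.in"]


def output_name(n_cores, n_image):
    """Name of the JSON copy kept for each run."""
    return f"wstat_cores{n_cores}_image{n_image}.json"


def run_one(n_cores, n_image):
    """Run wstat.x once and save its JSON summary (requires mpirun and wstat.x)."""
    env = {**os.environ, "OMP_NUM_THREADS": "1"}
    with open("wstat.out", "w") as out:
        subprocess.run(wstat_command(n_cores, n_image),
                       stdout=out, env=env, check=True)
    shutil.copy("silane.wstat.save/wstat.json", output_name(n_cores, n_image))


# Dry run: show what would be executed without launching anything
for n_cores, n_image in RUNS:
    print(" ".join(wstat_command(n_cores, n_image)), "->",
          output_name(n_cores, n_image))
```

Replacing the print statement with `run_one(n_cores, n_image)` would execute the runs one after another, assuming the executables are on the PATH.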
Load the files.
[ ]:
import json

data = {}
for name in ['cores2_image1', 'cores4_image1', 'cores4_image2', 'cores8_image1', 'cores8_image2', 'cores8_image4'] :
    # read data wstat_XX.json
    with open('wstat_'+name+'.json') as file:
        data[name] = json.load(file)

print(json.dumps(data, indent=2))
We plot the wall-clock time of the WSTAT calculation for each run.
[ ]:
import numpy as np
import matplotlib.pyplot as plt

# timings: wall-clock time (y) and plot color (c) for each run
y = {}
c = {}

# 2 cores
for name in ['cores2_image1'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'black'

# 4 cores
for name in ['cores4_image1', 'cores4_image2'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'blue'

# 8 cores
for name in ['cores8_image1','cores8_image2','cores8_image4'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'green'

print(y)

# plot: one horizontal bar per run, colored by core count
x = list( range( 1, len(y)+1 ) )
labels = list( y.keys() )

fig, ax = plt.subplots(1, 1)
counter = 0
for i in labels :
    for a in y[i] :
        ax.hlines(a, x[counter]-0.25, x[counter]+0.25, color=c[i])
    counter += 1

plt.xticks(x, labels, rotation='vertical')
plt.ylabel('Time (s)')
plt.title('Parallelization')
plt.show()
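Beyond the plot, the timings can be turned into speedup and parallel efficiency relative to the 2-core run. The sketch below is illustrative only: `parse_run` and `scaling_table` are hypothetical helpers, and the numbers in `example` are placeholders chosen to make the arithmetic easy to follow, not measured results.

```python
import re


def parse_run(name):
    """Split a label such as 'cores8_image4' into (cores, images)."""
    match = re.fullmatch(r"cores(\d+)_image(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected run name: {name}")
    return int(match.group(1)), int(match.group(2))


def scaling_table(wall_times, reference="cores2_image1"):
    """Return (name, cores, images, speedup, efficiency) for every run.

    speedup    = t_reference / t_run
    efficiency = speedup * cores_reference / cores_run
    """
    ref_cores, _ = parse_run(reference)
    ref_time = wall_times[reference]
    rows = []
    for name, seconds in wall_times.items():
        cores, images = parse_run(name)
        speedup = ref_time / seconds
        efficiency = speedup * ref_cores / cores
        rows.append((name, cores, images, speedup, efficiency))
    return rows


# Placeholder wall times in seconds (NOT measured results)
example = {"cores2_image1": 100.0, "cores4_image2": 50.0, "cores8_image4": 40.0}

for name, cores, images, speedup, efficiency in scaling_table(example):
    print(f"{name:15s} speedup = {speedup:.2f}  efficiency = {efficiency:.2f}")
```

With the notebook data loaded above, the real timings could be passed in as `{name: data[name]['timing']['WSTAT']['wall:sec'] for name in data}`.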