MICCoM School 2017 Ex#4 : Parallelization

We are going to explain the parallelization levels present in WEST.

4.1 Download the material

In this exercise we will focus on the wstat.x input.

[1]:
# pseudopotentials
!wget -N -q http://www.quantum-simulation.org/potentials/sg15_oncv/upf/Si_ONCV_PBE-1.2.upf
!wget -N -q http://www.quantum-simulation.org/potentials/sg15_oncv/upf/H_ONCV_PBE-1.2.upf

# input files
!wget -N -q http://www.west-code.org/doc/training/silane/pw.in
!wget -N -q http://www.west-code.org/doc/training/silane/wstat.in

We need to read the output of a DFT calculation; therefore, as a first step, we run the DFT calculation by invoking the executable pw.x on 8 cores.

[ ]:
!mpirun -n 8 pw.x -i pw.in > pw.out

4.2 Parallelization schemes in WEST

WEST uses up to four layers of parallelism on CPU-based computers:

  • Plane-waves (FFT)

  • Bands

  • Spin channels

  • Eigenpotentials

The following command uses N CPU cores, NI images, NK pools, NB band groups, and N/(NI×NK×NB) cores per FFT:

mpirun -n N wstat.x -nimage NI -npool NK -nbgrp NB -i wstat.in > wstat.out
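As a quick sanity check of the decomposition above, the following minimal sketch computes how many cores remain for each FFT group. The helper name `cores_per_fft` is hypothetical and is not part of WEST; the sketch assumes that the total core count must be divisible by the product of images, pools, and band groups.

```python
def cores_per_fft(n_cores, n_image=1, n_pool=1, n_bgrp=1):
    """Return N / (NI * NK * NB), the cores left for each FFT group.

    Hypothetical helper for illustration; not part of WEST.
    """
    groups = n_image * n_pool * n_bgrp
    if groups < 1 or n_cores % groups != 0:
        # Assumption: the core count must split evenly across the groups.
        raise ValueError(
            f"{n_cores} cores cannot be split evenly into {groups} groups"
        )
    return n_cores // groups


# 8 cores and 4 images, as in one of the runs of this exercise
print(cores_per_fft(8, n_image=4))          # prints 2
# The Mira configuration quoted in this section: 524288 cores, 1024 images
print(cores_per_fft(524288, n_image=1024))  # prints 512
```

Checking the split before submitting a job is a cheap way to catch a mismatch between `-n` and the `-nimage`, `-npool`, and `-nbgrp` flags.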

This is how we achieved good scaling on CPU-based supercomputers such as the BG/Q Mira at Argonne National Laboratory, where WEST makes efficient use of \(512\) cores per FFT and \(1024\) images, for a total of \(N = 512 \times 1024 = 524288\) cores. Details about the implementation are described in J. Chem. Theory Comput. 11, 2680 (2015).

On computers equipped with GPU accelerators, WEST is capable of harnessing the data parallelism provided by GPUs. Again, we achieved good scaling on GPU-accelerated supercomputers such as Summit at Oak Ridge National Laboratory, where WEST makes efficient use of over \(25,000\) NVIDIA V100 GPUs. Details about the implementation are described in J. Chem. Theory Comput. 18, 4690-4707 (2022).

[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 2 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores2_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores4_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 4 wstat.x -nimage 2 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores4_image2.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 1 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image1.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 2 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image2.json
[ ]:
%%bash
export OMP_NUM_THREADS=1
mpirun -n 8 wstat.x -nimage 4 -i wstat.in > wstat.out
[ ]:
cp silane.wstat.save/wstat.json wstat_cores8_image4.json
[ ]:
!ls -lrt wstat_*json
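
The six runs above differ only in the number of cores and images, so the command lines can also be generated programmatically. The sketch below only builds the command strings and the matching file names used in this exercise; the names `runs`, `build_command`, and `json_name` are illustrative, and actually launching the jobs is left out.

```python
# (cores, images) pairs used in the cells above
runs = [(2, 1), (4, 1), (4, 2), (8, 1), (8, 2), (8, 4)]


def build_command(n_cores, n_image):
    """Return the mpirun command line for one wstat.x run (illustrative helper)."""
    return f"mpirun -n {n_cores} wstat.x -nimage {n_image} -i wstat.in > wstat.out"


def json_name(n_cores, n_image):
    """Return the file name under which the run's wstat.json is saved above."""
    return f"wstat_cores{n_cores}_image{n_image}.json"


for n_cores, n_image in runs:
    print(build_command(n_cores, n_image), "->", json_name(n_cores, n_image))
```

Keeping the (cores, images) pairs in a single list makes it easy to add further combinations without duplicating cells.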

Load the files.

[ ]:
import json

data = {}

for name in ['cores2_image1', 'cores4_image1', 'cores4_image2', 'cores8_image1', 'cores8_image2', 'cores8_image4'] :
    # read data wstat_XX.json
    with open('wstat_'+name+'.json') as file:
        data[name] = json.load(file)

print(json.dumps(data, indent=2))

We plot the wall-clock time of the WSTAT run for each combination of cores and images.

[ ]:
import numpy as np
import matplotlib.pyplot as plt

# timings
y = {}
c = {}

# 2 cores
for name in ['cores2_image1'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'black'

# 4 cores
for name in ['cores4_image1', 'cores4_image2'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'blue'

# 8 cores
for name in ['cores8_image1','cores8_image2','cores8_image4'] :
    y[name] = [ data[name]['timing']['WSTAT']['wall:sec'] ]
    c[name] = 'green'

print(y)

# plot
x = list( range( 1, len(y)+1 ) )
labels = y.keys()

fig, ax = plt.subplots(1, 1)
counter = 0
for i in labels :
    for a in y[i] :
        ax.hlines(a, x[counter]-0.25, x[counter]+0.25, color=c[i])
    counter += 1

plt.xticks(x, labels, rotation='vertical')
plt.ylabel('Time (s)')

plt.title('Parallelization')

plt.show()
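
To compare the runs quantitatively, one can convert the wall-clock times into speedup and parallel efficiency relative to the 2-core run. The sketch below is one possible way to do this; the function name `scaling_table` is hypothetical, and the timings in `example` are placeholders chosen for illustration, not measured results. With the notebook data one would pass `{k: v[0] for k, v in y.items()}` instead.

```python
import re


def scaling_table(wall, reference="cores2_image1"):
    """Return {name: (cores, speedup, efficiency)} relative to the reference run.

    speedup    = t_ref / t
    efficiency = speedup / (cores / cores_ref)
    """
    def n_cores(name):
        # Names follow the pattern 'cores<N>_image<NI>' used in this exercise.
        return int(re.match(r"cores(\d+)_image(\d+)$", name).group(1))

    t_ref, c_ref = wall[reference], n_cores(reference)
    table = {}
    for name, t in wall.items():
        speedup = t_ref / t
        efficiency = speedup / (n_cores(name) / c_ref)
        table[name] = (n_cores(name), speedup, efficiency)
    return table


# Placeholder timings in seconds (NOT measured); replace with the values in y.
example = {"cores2_image1": 100.0, "cores4_image1": 60.0, "cores8_image4": 25.0}
for name, (cores, s, e) in scaling_table(example).items():
    print(f"{name}: {cores} cores, speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1 means the extra cores are used well; comparing runs with the same core count but different numbers of images shows which layer of parallelism pays off for this small system.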