Hex, Bugs and More Physics | Emre S. Tasci

a blog about physics, computation, computational physics and materials…

Now that we can include comments in our input files and submit our jobs in a tidied up way, it’s time to automatize the “convergence search” process. For this, we’ll be using the same diamond example of the previous entry.

Suppose that we want to calculate the energies corresponding to various values of the lattice parameter a, ranging from 3.2 to 4, incrementing by 2: 3.2, 3.4, … , 4.0

Incrementation:
In bash, you can define a for loop to handle this:

for (( i = 1; i <= 9; i++ ))
do
  echo $i;
done

1
2
3
4
5
6
7
8
9

But to work with rational numbers, things get a little bit complex:
for (( i = 1; i <= 5; i++ ))
do
    val=$(awk "BEGIN {print 3+$i/5 }")
    echo $val;
done

3.2
3.4
3.6
3.8
4

Alas, it is still not very practical to everytime calculate the necessary values to produce our intended range. This is where the mighty Octave once again comes to the rescue: [A:I:B] in Octave defines a range that starts from A, and continues by incrementing by I until B is reached (or exceeded). So, here is our example in Octave:
octave:1> [3.2:0.2:4]
ans =

   3.2000   3.4000   3.6000   3.8000   4.0000

You can generate temporary input scripts for Octave but better yet, you can also feed simple scripts directly to Octave:
sururi@husniya:~/shared/qe/diamond_tutorial$ echo "[3.2:0.2:4]"|octave -q
ans =

   3.2000   3.4000   3.6000   3.8000   4.0000

The “-q” parameter causes Octave to run in “quiet” mode, so we just feed the range via pipe to Octave and will harvest the output. In general, we’d like to keep the “ans =” and the blank lines out, and then, when there are more than 1 line’s worth of output, Octave puts the “Column” information as in:
octave:4> [3:0.1:4]
ans =

 Columns 1 through 9:

   3.0000   3.1000   3.2000   3.3000   3.4000   3.5000   3.6000   3.7000   3.8000

 Columns 10 and 11:

   3.9000   4.0000

all these nuisances can go away via a simple “grep -v” (with the ‘v’ parameter inverting the filter criteria):
sururi@vala:/tmp$ echo "[3.2:0.1:4]"|octave -q|sed -n "2,1000p"| grep -v "Column"

   3.2000   3.3000   3.4000   3.5000   3.6000   3.7000   3.8000   3.9000

   4.0000

sururi@vala:/tmp$ echo "[3.2:0.1:4]"|octave -q|sed -n "2,1000p"| grep -v "Column"|grep -v -e "^$"
   3.2000   3.3000   3.4000   3.5000   3.6000   3.7000   3.8000   3.9000
   4.0000

Replacement:
Now that we have the variable’s changing values, how are we gonna implement it to the input script? By search and replace. Suppose, we have a template in which the variable’s value is specified by “#REPLACE#”, all we need to do is to “sed” the file:
cat diamond.scf.in | sed "s:#REPLACE#:$a:g">tmp_$a.in

where “$a” is the current value in the iteration.

Parsing the Energy:
We submit this job via mpirun (in our case via mpi-pw.x of the previous post) and parse the output for the final energy:

grep -e ! tmp_$a.out|tail -n1|awk '{print $(NF-1)}'

For each iteration we thus obtain the energy for our $a value, so we put them together in two columns into a data file I’d like to call “E_vs_$a.txt”.

The Script:
Putting all these together, here is the script in its full glory:

#!/bin/bash

# Takes in the template Quantum Espresso input file, runs it iteratively
# changing the #REPLACE# parameters iteratively with the range elements,
# grepping the energy and putting it along with the current value of the
# #REPLACE# parameter.
#
# Emre S. Tasci, 03/2012

# Example: Suppose that the file "diamond.scf.in" contains the following line:
#   A = #REPLACE#
# Calling "qe-opti.sh diamond.scf.in a 3.2 0.2 4.2" would cause that line to be modified as
#   A = 3.2
# and submitted, followed by
#   A = 3.4, 3.6, .., 4.2 in the sequential runs.
#
# at the end, having constructed the E_vs_a.txt file.

if [ $# -ne 5 ]
then
    echo "Usage: qe-opti.sh infile_template.in variable_name initial_value increment     final_value"
    echo "Example: qe-opti.sh diamond.scf.in a 3.2 0.2 4.2"
    echo -e "\nNeeds octave to be accessible via \"octave\" command for the range to     work"
    echo -e "\nEmre S. Tasci, 03/2012"
    exit
fi

logext=".txt"
logfilename=E_vs_$2$logext
rm -f $logfilename

range=$(echo "[$3:$4:$5]"|octave -q|sed -n "2,1000p"|grep -vr "^$"|grep -v "Column")
for a in $range
do
    # echo $a
    cat $1|sed "s:#REPLACE#:$a:g">tmp_$a.in
    /home/sururi/bin/mpi-pw.x 4 tmp_$a.in
energ=$(grep -e ! tmp_$a.out|tail -n1|awk '{print $(NF-1)}')
    # echo $energ
    echo -e $a "    " $energ >> $logfilename
    echo -e $a "    " $energ
done

echo "Results stored in: "$logfilename

Action:
Modify the sample input from the previous post so that it’s lattice parameter A is designated as something like “A = #REPLACE#”, as in:
sururi@bebop:~/shared/qe/diamond_tutorial$ grep "REPLACE" -B2 -A2 diamond.scf.in
    # a,b,c in Angstrom =======
    #A = 3.56712
    A = #REPLACE#
    B = 2.49
    C = 2.49

Then call the qe-opti.sh script with the related parameters:
sururi@bebop:~/shared/qe/diamond_tutorial$ qe-opti.sh diamond.scf.in a 3.2 0.2 4.2
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_3.2000.in > tmp_3.2000.out

Start : Tue Mar 13 23:21:07 CET 2012
Finish: Tue Mar 13 23:21:08 CET 2012
Elapsed time:0:00:01

3.2000      -22.55859979
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_3.4000.in > tmp_3.4000.out

Start : Tue Mar 13 23:21:08 CET 2012
Finish: Tue Mar 13 23:21:09 CET 2012
Elapsed time:0:00:01

3.4000      -22.67717405
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_3.6000.in > tmp_3.6000.out

Start : Tue Mar 13 23:21:09 CET 2012
Finish: Tue Mar 13 23:21:10 CET 2012
Elapsed time:0:00:01

3.6000      -22.69824702
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_3.8000.in > tmp_3.8000.out

Start : Tue Mar 13 23:21:10 CET 2012
Finish: Tue Mar 13 23:21:11 CET 2012
Elapsed time:0:00:01

3.8000      -22.65805473
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_4.0000.in > tmp_4.0000.out

Start : Tue Mar 13 23:21:11 CET 2012
Finish: Tue Mar 13 23:21:13 CET 2012
Elapsed time:0:00:02

4.0000      -22.57776354
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < tmp_4.2000.in > tmp_4.2000.out

Start : Tue Mar 13 23:21:13 CET 2012
Finish: Tue Mar 13 23:21:14 CET 2012
Elapsed time:0:00:01

4.2000      -22.47538891
Results stored in: E_vs_a.txt

Where the “E_vs_a.txt” contains the … E vs a values!:
sururi@bebop:~/shared/qe/diamond_tutorial$ cat E_vs_a.txt
3.2000      -22.55859979
3.4000      -22.67717405
3.6000      -22.69824702
3.8000      -22.65805473
4.0000      -22.57776354
4.2000      -22.47538891

You can easily plot it using GNUPlot or whatever you want. In my case, I prefer GNUPlot:
sururi@husniya:~/shared/qe/diamond_tutorial$ plot E_vs_a.txt

giving me:

plot_E_vs_a

Other than the ‘plot’, I’ve also written another one, fit for data sets just like these where you have a parabol-like data and you’d like to try fitting it and here is what it does (automatically):

sururi@husniya:~/shared/qe/diamond_tutorial$ plotfitparabol.sh E_vs_a.txt
func    f(x)=0.674197*x*x-4.881275*x-13.855234
min, f(min)    3.620066 -22.690502

plot_E_vs_a_with_fit

Here are the source codes of the ‘plot’ and ‘plotfitparabol.sh’ scripts. Doei!

sururi@mois:/vala/bin$ cat plot
#!/bin/bash 
gnuplot -persist <<END 
set term postscript enhanced color
set output "$1.ps"
plot "$1" $2 $3

set term wxt
replot
END
sururi@mois:/vala/bin$ cat plotfitparabol.sh 
#!/bin/bash 

# Usage : plotfitparabol.sh <datafile>

# Reads the data points from the given file
# then uses Octave's "polyfit" function to calculate the coefficients 
# for the quadratic fit and also the minimum (maximum) value.
# Afterwards, prepares a GNUPlot script to display:
#    the data set (scatter)
#    fit function
#    the extremum point's value
#
# Written with Quantum Espresso optimizing procedures in mind.

 
if [ $# -ne 1 ]
then
    echo "Usage: plotfitparabol.sh <datafile>"
    echo "Example: plotfitparabol.sh E_vs_A.txt"
    echo -e "\nNeeds octave to be accessible via \"octave\" command to work"
    echo -e "\nEmre S. Tasci <emre.tasci@ehu.es>, 03/2012"
    exit
fi

octave_script="data=load(\"$1\");m=polyfit(data(:,1),data(:,2),2);min=-m(2)*0.5/m(1);fmin=min*min*m(1)+min*m(2)+m(3);printf(\"f(x)=%f*x*x%+f*x%+f|%f\t%f\",m(1),m(2),m(3),min,fmin)"
oct=$(echo $octave_script|octave -q)
#echo $oct

func=${oct/|*/ }
minfmin=${oct/*|/ }
echo -e "func \t" $func
echo -e "min, f(min) \t" $minfmin

echo '$func
set term postscript enhanced color
set label "$1" at graph 0.03, graph 0.94
set output "$1.ps"
plot "$1" title "data", f(x) title "$func", "<echo '$minfmin'" title "$minfmin"

set term wxt
replot'

gnuplot -persist <<END 
$func
set term postscript enhanced color
set label "$1" at graph 0.03, graph 0.94
set output "$1.ps"
plot "$1" title "data", f(x) title "$func", "<echo '$minfmin'" title "$minfmin"

set term wxt
replot
END

In my previous postdoc position, DFT calculations were governing my research field but with my current position at the Bilbao Crystallographic Server, we are more on the theoretical side of things then calculations. Nevertheless, recently, my old ‘flame’ have been kindled and I decided to scrape the rust.

As a change, I selected and installed the Quantum Espresso (QE) package instead of VASP that I was using in the past, main reason QE being open source.

As a newbie to QE and long-gone returner to DFT, I started to practice by mostly ye olde trial/error approach.

The first thing that surprised me the lack of a commenting option in regard to QE input files! (Actually, I’m almost sure that it surely is supported but I have yet to find!). Here’s my sample input file:

&control
    title='Diamond Relaxing',

    # Calculation Type ==========
    calculation='scf'
    #calculation='relax',
    #calculation='vc-relax'
    #calculation='cp'

    # Restart ===================
    restart_mode='from_scratch',
    #restart_mode='restart' 

    # Read/Write units ==========
    #ndr = 51,
    #ndw = 51,

    #nstep = 100,
    #nstep = 1000,
    nstep = 2000,

    # max_seconds = 600 # jobs stops after max_seconds CPU time.

    iprint = 10, # band energies are written every iprint iterations
    #tstress = .TRUE., # calculate stress. It is set to .TRUE. automatically if
                       # calculation='vc-md' or 'vc-relax'
    #tprnfor = .TRUE., # print forces. Set to .TRUE. if calculation='relax','md','vc-md'
    #dt = 5.0d0, # time step for molecular dynamics, in Rydberg atomic units
                 # (1 a.u.=4.8378 * 10^-17 s : beware, the CP code use
                 #  Hartree atomic units, half that much!!!)

    prefix='diamond_rel', # prepended to input/output filenames
    pseudo_dir = '/usr/share/espresso-4.3.2/pseudo/',
    outdir='/home/sururi/shared/tmp/'
    etot_conv_thr = 1.d-6 # convergence threshold on total energy (a.u) for ionic minimization
    forc_conv_thr = 1.d-3 # convergence threshold on forces (a.u) for ionic minimization
 /
 &system
##  ibrav
##    0          "free", see above                 not used
##    1          cubic P (sc)                      not used
##    2          cubic F (fcc)                     not used
##    3          cubic I (bcc)                     not used
##    4          Hexagonal and Trigonal P        celldm(3)=c/a
##    5          Trigonal R, 3fold axis c        celldm(4)=cos(alpha)
##   -5          Trigonal R, 3fold axis &lt;111&gt;    celldm(4)=cos(alpha)
##    6          Tetragonal P (st)               celldm(3)=c/a
##    7          Tetragonal I (bct)              celldm(3)=c/a
##    8          Orthorhombic P                  celldm(2)=b/a,celldm(3)=c/a
##    9          Orthorhombic base-centered(bco) celldm(2)=b/a,celldm(3)=c/a
##   10          Orthorhombic face-centered      celldm(2)=b/a,celldm(3)=c/a
##   11          Orthorhombic body-centered      celldm(2)=b/a,celldm(3)=c/a
##   12          Monoclinic P, unique axis c     celldm(2)=b/a,celldm(3)=c/a,
##                                               celldm(4)=cos(ab)
##  -12          Monoclinic P, unique axis b     celldm(2)=b/a,celldm(3)=c/a,
##                                               celldm(5)=cos(ac)
##   13          Monoclinic base-centered        celldm(2)=b/a,celldm(3)=c/a,
##                                               celldm(4)=cos(ab)
##   14          Triclinic                       celldm(2)= b/a,
##                                               celldm(3)= c/a,
##                                               celldm(4)= cos(bc),
##                                               celldm(5)= cos(ac),
##                                               celldm(6)= cos(ab
    ibrav= 2,

# Cell dimensions ========================
    # a,b,c in Angstrom =======
    A = 3.56712
    B = 2.49
    C = 2.49
    cosAB=0.5
    cosAC=0.5
    cosBC=0.5

    ## OR ##
    ## celldm(1:6) in Bohr =====
    ## 1 Bohr =  0.529177249 A
    ## 1 A = 1.889725989 Bohr
    # celldm(1) = 6.7409 # alat
    # celldm(2) = 4.7054 # b/a
    # celldm(3) = 4.7054 # c/a
    # celldm(4) = 0.5 # cos(ab)
    # celldm(5) = 0.5 # cos(ac)
    # celldm(6) = 0.5 # cos(ab)

   nat=2,
   ntyp= 1,

   ecutwfc = 40, # kinetic energy cutoff (Ry) for wavefunctions
   # ecutrho = 160 # kinetic energy cutoff (Ry) for charge density and potential.
                   # For norm-conserving pseudopotential you should stick to the
                   # default value
   # nbnd = 6 # number of electronic states (bands) to be calculated.
              # Default:
              # for an insulator, nbnd = number of valence bands (nbnd = # of electrons /2);
              # for a metal, 20% more (minimum 4 more)
 /
 &electrons
   emass = 400.d0
   emass_cutoff = 2.5d0

   #electron_dynamics = 'sd' # CP specific
   #electron_dynamics = 'bfgs' # CP specific

  conv_thr = 1D-6 # Convergence threshold for selfconsistency

 /
 &ions
    #ion_dynamics = 'none'
    #ion_dynamics = 'bfgs'
    #ion_dynamics = 'sd'

    # === damping === 0 ==
    # ion_dynamics = 'damp'
    # ion_damping = 0.2
    # ion_velocities = 'zero' # CP specific -- initial ionic velocities
    # === damping === 1 ==

    #ion_nstepe = 10 # CP specific -- number of electronic steps per ionic step. (Def: 1)
 /
 &cell
    cell_dynamics = 'none',
    #cell_dynamics = 'bfgs', # ion_dynamics must be 'bfgs' too
    #cell_dofree = 'xyz'
    #cell_factor = 3
    press = 0.0d,
    # press_conv_thr = 0.05 # Convergence threshold on the pressure for variable cell relaxation
 /
ATOMIC_SPECIES
 C  12.0107 C.pz-vbc.UPF
ATOMIC_POSITIONS
 C 0.0 0.0 0.0
 C 0.25 0.25 0.25
K_POINTS automatic
   4 4 4 1 1 1

so, it’s not the tidiest parameter file around but nevertheless, I’ve included some explanations and it’s easily customizable by simple commenting out, right?…

And then the problem of (yet undiscovered) comments in the QE input files hits us. But with bash, it’s pretty easy. So, if you save the above contents to a file called, say, “diamond.scf.in” and filter it via grep and sed, you’re done:

sururi@husniya:~/shared/qe/diamond_tutorial$ cat diamond.scf.in | grep -v -E "^[\t ]*#" | sed "s:\([^#]*\)#.*:\1:"
&control
    title='Diamond Relaxing',

    calculation='scf'

    restart_mode='from_scratch',

    nstep = 2000,

    iprint = 10, 

    prefix='diamond_rel',
    pseudo_dir = '/usr/share/espresso-4.3.2/pseudo/',
    outdir='/home/sururi/shared/tmp/'
    etot_conv_thr = 1.d-6
    forc_conv_thr = 1.d-3
 /
 &system
    ibrav= 2,

    A = 3.56712
    B = 2.49
    C = 2.49
    cosAB=0.5
    cosAC=0.5
    cosBC=0.5

   nat=2,
   ntyp= 1,

   ecutwfc = 40,
 /
 &electrons
   emass = 400.d0
   emass_cutoff = 2.5d0

  conv_thr = 1D-6 

 /
 &ions

 /
 &cell
    cell_dynamics = 'none',
    press = 0.0d,
 /
ATOMIC_SPECIES
 C  12.0107 C.pz-vbc.UPF
ATOMIC_POSITIONS
 C 0.0 0.0 0.0
 C 0.25 0.25 0.25
K_POINTS automatic
   4 4 4 1 1 1

I’ve installed the QE to take advantage of the cluster nodes via MPI, so I submit my jobs via:
mpirun -n #_of_processes /usr/share/espresso-4.3.2/bin/pw.x < input_file > output_file

which I guess is more or less similar to what you are doing to run your jobs.

So, combining this “comment filtering” and job submission, I’ve written the following bash script (“mpi-pw.x“) to automatize (and keep log of) the job:

sururi@cowboy:~$ cat /home/sururi/bin/mpi-pw.x 
#!/bin/bash

# Script to submit Quantum Espresso jobs via mpi
# EST, Fri Mar  9 17:02:10 CET 2012

function timer()
{
    if [[ $# -eq 0 ]]; then
        echo $(date '+%s')
    else
        local  stime=$1
        etime=$(date '+%s')

        if [[ -z "$stime" ]]; then stime=$etime; fi

        dt=$((etime - stime))
        ds=$((dt % 60))
        dm=$(((dt / 60) % 60))
        dh=$((dt / 3600))
        printf '%d:%02d:%02d' $dh $dm $ds
    fi
}

if [ $# -ne 2 ]
then
    echo "Usage: mpi-pw.x num_of_processes infile.in"
    exit
fi

job=$2
job_wo_ext=${job%.*}

outext=".out"
logext=".log"
echo -e "Running: mpirun -n $1 /usr/share/espresso-4.3.2/bin/pw.x < $2 > $job_wo_ext$outext\n"
echo -e "Running: mpirun -n $1 /usr/share/espresso-4.3.2/bin/pw.x < $2 > $job_wo_ext$outext\n">$job_wo_ext$logext
echo "Start : "$(date)
echo "Start : "$(date)>>$job_wo_ext$logext

# Take out the comments from the input file
#grep -v -E "^[\t ]*#" $2 > in_wo_comments.in.tmp
grep -v -E "^[\t ]*#" $2 | sed "s:\([^#]*\)#.*:\1:" > in_wo_comments.in.tmp

t=$(timer)
mpirun -n $1 /usr/share/espresso-4.3.2/bin/pw.x < in_wo_comments.in.tmp > $job_wo_ext$outext
echo "Finish: "$(date)
echo "Finish: "$(date)>>$job_wo_ext$logext
printf 'Elapsed time:%s\n\n' $(timer $t)
printf 'Elapsed time:%s\n\n' $(timer $t) >>$job_wo_ext$logext

rm -f in_wo_comments.in.tmp

(I’ve picked the “timer” function from Mitch Frazier’s entry in LinuxJournal)

What it does is, really simple: it strips out the extension of the input file ($2 – the second argument in calling); filters out the comments, constructing the temporary input file “in_wo_comments.in.tmp”; starts the timer; submits the job using the number of processes passed in the calling ($1 – first argument) and directs the output to a file whose name is the extension stripped input filename + “.out”; when the job is (this way or that) done, stops the timer and calculates the elapsed time via the ‘timer’ function. During all this time, it prints these information both on screen and to a file called extension stripped input filename + “.log”.

I have a very similar another script called “mpi-cp.x” that I’ve forked for Carr-Parinelli jobs (just search & replace all the occurrences of “mpi-pw.x” with “mpi-cp.x” and you’re done 8).

And here is the script in action:

sururi@cowboy:~/shared/qe/diamond_tutorial$ ll
total 16
drwxr-xr-x 2 sururi sururi 4096 2012-03-13 15:59 ./
drwxr-xr-x 7 sururi sururi 4096 2012-03-13 15:10 ../
-rw-r--r-- 1 sururi sururi 4954 2012-03-13 15:59 diamond.scf.in

sururi@cowboy:~/shared/qe/diamond_tutorial$ mpi-pw.x 4 diamond.scf.in 
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < diamond.scf.in > diamond.scf.out

Start : Tue Mar 13 16:18:20 CET 2012
Finish: Tue Mar 13 16:18:21 CET 2012
Elapsed time:0:00:01

sururi@cowboy:~/shared/qe/diamond_tutorial$ grep -e ! diamond.scf.out 
!    total energy              =     -22.70063050 Ry

sururi@cowboy:~/shared/qe/diamond_tutorial$ cat diamond.scf.log 
Running: mpirun -n 4 /usr/share/espresso-4.3.2/bin/pw.x < diamond.scf.in > diamond.scf.out

Start : Tue Mar 13 16:18:20 CET 2012
Finish: Tue Mar 13 16:18:21 CET 2012
Elapsed time:0:00:01

Now that we have the mpi-pw.x and mpi-cp.x, our next entry in the series will be the automated search for optimized values.