Running for Loops in Parallel¶
In this notebook, we will look at how to use the multiprocessing package to run a for loop in parallel and speed up a parameter sweep.
First, run the cell below to import the modules needed for this workbook. Then run the following cell to perform a sequential parameter sweep over the electronic coupling parameter (t0). The code will print how long this takes, giving us a benchmark to compare against the time taken for the parallelised parameter sweep.
import warnings
from datetime import datetime
from lattice_hamiltonian.run_lattice import single_parameter, sweep_parameter
warnings.filterwarnings("ignore", category=RuntimeWarning)
# This should be a string - see the RunLattice.py file for all the possible parameters
parameter_to_vary = "t0"
# A list containing the values you want parameter_to_vary to take
parameter_array = [2e-3, 4e-3, 6e-3, 8e-3, 10e-3]
# A dictionary containing user specified input variables.
# Unspecified parameters will be set to default values.
parameter_dict = {"size": 5}
t_start = datetime.now()
lattice_dict = sweep_parameter(parameter_to_vary, parameter_array, parameter_dict)
t_end = datetime.now()
print(f"Time taken: {t_end - t_start}")
Time taken: 0:00:27.578930
We will now perform the same parameter sweep in parallel using the multiprocessing package. First, we import the Pool class, which is used to make a pool of worker processes to which we can assign the tasks to be run in parallel. The number of workers is set by the processes argument when you instantiate the class. This argument can be omitted, in which case the default behaviour is to create as many workers as your computer has CPUs.
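As a quick check (a minimal sketch, independent of the lattice code), you can query how many workers the default behaviour would give you on your machine:

```python
import multiprocessing

# Pool() with no `processes` argument creates this many worker processes
n_cpus = multiprocessing.cpu_count()
print(f"Default number of workers on this machine: {n_cpus}")
```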
from multiprocessing.pool import Pool
To run the code in parallel, we will use the pool.map method (see the documentation here). Essentially, this is a parallel version of the regular Python map function (see here), in which different elements of the iterable are passed to different workers in the pool. This means we need to build an iterable of the input parameters, which is what we do in the cell below.
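To make the analogy concrete, here is a minimal sketch with a toy square function (standing in for single_parameter, which is not needed here) showing that the built-in map and Pool.map produce the same results. Note that Pool.map requires a picklable, module-level function, so a lambda will not work:

```python
from multiprocessing.pool import Pool

def square(x):
    # Toy stand-in for an expensive function such as single_parameter;
    # Pool.map needs a module-level (picklable) function, not a lambda
    return x * x

values = [1, 2, 3, 4, 5]

# Sequential: the built-in map applies square to each element in turn
sequential = list(map(square, values))

if __name__ == "__main__":
    # Parallel: Pool.map hands different elements to different workers,
    # but still returns the results in the original order
    with Pool(processes=2) as pool:
        parallel = pool.map(square, values)
    print(sequential == parallel)
```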
dict_list = []
for val in parameter_array:
    parameter_dict[parameter_to_vary] = val
    # Append a copy so each entry is an independent dictionary
    dict_list.append(parameter_dict.copy())
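Equivalently, the same list can be built with a list comprehension: {**parameter_dict, parameter_to_vary: val} creates a fresh merged dictionary for each value, so no explicit .copy() is needed. A self-contained sketch, repeating the parameter definitions from above:

```python
parameter_to_vary = "t0"
parameter_array = [2e-3, 4e-3, 6e-3, 8e-3, 10e-3]
parameter_dict = {"size": 5}

# One dictionary per sweep value; each {**parameter_dict, ...} is a new
# dictionary, so the entries do not share state
dict_list = [
    {**parameter_dict, parameter_to_vary: val} for val in parameter_array
]
print(dict_list[0])
```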
Now we run the parameter sweep in parallel. You can change the number of processes and see how this affects the time taken for the loop to complete. Note that, in this example, it is a bad idea to try to create more processes than your computer has cores, as the code is limited by computation time rather than I/O. In general, it is best to experiment with the number of processes to find out what works best for your particular situation.
# The with statement runs the Pool object within a context manager.
# This automatically frees up the resources used by the pool once all
# processes have finished.
# More information about context managers:
# https://book.pythontips.com/en/latest/context_managers.html
with Pool(processes=3) as pool:
    t_start = datetime.now()
    # Assign the tasks to the workers in the pool
    result = pool.map(single_parameter, dict_list)
    t_end = datetime.now()
    print(f"Time taken: {t_end - t_start}")
Time taken: 0:00:19.546347
For the 5x5 lattice, parallelisation is not a very effective way to speed up the code as the time it takes for a typical desktop computer to create the pool of workers negates the time saved by running the parameter sweep in parallel. However, for a larger lattice, the time taken to solve the Hamiltonian far exceeds the time required to make the pool and so parallelisation significantly reduces the total runtime.