Parallel Programming

Author

Luka van de Wiel

Published

April 4, 2024

Note
  • Date: Thursday, 4th April 2024
  • Time: 15:00-17:00
  • Location: Bucheliuszaal 6.18, University Library (Utrecht Science Park)
  • Presentation: Parallel programming is not easy
  • Presenter: Lukas van de Wiel

Speed up your HPC workflow!

Do you want to perform many calculations or large analyses that would take very long on your own laptop? High Performance Computing (HPC) platforms provide compute power that might help to speed up your calculations or to increase your problem size (e.g. simulations at finer resolution, or exploring more parameter combinations). In this Programming Cafe Lukas van de Wiel, scientific programmer in Earth Sciences, will introduce us in the concepts of writing parallel code for HPC. As always we will prepare some hands on exercises so you can practice writing some parallel code yourself!

Slides

Exercises

There are many ways to speed up your code in Python, and also many ways to parallelize your code. The Parallel Programming in Python course that is co-developed and taught by the eScience Center is a great way to learn more about this in more detail. Below is a notebook that will introduce you to the multiprocessing library in Python, and a link to some the course materials of the above course about dask.

1. multiprocessing library

Download the notebook from the notebooks (link missing) folder and follow the instructions in the notebook.

2. dask and profilers

Checkout this cool tutorial on dask and measuring performance from the Parallel Programming in Python course (by the eScience Center).

Optional:

Browse the Parallel Programming in Python course to see what other topics might be interesting for you, and feel free to try out some of the code examples.

There are different approaches to parallel programming in R. One package that can be used is the doFuture package in combination with the foreach package.

install.packages("doFuture")
install.packages('foreach')

Follow this tutorial on how to use this package and stop at the section Writing code that runs on multiple nodes.

Try now to use the same method for doing some computations in parallel. For example:

estimate_pi <- function(N) {
    j <- 0
    for (i in 1:N) {
        x <- runif(1,-1,1)
        y <- runif(1,-1,1)
        if (x^2+y^2< 1)  {
            j=j+1
        }
    }
return(4*j/N)
}

Tip: There are two approaches: either parallelize the function, or run the function in parallel with batched inputs. Try both, think on forehand what would be the more useful approach and why and compare the results.

References