Welcome to the Programming Cafe!

Plan for today

Welcome                                 15 min
Complex project management use case     15 min
Discussion                              10 min
Work on your own code                   30 min
Basics of project management            10 min
HDF5                                    10 min
Work on your own code                   30 min
Drinks!

Programming Cafe

  • Previously the R-Cafe
  • An informal, community event
  • Work on your own code
  • Themes with presentations and exercises
  • Interaction

WANTED: Topics, presenters, likes!

Check: https://github.com/UtrechtUniversity/programming-cafe

and submit your ideas and 👍 in the Issues section 🙏

Basics of reproducible project management

Contents

  • Introduction
  • Project design
  • Code organization
  • Data storage and organization
  • Next steps

Introduction

Scientific Project:

  • Data
  • Scripts
  • Compute platforms
  • Collaborators

Keep things clean and organized for:

  • efficiency
  • transparency
  • reproducibility

This is already a challenge when working alone on a single laptop!

Project design

.
├── CITATION.cff
├── LICENSE.md
├── README.md
├── requirements.txt
├── config             <- Configuration files (HW)
├── data               <- All project data, ignored by git
│   ├── processed      <- The final, canonical data sets for modeling. (PG)
│   ├── raw            <- The original, immutable data dump. (RO)
│   └── temp           <- Intermediate data that has been transformed. (PG)
├── docs               <- Documentation notebook for users (HW)
│   ├── manuscript     <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW)
│   └── reports        <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW)
├── results
│   ├── figures        <- Figures for the manuscript or reports (PG)
│   └── output         <- Other output for the manuscript or reports (PG)
└── src                <- Source code for this project (HW)

HW = human-writable, PG = project-generated, RO = read-only
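The layout above can be created in one go. The following is a minimal sketch, not a prescribed tool: the project name `my-project` and the throwaway `mktemp` location are assumptions for illustration, and the top-level files are created as empty placeholders only.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical location: a throwaway directory, so the sketch is safe to run.
project="$(mktemp -d)/my-project"

# Directories taken from the project tree.
mkdir -p "$project"/config \
         "$project"/data/processed "$project"/data/raw "$project"/data/temp \
         "$project"/docs/manuscript "$project"/docs/reports \
         "$project"/results/figures "$project"/results/output \
         "$project"/src

# Empty placeholders for the top-level files.
touch "$project"/CITATION.cff "$project"/LICENSE.md \
      "$project"/README.md "$project"/requirements.txt

# List everything that was created.
find "$project" -mindepth 1 | sort
```

For a real project, replace the `mktemp` path with the folder where the project should live.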

Document!!

  • Project structure
  • Collaboration
  • Pipeline

Organize code

  • Code quality and best practices

  • Store it online (Git)

Git for Version Control
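In the project layout, the `data` folder is marked as ignored by git. A minimal sketch of how that might be set up, assuming git is installed; the repository location and the file names `analysis.sh` and `measurements.csv` are hypothetical and only serve the illustration.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical throwaway repository, so no existing project is touched.
repo="$(mktemp -d)/my-project"
mkdir -p "$repo/data/raw" "$repo/src"
cd "$repo"

git init --quiet

# Keep all project data out of version control.
printf 'data/\n' > .gitignore

# Illustrative files: one script and one data file.
echo 'echo "analysis goes here"' > src/analysis.sh
echo 'id,value' > data/raw/measurements.csv

# Stage everything; git skips whatever .gitignore excludes.
git add .

# Show what is staged: the script and .gitignore, but nothing under data/.
git status --short
```

The data itself then lives in a storage service rather than in the repository, while the code that processes it stays versioned.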

Data storage and organization

Store it online!

  • Yoda
  • Research Drive
  • SURFdrive
  • OneDrive
  • etc.

Code and Data organization

#!/bin/bash
# Set job requirements: 16 tasks, 5 hours of wall-clock time
#SBATCH -n 16
#SBATCH -t 5:00:00

# Clone the project code
git clone https://github.com/UtrechtUniversity/my-project.git

# Download the input data from Yoda (iRODS) to the job's scratch space
mkdir -p "$TMPDIR/input"
irsync -rKV i:myfolder "$TMPDIR/input"

# Execute tasks

...

Advanced/future topics

  • Makefiles
  • Workflow management tools
  • Continuous integration and testing
  • Containers
  • APIs

Resources: