Welcome to the Programming Cafe!

Plan for today

Welcome	15 min
Complex project management use case	15 min
Discussion	10 min
Work on your own code	30 min
Basics of project management	10 min
HDF5	10 min
Work on your own code	30 min
Drinks!

Programming Cafe

Previously the R-Cafe
An informal, community event
Work on your own code
Themes with presentations and exercises
Interaction

WANTED: Topics, presenters, likes!

Check: https://github.com/UtrechtUniversity/programming-cafe

and submit your ideas and 👍 in the Issues section 🙏

Basics of reproducible project management

Contents

Introduction
Project design
Code organization
Data storage and organization
Next steps

Introduction

Scientific Project:

Data
Scripts
Compute platforms
Collaborators

Introduction

Keep things clean and organized for:

efficiency
transparency
reproducibility

Introduction

This is already a challenge when working alone on one laptop!

Project design

.
├── CITATION.cff
├── LICENSE.md
├── README.md
├── requirements.txt
├── config             <- Configuration files (HW)
├── data               <- All project data, ignored by git
│   ├── processed      <- The final, canonical data sets for modeling. (PG)
│   ├── raw            <- The original, immutable data dump. (RO)
│   └── temp           <- Intermediate data that has been transformed. (PG)
├── docs               <- Documentation notebook for users (HW)
│   ├── manuscript     <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW)
│   └── reports        <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW)
├── results
│   ├── figures        <- Figures for the manuscript or reports (PG)
│   └── output         <- Other output for the manuscript or reports (PG)
└── src                <- Source code for this project (HW)

Legend: HW = human-writable, PG = project-generated, RO = read-only
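
The layout above can be scaffolded with a few shell commands. This is a minimal sketch rather than an official template: the `PROJECT_ROOT` variable and its default value `my-project` are assumptions made for illustration.

```shell
#!/bin/bash
# Scaffold the project layout shown above.
# PROJECT_ROOT and its default "my-project" are hypothetical names.
set -euo pipefail

PROJECT_ROOT="${PROJECT_ROOT:-my-project}"

# Create every directory from the tree; -p also creates parent
# directories and does not fail if a directory already exists.
mkdir -p "$PROJECT_ROOT"/config \
         "$PROJECT_ROOT"/data/processed \
         "$PROJECT_ROOT"/data/raw \
         "$PROJECT_ROOT"/data/temp \
         "$PROJECT_ROOT"/docs/manuscript \
         "$PROJECT_ROOT"/docs/reports \
         "$PROJECT_ROOT"/results/figures \
         "$PROJECT_ROOT"/results/output \
         "$PROJECT_ROOT"/src

# Create empty placeholders for the top-level files.
touch "$PROJECT_ROOT"/CITATION.cff \
      "$PROJECT_ROOT"/LICENSE.md \
      "$PROJECT_ROOT"/README.md \
      "$PROJECT_ROOT"/requirements.txt
```

Running the script a second time is harmless: existing directories and files are left in place.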

Project design

Document!!

Project structure
Collaboration
Pipeline

Organize code

Code quality and best practices
Store it online (Git)

Organize code

Git for Version Control
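
A minimal sketch of putting a project under Git while keeping the data out of version control, in line with the `data` folder being ignored by Git in the project layout. The directory name `my-project` and the file `example.csv` are assumptions for illustration.

```shell
#!/bin/bash
set -euo pipefail

# "my-project" and "example.csv" are hypothetical names.
mkdir -p my-project/data/raw
touch my-project/data/raw/example.csv

# Start a new repository in the project directory.
git init -q my-project

# Tell Git to ignore everything under data/.
echo "data/" > my-project/.gitignore

# Stage the ignore rule itself so collaborators get the same behavior.
# "git -C <dir>" runs the command as if started inside <dir>.
git -C my-project add .gitignore

# check-ignore exits with status 0 when the given path is ignored.
git -C my-project check-ignore -q data/raw/example.csv \
    && echo "data/ is ignored by Git"
```

Only the code and the ignore rule end up in the repository; the data itself is kept on a storage service instead.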

Data storage and organization

Store it online!

Yoda
Research Drive
SURFdrive
OneDrive
etc.
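
Whichever storage service is used, it helps to be able to verify that the raw data is unchanged after an upload or download. A minimal sketch, assuming GNU coreutils (`sha256sum`) is available; the file `example.csv` and the manifest name `SHA256SUMS` are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical raw data file, created here only to make the sketch runnable.
mkdir -p data/raw
echo "id,value" > data/raw/example.csv

# Record a checksum for every file in data/raw in a manifest
# stored next to the raw data folder.
find data/raw -type f -exec sha256sum {} + > data/SHA256SUMS

# Later, or on another machine: verify the files against the manifest.
# Run this from the same directory, because the manifest stores
# relative paths.
sha256sum --check --quiet data/SHA256SUMS && echo "raw data unchanged"
```

The manifest is a small text file, so it can be stored alongside the data or tracked together with the code.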

Data storage and organization

Code and Data organization

#!/bin/bash
# Set job requirements: 16 tasks, 5 hours of wall-clock time
#SBATCH -n 16
#SBATCH -t 5:00:00

# Clone the project code
git clone https://github.com/UtrechtUniversity/my-project.git

# Download the input data from Yoda into the job's temporary directory
mkdir -p "$TMPDIR"/input
irsync -rKV i:myfolder "$TMPDIR"/input

# Execute tasks

...

Advanced/future topics

Makefiles
Workflow management tools
Continuous integration and testing
Containers
APIs
