Setting up a project

Research compendium

A research compendium is a collection of all digital parts of a research project including data, code, texts (…). The collection is created in such a way that reproducing all results is straightforward.

Source: The Turing Way

Getting started

- Contain your project in a single recognizable folder
- Distinguish folder types and name them accordingly:
  - Read-only: data, metadata
  - Human-generated: code, paper, documentation
  - Project-generated: clean data, figures, models…
- Initialize a README file and document your project
- Choose a license
- Publish your project

Wilson et al. (2017)

Simple Project Templates

You can set up your project from a template repository that provides a ready-made project folder structure (the R and Python templates are linked under "Your turn").

Simply click through Use this template -> Create a repository and enter the necessary information for your new repository, such as the name and owner.

A Good Enough Project

.
├── .gitignore
├── CITATION.md
├── LICENSE.md
├── README.md
├── requirements.txt
├── bin                <- Compiled and external code, ignored by git (PG)
│   └── external       <- Any external source code, ignored by git (RO)
├── config             <- Configuration files (HW)
├── data               <- All project data, ignored by git
│   ├── processed      <- The final, canonical data sets for modeling. (PG)
│   ├── raw            <- The original, immutable data dump. (RO)
│   └── temp           <- Intermediate data that has been transformed. (PG)
├── docs               <- Documentation notebook for users (HW)
│   ├── manuscript     <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW)
│   └── reports        <- Other project reports and notebooks (e.g. Jupyter, .Rmd) (HW)
├── results
│   ├── figures        <- Figures for the manuscript or reports (PG)
│   └── output         <- Other output for the manuscript or reports (PG)
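└── src                <- Source code for this project (HW)

Legend: RO = read-only, HW = human-writeable, PG = project-generated (compare the folder types under "Getting started").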
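If you would rather create this layout by hand than from a template, a small script can do it. The sketch below is one possible way, not part of the template itself: the function name scaffold is made up for illustration, and the folder and file names are copied from the tree above.

```python
from pathlib import Path

# Folder and file names copied from the tree above; adjust to your project.
FOLDERS = [
    "bin/external",
    "config",
    "data/processed",
    "data/raw",
    "data/temp",
    "docs/manuscript",
    "docs/reports",
    "results/figures",
    "results/output",
    "src",
]
FILES = [".gitignore", "CITATION.md", "LICENSE.md", "README.md", "requirements.txt"]


def scaffold(root):
    """Create the folder layout under `root` without overwriting anything."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True also creates `root` itself and intermediate folders
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file and leaves existing content alone
        (root / name).touch(exist_ok=True)
    return root
```

Calling scaffold("my-project") creates the folders and empty placeholder files; you still have to fill in the README, license and .gitignore yourself.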

Absolute vs. Relative Paths

Your project should be transportable between computers.
For this reason, you should use relative paths only: compare
- /Users/barbara/Dropbox/proteindomains/data/zincfinger.json
- ./data/zincfinger.json
./ means: in this folder
../ means: one folder up
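In Python, the same idea can be sketched with pathlib. This assumes you run the script from the project folder, because relative paths are resolved against the current working directory; other-project is a made-up folder name for illustration.

```python
from pathlib import Path

# Relative to the folder you run the script from (the project folder).
data_file = Path("data") / "zincfinger.json"

# One folder up, then into a sibling folder: the "../" from above.
# "other-project" is a hypothetical name.
sibling_data = Path("..") / "other-project" / "data"

print(data_file.as_posix())      # data/zincfinger.json
print(sibling_data.as_posix())   # ../other-project/data
print(data_file.is_absolute())   # False
```

Because neither path mentions a user name or a drive, the code keeps working when the project folder is moved to another computer.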

Your turn

Clone one of the following template repositories of a project folder structure to your computer:
- R: https://github.com/UtrechtUniversity/simple-r-project
- Python: https://github.com/UtrechtUniversity/simple-python-template/
Place your project files in the right folder.
Adjust paths in your code, and be sure to use relative paths (see "Absolute vs. Relative Paths")!
It is fine to have the main script (e.g. main.py) in the project's root folder!

Choosing a license

Copyright is implicit; others cannot use your code without your permission.
Licensing gives that permission, and its boundaries and conditions.
Choosing a license early on means being aware of your license as the project proceeds (and not creating conflicts).
There are over 80 OSI-approved licenses (and many, many others) to choose from.

What is important to you? What does your lab use? Choose your own license!

Publishing your project

Uh… Isn’t ‘publication’ the thing you do… at the end?

No! Publishing your project at an early stage:
- forces you to consider readability throughout
- minimizes the mess you have to deal with when you (finally) decide to publish
- allows collaboration and support
- facilitates sharing and re-use.

But what if someone scoops my code! I’m a revolutionary, they will steal my ideas!

If you are super paranoid, you can always opt for a private repository. It is your work & up to you. But consider the advantages!

Publishing unpublished data

If you have sensitive data…
- Don’t include your data in your software repository (that’s not what it is for anyway).
- Consider generating simulated data so your code can run regardless.
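Simulated data can be as simple as a few random columns with the same names and types as the real file. The sketch below is only an illustration: the column names (id, age, score), their ranges and the function names are invented, so replace them with the structure of your own data.

```python
import csv
import random
from pathlib import Path


def simulate_rows(n, seed=42):
    """Return n made-up records; the fixed seed makes reruns identical."""
    rng = random.Random(seed)
    return [
        # Hypothetical columns: mirror the columns of your real data here.
        {"id": i, "age": rng.randint(18, 90), "score": round(rng.gauss(50, 10), 2)}
        for i in range(1, n + 1)
    ]


def write_csv(rows, path):
    """Write the records to a CSV file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, write_csv(simulate_rows(100), "data/simulated/participants.csv") gives others a file to run your code on, while the real data stays out of the repository. The folder and file name are again just placeholders.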
And for all data:
- Your data should be separate from your code!
- If your code references your data, consider a config or metadata file for these references.
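One way to keep data references out of your code is a small config file that your scripts read at startup. The sketch below assumes a JSON file with a "data" section, for example config/paths.json; the file name, the keys and the function load_paths are all made up for illustration.

```python
import json
from pathlib import Path


def load_paths(config_file):
    """Read data locations from a JSON config file.

    Assumed (hypothetical) file content:
        {"data": {"raw": "data/raw/input.csv", "processed": "data/processed"}}
    """
    with open(config_file, encoding="utf-8") as handle:
        config = json.load(handle)
    # Keep the entries as (relative) Path objects so they stay portable.
    return {name: Path(location) for name, location in config["data"].items()}
```

A script would then call paths = load_paths("config/paths.json") and open paths["raw"], so moving or renaming a data file only requires a change in the config file, not in the code.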

Where do I publish?

Living project: GitHub

(or another social coding platform)

- synergistic with the version control software Git
- makes history public and accessible (eek!)
- allows publication of different releases
- provides a platform for interaction and collaboration

Archiving a release: Zenodo

(or another stable repository, like the OSF)

- direct archiving from GitHub to Zenodo is supported
- this gives you a DOI (digital object identifier): your code is citable!
