A research compendium is a collection of all digital parts of a research project, including data, code, texts (…). The collection is organized so that reproducing all results is straightforward.
Source: The Turing Way
(Artwork by Scriberia for The Turing Way, CC-BY)
Contain your project in a single, recognizable folder
Distinguish folder types and name them accordingly
Initialize a README file to document your project
Choose a license
Publish your project
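Steps such as "initialize a README" and "choose a license" are easy to forget. Below is a minimal sketch, assuming Python 3 and the top-level file names used in the example layout (README.md, LICENSE.md, CITATION.md). It reports which of these files a project folder is still missing. The function name `missing_essentials` is hypothetical and not part of any existing tool.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed checklist: top-level files taken from the example layout.
ESSENTIAL_FILES = ("README.md", "LICENSE.md", "CITATION.md")


def missing_essentials(root: str | Path) -> list[str]:
    """Return the essential files that do not exist in the project root."""
    root = Path(root)
    return [name for name in ESSENTIAL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo on a throwaway folder that only has a README so far.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "README.md").write_text("# My project\n")
        print(missing_essentials(tmp))  # ['LICENSE.md', 'CITATION.md']
```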
.
├── .gitignore
├── CITATION.md
├── LICENSE.md
├── README.md
├── requirements.txt
├── bin                <- Compiled and external code, ignored by git (PG)
│   └── external       <- Any external source code, ignored by git (RO)
├── config             <- Configuration files (HW)
├── data               <- All project data, ignored by git
│   ├── processed      <- The final, canonical data sets for modeling (PG)
│   ├── raw            <- The original, immutable data dump (RO)
│   └── temp           <- Intermediate data that has been transformed (PG)
├── docs               <- Documentation notebook for users (HW)
│   ├── manuscript     <- Manuscript source, e.g., LaTeX, Markdown, etc. (HW)
│   └── reports        <- Other project reports and notebooks (e.g., Jupyter, .Rmd) (HW)
├── results
│   ├── figures        <- Figures for the manuscript or reports (PG)
│   └── output         <- Other output for the manuscript or reports (PG)
└── src                <- Source code for this project (HW)
Labels: (HW) human-writable, (PG) project-generated, (RO) read-only.
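Creating this skeleton by hand for every project is tedious. Below is a minimal sketch, assuming Python 3 and only the standard library, that creates the folders and empty top-level files from the layout. It also writes a `.gitignore` for the two folders marked "ignored by git". The function name `scaffold` and the project name `my-project` are hypothetical.

```python
import tempfile
from pathlib import Path

# Subfolders from the example layout; parent folders are created as needed.
SUBFOLDERS = [
    "bin/external",
    "config",
    "data/processed",
    "data/raw",
    "data/temp",
    "docs/manuscript",
    "docs/reports",
    "results/figures",
    "results/output",
    "src",
]

# Empty top-level files to fill in later.
TOP_LEVEL_FILES = ["CITATION.md", "LICENSE.md", "README.md", "requirements.txt"]

# Folders the layout marks as "ignored by git".
GITIGNORE_LINES = ["bin/", "data/"]


def scaffold(root: Path) -> None:
    """Create the folder skeleton, empty top-level files, and a .gitignore.

    Safe to run twice: the contents of existing files are not overwritten.
    """
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)  # only updates mtime if it exists
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_LINES) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my-project"  # hypothetical project name
        scaffold(project)
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

Git does not track empty folders, so folders such as `config` or `src` only appear in the repository once they contain a file.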
Copyright applies automatically: others cannot legally reuse, modify, or redistribute your code without your permission.
A license grants that permission and sets its boundaries and conditions.
Choosing a license early keeps you aware of its terms as the project proceeds and helps you avoid conflicts, for example with the licenses of code you reuse.
There are over 80 OSI-approved licenses (and many, many others) to choose from.
We will dive into licenses in the Software Publication chapter.
When creating a GitHub repository for your code, you need to decide whether to make it public or keep it private.
Publishing your project at an early stage:
- encourages you to consider readability throughout
- lets you get feedback from your community during development
- may generate collaborations
- makes it easier to create a publication or a software package
→ Open Science
But what if someone scoops my code? I’m a revolutionary; they will steal my ideas!
You can always opt for a private repository.
How do you include large, sensitive, or unpublished data?
Don’t include your data in your software repository.
Include simulated data or a small example dataset to test your code.
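Simulated data lets others run and test your code without access to the real data. Below is a minimal sketch using only Python's standard library. The linear relation y = 2x + 1 with Gaussian noise, the column names, and the function names are all hypothetical choices for illustration. A fixed seed makes the generated example data the same on every run, so tests that rely on it stay reproducible. Since the example layout ignores `data/` in git, a small example file you want to share has to live outside that folder or be explicitly un-ignored.

```python
import csv
import random
import tempfile
from pathlib import Path

FIELDS = ["id", "x", "y"]  # hypothetical column names


def simulate_measurements(n: int, seed: int = 42) -> list[dict]:
    """Generate n fake rows following y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed: same data on every run
    rows = []
    for i in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 0.5)
        rows.append({"id": i, "x": round(x, 3), "y": round(y, 3)})
    return rows


def write_csv(rows: list[dict], path: Path) -> None:
    """Write the rows to a CSV file with a header line."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "example_data.csv"  # hypothetical file name
        write_csv(simulate_measurements(20), out)
        print(out.read_text())
```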
Research Data and full datasets:
Workshop Computational Reproducibility