Setting up a project

Research compendium

A research compendium is a collection of all digital parts of a research project including data, code, texts (…). The collection is created in such a way that reproducing all results is straightforward.

Source: The Turing Way

Getting started

  • Contain your project in a single recognizable folder
  • Add subdirectories based on file type:
    • data (Read Only)
    • src / scripts / R (Human Writeable)
    • output (Project Generated)
  • Initialize:
    • README.md
    • LICENSE.md
    • CITATION.cff
  • Initialize version control
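
The checklist above can be automated with a few lines of Python. This is a minimal sketch, not a prescribed tool: the function name `scaffold_project` is hypothetical, `src` stands in for whichever of src / scripts / R you choose, and the starter files are created empty.

```python
from pathlib import Path

# Names follow the checklist above; "src" is an assumed choice
# among src / scripts / R.
SUBDIRS = ("data", "src", "output")
FILES = ("README.md", "LICENSE.md", "CITATION.cff")


def scaffold_project(root):
    """Create the project folder, its subdirectories, and empty starter files."""
    root = Path(root)
    for name in SUBDIRS:
        # parents=True also creates the project folder itself if needed
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates the file if missing and never changes existing content
        (root / name).touch(exist_ok=True)
    return root
```

Calling `scaffold_project("my_project")` creates the folder relative to the current working directory. Version control is initialized separately, for example by running `git init` inside the new folder.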

A Good Enough Project (R)

.
├── .gitignore
├── CITATION.cff
├── LICENSE.md
├── README.md
├── data               <- All project data, ignored by git
│   ├── processed      <- Final, canonical data sets (PG)
│   ├── raw            <- Original, immutable data dump (RO)
│   └── temp           <- Intermediate transformed data (PG)
├── docs               <- Documentation for users (HW)
│   ├── manuscript     <- Manuscript source (HW)
│   └── reports        <- Reports, notebooks (HW)
├── results
│   ├── figures        <- Figures (PG)
│   └── output         <- Other output (PG)
├── R                  <- Source code (HW)
└── MyProject.Rproj    <- R Project File (PG)

A Good Enough Project (Python)

.
├── .gitignore
├── CITATION.cff
├── LICENSE.md
├── README.md
├── requirements.txt
├── data               <- All project data, ignored by git
│   ├── processed      <- Final, canonical data sets (PG)
│   ├── raw            <- Original, immutable data dump (RO)
│   └── temp           <- Intermediate transformed data (PG)
├── docs               <- Documentation for users (HW)
│   ├── manuscript     <- Manuscript source (HW)
│   └── reports        <- Reports, notebooks (HW)
├── results
│   ├── figures        <- Figures (PG)
│   └── output         <- Other output (PG)
└── src                <- Source code (HW)

Names & Naming Conventions

  • All files and folders should be named to reflect their content or function.
  • All names should adhere to the same convention.

Naming Convention   Example            Description
original name       an awesome name    N/A
snake_case          an_awesome_name    lowercase, underscores
kebab-case          an-awesome-name    lowercase, hyphens
PascalCase          AnAwesomeName      every word capitalized
camelCase           anAwesomeName      first word lowercase, later words capitalized
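
To make the conventions concrete, the sketch below converts a name between them. The helper names (`split_words`, `to_snake_case`, and so on) are hypothetical, and the word splitting is deliberately simple: it does not handle acronyms such as `HTTPServer` well.

```python
import re


def split_words(name):
    """Split a name into lowercase words, whichever convention it uses."""
    # Put a space at each lowercase/digit-to-uppercase boundary
    # (handles camelCase and PascalCase)
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Spaces, underscores, and hyphens all count as word separators
    return [w.lower() for w in re.split(r"[\s_\-]+", spaced) if w]


def to_snake_case(name):
    return "_".join(split_words(name))


def to_kebab_case(name):
    return "-".join(split_words(name))


def to_pascal_case(name):
    return "".join(w.capitalize() for w in split_words(name))


def to_camel_case(name):
    words = split_words(name)
    # First word stays lowercase, later words are capitalized
    return "".join(words[:1] + [w.capitalize() for w in words[1:]])
```

For example, `to_snake_case("an awesome name")` gives `an_awesome_name`, and `to_camel_case("AnAwesomeName")` gives `anAwesomeName`.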

Absolute vs. Relative Paths

Absolute paths
- Full location from filesystem root
- Not portable
- Break when moving or sharing a project

Relative paths
- Location relative to working directory
- Portable and reproducible
- IDEs help maintain correct working directories
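
The difference can be checked directly with Python's `pathlib`. This is an illustrative sketch: the helper `describe_path` and the folder `/home/alice/my_project` are hypothetical.

```python
from pathlib import Path


def describe_path(path, working_dir=None):
    """Return (is_absolute, location the path points to)."""
    path = Path(path)
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    # Joining a base with an absolute path returns the absolute path
    # unchanged, so only relative paths depend on the working directory
    return path.is_absolute(), base / path


# The same relative path points to different files in different
# working directories (both folders here are made up for illustration)
print(describe_path("data/file.csv", "/home/alice/my_project"))
print(describe_path("data/file.csv", "/home/alice/other_project"))
```

A relative path is portable because it only assumes the project's internal layout, but it resolves correctly only when the working directory is the project folder.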

Example File Structure

my_project/
├── data/
│   └── file.csv
└── script.py

Python Example

import pandas as pd
df = pd.read_csv("data/file.csv")
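
The snippet above works only when the working directory is `my_project/`. If the script may be launched from elsewhere, one alternative (an addition here, not part of the original example) is to build the path from the script's own location. The helper name `project_file` is hypothetical.

```python
from pathlib import Path


def project_file(script_file, *parts):
    """Build a path relative to the folder that contains script_file."""
    # resolve() makes the script path absolute before taking its parent folder
    return Path(script_file).resolve().parent.joinpath(*parts)


# Inside script.py this could be used as:
#   csv_path = project_file(__file__, "data", "file.csv")
#   df = pd.read_csv(csv_path)
```

The project stays portable because nothing outside the project folder is hard-coded; only the layout inside `my_project/` is assumed.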

R Example

data <- read.csv("data/file.csv")

Use an RStudio Project so that the working directory is set to the project folder whenever the project is opened.

IDEs help with Relative Paths

Language   Beginner IDE         Feature
Python     Visual Studio Code   workspace folder
R          RStudio              built-in projects
MATLAB     MATLAB               current folder