Codebooks

What

A codebook (also called a data dictionary) is a type of data-level metadata: it documents the variables in a dataset. A good codebook is both human-readable and machine-readable.

Why

The purpose of a codebook is to explain what each variable name and value in your dataset means, which makes the data understandable and reusable. A codebook is valuable both to researchers within the project and to collaborators or re-users outside it.

Who

The researcher(s) working with the data are responsible for creating and maintaining the codebook.

When

The codebook should be created during the active stage of the project as data is processed. It should be finalized by the archiving and publication stages.

Where

The codebook should be stored alongside your data: in your project folder during the active stage, and in your data package at the archiving and publication stages.

How

A codebook typically includes:

  • Variable name (as it appears in the dataset)
  • Readable variable name
  • Measurement units
  • Allowed values
  • Definition of the variable
  • Synonyms for the variable name (optional)
  • Description of the variable (optional)
  • Other relevant resources
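These fields map naturally onto a table with one row per variable. The following is a minimal sketch of a hand-written, machine-readable codebook for R's built-in iris dataset. The column names (variable_name, readable_name, unit, allowed_values, definition) are illustrative choices and not a required schema. The definitions are written for this example only.

```r
# Hand-built codebook for the built-in iris dataset (a sketch; the column
# names are illustrative, not a standard schema)
codebook_df <- data.frame(
  variable_name  = names(iris),
  readable_name  = c("Sepal length", "Sepal width",
                     "Petal length", "Petal width", "Species"),
  unit           = c("cm", "cm", "cm", "cm", NA),  # iris measurements are in centimeters
  allowed_values = c(rep("positive number", 4),
                     paste(levels(iris$Species), collapse = "; ")),
  definition     = c("Length of the sepal",
                     "Width of the sepal",
                     "Length of the petal",
                     "Width of the petal",
                     "Iris species of the flower"),
  stringsAsFactors = FALSE
)

# Sanity check: every column in the data is documented
stopifnot(setequal(codebook_df$variable_name, names(iris)))

# Machine-readable export as plain CSV; missing units are written as empty cells
write.csv(codebook_df, "codebook.csv", row.names = FALSE, na = "")
```

Because the codebook is itself a data frame, it can be checked against the data (as in the stopifnot() line) whenever the dataset changes.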

For more guidance, see: How to Make a Data Dictionary

codebook R package

The codebook R package can generate both machine-readable codebooks (e.g. csv, xlsx) and human-readable codebooks (e.g. pdf, rendered through R Markdown) from a data frame. A minimal example that exports a machine-readable codebook to xlsx is given below:

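# load libraries
library(codebook)  # provides codebook_table()
library(writexl)   # provides write_xlsx()

# load data (iris is already a data frame, so no conversion is needed)
data <- iris

# generate the codebook table: one row per variable
# (named codebook_df to avoid confusion with the package's codebook() function)
codebook_df <- codebook_table(data)

# export the codebook as a machine-readable xlsx file
write_xlsx(codebook_df, "codebook.xlsx")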
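If installing packages is not an option, a rough first draft can also be produced with base R alone. This is a plain base-R approach, not part of the codebook package. The helper draft_codebook() below is hypothetical and written only for illustration. For numeric columns it records the observed range, which is a starting point and not necessarily the true set of allowed values.

```r
# Hypothetical helper (not part of the codebook package): drafts one
# codebook row per column of a data frame using base R only
draft_codebook <- function(df) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    allowed <- if (is.factor(x)) {
      paste(levels(x), collapse = "; ")   # categorical: list the levels
    } else if (is.numeric(x)) {
      r <- range(x, na.rm = TRUE)
      paste0(r[1], " to ", r[2])          # numeric: observed range only
    } else {
      NA_character_                       # other types: fill in by hand
    }
    data.frame(
      variable_name    = col,
      type             = class(x)[1],
      n_missing        = sum(is.na(x)),
      allowed_values   = allowed,
      definition       = NA_character_,   # must be written by a person
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)                    # stack the one-row data frames
}

draft <- draft_codebook(iris)
write.csv(draft, "codebook_draft.csv", row.names = FALSE, na = "")
```

The definition column is left empty on purpose, because only the people working with the data can supply what each variable means.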