Reusable code: Code that can be used again in the same project or in other future projects.
Code needs to be modular.
Benefit: less code to write means less code to update and a lower chance of human error.
Separate code and data: data is specific, code need not be
gravity = 9.80665, once.
Do One Thing (and do it well)
Don’t Repeat Yourself: use functions
Write routines in functions, i.e., code you reuse often
Identify potential functions by action: functions perform tasks (e.g. sorting, plotting, saving a file, transforming data…)
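As a minimal sketch of the points above, the constant is defined once and the repeated calculation lives in a single function. The names `GRAVITY` and `potential_energy`, and the sample values, are illustrative assumptions and not part of the original material.

```python
# Physical constant, defined once and reused everywhere (standard gravity, m/s^2).
GRAVITY = 9.80665


def potential_energy(mass_kg, height_m):
    """Return the gravitational potential energy in joules."""
    return mass_kg * GRAVITY * height_m


# Data stays separate from code: the same function handles any (mass, height) pair.
samples = [(1.0, 10.0), (2.0, 3.0)]
energies = [potential_energy(mass, height) for mass, height in samples]
```

If the value of the constant ever needs to change, there is only one line to update.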
Your project should be transportable between computers.
For this reason, you should use relative paths only: compare
/Users/barbara/Dropbox/proteindomains/data/zincfinger.json
./data/zincfinger.json
./ means: in this folder
../ means: one folder up
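In Python, relative paths like these can be built with the standard `pathlib` module. This is a minimal sketch that reuses the file name from the example above. The variable names are assumptions, and the paths are only constructed, not opened.

```python
from pathlib import Path

# Relative path: resolved against the folder the script is run from,
# so it works on any computer that has the same project layout.
data_file = Path("data") / "zincfinger.json"

# "../" style: start one folder up from the current one.
parent_file = Path("..") / "data" / "zincfinger.json"

# Neither path is tied to a particular user or machine.
relative_only = not data_file.is_absolute() and not parent_file.is_absolute()
```

Joining path parts with `/` also avoids hard-coding the path separator of one operating system.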
Functions are smaller code units responsible for one task.
Functions are meant to be reused
Functions accept arguments (though they may also be empty!)
What arguments a function accepts is defined by its parameters
Functions do not necessarily make code shorter (at first)!
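The difference between parameters and arguments can be sketched as follows. The functions `greet` and `default_message` are hypothetical examples, not taken from the workshop material.

```python
def greet(name, greeting="Hello"):
    """`name` and `greeting` are parameters; `greeting` has a default value."""
    return f"{greeting}, {name}!"


def default_message():
    """A function may also take no arguments at all."""
    return greet("world")


# "Ada" and "Welcome" are the arguments passed to the parameters.
message = greet("Ada", greeting="Welcome")
```

The parameters belong to the definition, while the arguments are the concrete values supplied in each call.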
Small, cohesive units are much better than…
… a customized behemoth!
script.py
script.R
main <- function(filename, output_dir) {
  df <- read_input(filename)
  df <- preprocess(df)
  data <- prepare_data_ml(df)
  result <- train_model(
    data$x_train,
    data$y_train,
    data$x_test,
    data$y_test
  )
  save_result(result, output_dir)
}

# ---- command line interface ----
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 2) {
  stop("Usage: Rscript script.R <filename> <output_dir>")
}
filename <- args[1]
output_dir <- args[2]
main(filename, output_dir)

Choose:
The objective is for you to ‘see’ your code!
Yellow denotes scripted, unstructured code (basic, sequential lines of instructions)
Purple denotes functions or other structured code (e.g. for-loops, conditionals, etc.)
Green denotes comments or comment blocks (consider combining this with yellow for heavily commented code)
Again, make notes in your code (#TODO!) if you see:
What can you learn from your colleagues today?
You have visualized your code. Use your findings to improve it!
Preferably: take scripted code and turn it into a function, or split an existing function into two or more functions.
If there is no function to work on: try to improve the readability of your code.
However: for future exercises you will need at least one function, preferably with parameters, in your code! For example:
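A minimal sketch of such a function is shown here. The name `rescale`, its parameters `lower` and `upper`, and the sample values are hypothetical, chosen only for illustration.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly rescale a list of numbers to the range [lower, upper]."""
    smallest, largest = min(values), max(values)
    if smallest == largest:
        # All values identical: avoid dividing by zero.
        return [lower for _ in values]
    span = largest - smallest
    return [lower + (value - smallest) / span * (upper - lower) for value in values]


# The same function now works for any list and any target range.
scaled = rescale([2, 4, 6])
```

Because the target range is a pair of parameters with defaults, the same function can be called as `rescale(values)` or, for instance, `rescale(values, lower=-1, upper=1)`.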
Workshop Computational Reproducibility