Code Quality

Aspects of good quality code

  • Readable:

    A human being can easily understand the purpose of the code and can maintain the code.

  • Reusable:

    Code is written in such a way that it can be reused across multiple contexts with little or no modification required.

  • Robust:

    The code copes with errors and unexpected input during execution instead of failing unpredictably.

Source: xkcd
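
A minimal Python sketch of the three aspects in one place. The function name `safe_mean` and the choice to skip unusable values are illustrative assumptions; other error-handling policies, such as raising an exception, are equally valid.

```python
def safe_mean(values):
    """Return the mean of the numeric items in `values`, or None if there are none."""
    numeric_values = []
    for value in values:
        try:
            numeric_values.append(float(value))
        except (TypeError, ValueError):
            # Robust: skip items that cannot be converted instead of crashing.
            continue
    if not numeric_values:
        return None
    return sum(numeric_values) / len(numeric_values)


# Reusable: works on any iterable of values, not one specific dataset.
print(safe_mean([1, 2, "3", "n/a", None]))
```

The descriptive names and the docstring cover readability, the generic input covers reusability, and the `try`/`except` covers robustness.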

Code readability

Code is read more often than it is written. — Guido van Rossum (creator of Python)

  • At some point someone else, or our future selves, will need to understand the code we write today.

Variable names

  • Use informative names for functions and variables

Meaningless variable names:

import pandas as pd
x = "Alex"
y = 42
z = pd.DataFrame()
my_favourite_number = "ssh, I'm a string"

Informative variable names:

import pandas as pd

# Defining variables
first_name = "Alex"
number_of_attendees = 42
empty_dataframe = pd.DataFrame()


# Using variables
print("Hi " + first_name)
number_of_attendees += 1

Function names

  • Name functions after the task they perform.
  • Use verb-based function names
process_text <- function(data) {
    ...
}

processed_text <- process_text("The following document was handled using...")
  • Name a function as a question if it returns a boolean
are_missing_values_present <- function(data) {
  # `%in%` already returns TRUE or FALSE, so no if/else is needed
  NA %in% data
}
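
The same naming ideas carry over to Python. The sketch below is illustrative only: `remove_punctuation` and `has_missing_values` are hypothetical names chosen for this example, not functions from a library.

```python
import math


def remove_punctuation(text):
    """Verb-based name: says what the function does to its input."""
    return "".join(
        character for character in text
        if character.isalnum() or character.isspace()
    )


def has_missing_values(values):
    """Question-style name: the call reads naturally inside an `if`."""
    return any(
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in values
    )


cleaned_text = remove_punctuation("Hello, world!")
if has_missing_values([1.0, None, 3.0]):
    print("Missing values found")
```

With a question-style name, `if has_missing_values(data):` reads almost like a sentence.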

Data pipeline examples

Which one is more readable to you?

  • Short version
data_path <- "path/to/data"
report_data <- generate_report(model(clean(load(data_path))))
  • More explicit version
data <- load(data_path)
clean_data <- clean(data)
model_results <- model(clean_data)
report_data <- generate_report(model_results)
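
A Python sketch of the explicit version. All four functions are hypothetical placeholders with made-up data, written only so the example runs; a real pipeline would read from `data_path` and fit an actual model.

```python
def load_data(path):
    # Placeholder: a real implementation would read from `path`.
    return [" 3", "1 ", None, "2"]


def clean(records):
    # Drop missing records and convert the rest to integers.
    return [int(record.strip()) for record in records if record is not None]


def fit_model(values):
    # Stand-in for a real model: just compute the mean.
    return {"mean": sum(values) / len(values)}


def generate_report(model_results):
    return f"Mean value: {model_results['mean']:.1f}"


data_path = "path/to/data"
raw_data = load_data(data_path)
clean_data = clean(raw_data)
model_results = fit_model(clean_data)
report = generate_report(model_results)
print(report)
```

Each intermediate result has a name, so it can be printed, tested, or inspected in a debugger when a step goes wrong.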

Style matters (Python)

  • Is the code using everything that language ‘X’ has to offer?
# Example 1 - very unpythonic
i = 0
my_data = []
while i < 100:
    my_data += [i * i / 356]
    i += 1

# Example 2 - more use of Python features, such as `range` and `append`
my_data = []
for i in range(100):
    my_data.append(i**2 / 356)

Python at its best:

# Example 3 - making full use of Pythonic idioms: `range` with a list comprehension
my_data = [i**2 / 356 for i in range(100)]

Style matters (R)

# Example 1 - not idiomatic
i = 0
my_data = c()
while (i < 100) {
  my_data = c(my_data, i * i / 356)
  i = i + 1
}

# Example 2 - more use of R features, e.g. `append` and idiomatic assignment (`<-`)
my_data <- c()
for (i in 0:99) {
  my_data <- append(my_data, i^2 / 356)
}

R at its best:

# Example 3 - making use of R's built-in vectorised arithmetic
my_data <- (0:99)^2 / 356

Consistency and Style Guides

  • Use a consistent style
    • Your language of choice will influence how you separate words, e.g. CamelCase or snake_case
    • Consistency will make your code easier to understand and maintain
    • Consult a style guide for your language

Source: xkcd
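
For Python, the PEP 8 style guide recommends different casing for different kinds of names. A short sketch follows; the class and constant are invented for illustration.

```python
MAX_ATTENDEES = 50  # constant: UPPER_SNAKE_CASE


class WorkshopSession:  # class: CamelCase (CapWords)
    def __init__(self, session_title):
        self.session_title = session_title  # attribute: snake_case
        self.attendee_names = []

    def add_attendee(self, attendee_name):  # method: snake_case
        if len(self.attendee_names) >= MAX_ATTENDEES:
            raise ValueError("Session is full")
        self.attendee_names.append(attendee_name)


session = WorkshopSession("Code Quality")
session.add_attendee("Alex")
print(len(session.attendee_names))
```

Because the casing is consistent, a reader can tell from the name alone whether something is a class, a constant, or a function.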

Linters and Formatters

  • Linters
    • Analyse your code to flag stylistic errors
    • Find small bugs
    • Identify security issues
  • Formatters
    • Detect when you have diverged from a style
    • Automatically correct the formatting of your code

Language   Linters        Formatters
R          lintr          formatR, styler
Python     ruff, pylint   ruff, black

Linters: R

The lintr package in R:

Function: lintr::lint(filename)

Linters: Python

The ruff command-line tool for Python:

Command: ruff check --fix path/to/code/to/check.py

ruff check --fix src-qt5/main.py

src-qt5/main.py:1522:9: E722 Do not use bare `except`
Found 16 errors (15 fixed, 1 remaining).
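
Rule E722 cannot be fixed automatically because the linter does not know which exceptions you expect. The contents of `main.py` are not shown here, so the sketch below uses a hypothetical `parse_port` function to illustrate the usual manual fix.

```python
def parse_port(text):
    # Flagged style (E722): a bare `except:` would also swallow
    # KeyboardInterrupt and SystemExit, hiding real problems.
    #
    #     try:
    #         return int(text)
    #     except:
    #         return None
    #
    # Naming the expected exceptions is the usual fix.
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


print(parse_port("8080"))
print(parse_port("not a port"))
```

Catching only `TypeError` and `ValueError` keeps the fallback behaviour for bad input while letting unexpected errors surface.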

(Auto)formatters

  • R: The styler package

    Function: styler::style_file(filepath)

  • Python: We will use ruff again

    Command: ruff format path/to/code/to/check.py
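
To give a feel for what a formatter changes, here is a hand-written before/after sketch in the Black-compatible style that ruff follows. The two functions have different names only so that both can live in one runnable file; a formatter would not rename anything, and its exact output may differ in detail.

```python
# Before formatting: valid Python, but inconsistent spacing and quotes.
def greet_before( first_name,greeting = 'Hi' ):
    return greeting+" "+first_name


# After formatting: same behaviour, consistent layout.
def greet_after(first_name, greeting="Hi"):
    return greeting + " " + first_name


print(greet_before("Alex"))
print(greet_after("Alex"))
```

Only the layout changes; the behaviour of the code stays the same, which is why formatting can be applied automatically.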

Summary

  • Use descriptive variable names
  • Name functions after the task they perform
  • Readable multiline code is often preferred over condensed one-liners
  • Use language specific conventions (style guides)
  • Use linters and formatters to improve your code

Your turn

  • Run a linter on your code and identify style issues.

  • Edit your code to improve its compliance with the style guide, based on the feedback from your linter.

  • Run an autoformatter on your code to fix formatting issues automatically instead of simply flagging them.

  • If you find code that is hard to read, or variable names that need adjusting, make a note to work on it. Use #TODO or another consistent label so you can extract these notes later.
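
One possible way to extract such notes later is a small script like the sketch below. The pattern and the function name `find_todo_notes` are assumptions made for this example; adapt the label to whatever convention you choose.

```python
import re

# Matches "# TODO", "#TODO" and "# TODO:" followed by the note text.
TODO_PATTERN = re.compile(r"#\s*TODO\b[:\s]*(.*)")


def find_todo_notes(source_text):
    """Return (line_number, note) pairs for each TODO comment in `source_text`."""
    notes = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            notes.append((line_number, match.group(1).strip()))
    return notes


example_source = """\
x = 42  # TODO: rename x to number_of_attendees
print(x)
#TODO tidy up the output
"""

for line_number, note in find_todo_notes(example_source):
    print(f"line {line_number}: {note}")
```

For a real project, the same function can be applied to the text of each source file in turn.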