Overview

Questions | Objectives | Key Concepts / Tools
What is clean code? | Write code that is easy to read and maintain. | Readability, clean code
How do good names improve code readability? | Use clear, descriptive names for variables and functions. | Naming conventions
How do I write idiomatic code? | Apply language-specific best practices. | Pythonic style, idioms
How do I keep code style consistent? | Automate style checks and formatting. | Linters, formatters
Why avoid repeating code? | Reduce redundancy and improve maintainability. | DRY principle, refactoring

Readability

When we write code, it’s important to remember that someone else — or even our future selves — will need to understand it later.
Code that is easy to read and understand is often called clean code. Clean code applies principles that keep a program organized, intuitive, and easy to work with over time, so that readers can quickly see what it is doing. Publishing clean code with your manuscript helps to inspire trust, and it will also save you (a lot of) time when you revisit the code, for example to run additional analyses or make changes.

Think about your own experience reusing code:

  • What do you expect from well-written code?
  • What helps you understand it quickly?
  • What makes it frustrating or confusing?

In this section, we’ll introduce key principles and tools that will help you write code that is both clean and maintainable.


The main aspects of clean code

  1. Readable
    The code clearly communicates its purpose by using meaningful and descriptive variable and function names; calculateTotal() is better than doIt(). You should be able to understand what the code does without relying heavily on comments.

  2. Simple and focused
    Each function does one thing and does it well.
    Avoid unnecessary complexity or “clever” shortcuts — clarity always wins over cleverness.

  3. DRY (Don’t Repeat Yourself)
    Avoid code duplication.
    Big functions or modules are hard to reuse. Do you have scripts/modules or functions that are doing (partly) similar things? Try to create reusable functions for the shared tasks!

  4. Well-structured
    Maintain consistent formatting and indentation.
    Organize files, classes, and functions logically.
    Use clear control flow — avoid “spaghetti code.”

  5. Tested
    Use automated tests (unit, integration, etc.) to verify correctness.
    Testing makes it easier to refactor or extend code safely.

  6. Self-documenting
    Structure and naming should make your code understandable without excessive comments.
    When comments are used, they should explain why, not what.

In this tutorial, we’ll explore several of these aspects in detail.
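As a small illustration of several of these aspects at once, the sketch below shows a function that does one thing, a comment that explains why rather than what, and a minimal test. The names `celsius_to_kelvin` and `test_celsius_to_kelvin` are hypothetical and chosen only for this example.

```python
ABSOLUTE_ZERO_CELSIUS = -273.15


def celsius_to_kelvin(temperature_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    # Why: temperatures below absolute zero are physically impossible,
    # so such a value most likely points to a data-entry error upstream.
    if temperature_celsius < ABSOLUTE_ZERO_CELSIUS:
        raise ValueError("temperature is below absolute zero")
    return temperature_celsius - ABSOLUTE_ZERO_CELSIUS


def test_celsius_to_kelvin():
    """Minimal unit test; here it is simply called directly below."""
    assert celsius_to_kelvin(0) == 273.15
    assert celsius_to_kelvin(-273.15) == 0


test_celsius_to_kelvin()
```

Because the function has a single, clearly named job, the test stays short, and the only comment needed is the one that records the reasoning behind the check.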


Meaningful names

A key part of writing readable code is using meaningful names for variables, constants, functions, classes, and other objects.
Names should be informative yet concise.

You may have encountered code where variable names are meaningless or misleading:

import pandas as pd
x = "Alex"
y = 42
z = pd.DataFrame()
my_favourite_number = "ssh, I'm a string"

It’s nearly impossible to infer what these variables represent if you use them later on. To make code easier to understand, avoid cryptic or single-letter identifiers.

Single-letter names can be acceptable in specific contexts — for example, when implementing mathematical formulas where x, y, or n have clear, conventional meanings. Even then, ensure the mathematical reference is clearly explained or cited, and maintain consistency throughout your code.
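As a sketch of that exception, the hypothetical function below implements the quadratic formula. The single-letter names `a`, `b`, and `c` mirror the textbook notation, the docstring states the formula they refer to, and everything else gets a descriptive name. It assumes `a` is not zero.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, smallest first.

    Uses the quadratic formula x = (-b ± sqrt(b**2 - 4*a*c)) / (2*a).
    The names a, b and c follow the usual textbook notation.
    Assumes a != 0.
    """
    discriminant = b**2 - 4 * a * c
    if discriminant < 0:
        return []  # no real roots
    sqrt_discriminant = math.sqrt(discriminant)
    smaller_root = (-b - sqrt_discriminant) / (2 * a)
    larger_root = (-b + sqrt_discriminant) / (2 * a)
    # A set removes the duplicate when both roots coincide
    return sorted({smaller_root, larger_root})


print(solve_quadratic(1, -3, 2))  # [1.0, 2.0]
```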

In most other cases, using descriptive names composed of a few informative words greatly improves readability. Your programming language’s conventions (e.g. snake_case in Python or camelCase in Java) will guide how you format them.

import pandas as pd

# Defining variables
first_name = "Alex"
number_of_attendees = 42
empty_dataframe = pd.DataFrame()
# Using variables
print(f"Hi {first_name}")
number_of_attendees += 1

Ideally, variable names should make their purpose clear at a glance and make logical sense in the context of their use. This approach leads to self-documenting code — code that communicates intent without requiring additional comments.

Function names

Functions should be named after the task they perform. The reader should be able to infer what a function does simply by reading its name.

Good examples:

process_text <- function(data) {
    ...
}

processed_text <- process_text("The following document was handled using...")

If a function returns a Boolean value, phrasing it as a question can make its purpose even clearer:

are_missing_values_present <- function(data) {
  # %in% already returns TRUE or FALSE, so no if/else is needed
  NA %in% data
}
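The same naming idea carries over to Python. The sketch below uses a hypothetical function, `has_missing_values`, and assumes that "missing" means either `None` or a floating-point NaN; your own data may need a different definition.

```python
import math


def has_missing_values(values):
    """Return True if any value is None or NaN."""
    return any(
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in values
    )


measurements = [1.2, None, 3.4]
if has_missing_values(measurements):
    print("Some measurements are missing.")
```

Phrased this way, the `if` statement reads almost like a sentence.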

Which version is more readable?

Short version:

data_path <- "path/to/data"
report_data <- generate_report(model(clean(load(data_path))))

More explicit version:

data <- load(data_path)
clean_data <- clean(data)
model_results <- model(clean_data)
report_data <- generate_report(model_results)

The explicit version is longer, but it’s much easier to follow and debug.
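To make the debugging benefit concrete, here is a runnable Python sketch of the explicit style. The three step functions are hypothetical stand-ins for loading, cleaning, and modelling, and the input is a made-up string rather than a real data file.

```python
def load_measurements(raw_text):
    """Hypothetical loading step: split comma-separated text into fields."""
    return raw_text.split(",")


def clean_measurements(fields):
    """Hypothetical cleaning step: drop empty fields and convert to float."""
    return [float(field) for field in fields if field.strip()]


def summarise_measurements(values):
    """Hypothetical modelling step: compute the mean."""
    return sum(values) / len(values)


raw_text = "1.5, 2.5, , 5.0"
fields = load_measurements(raw_text)
clean_values = clean_measurements(fields)
mean_value = summarise_measurements(clean_values)
print(f"Mean of {len(clean_values)} measurements: {mean_value}")
```

If the result looks wrong, each intermediate variable (`fields`, `clean_values`) can be printed or inspected in a debugger, which is not possible when all the calls are nested in one line.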

Conventions

Idiomatic code follows language-specific best practices and conventions. Following conventions makes it easier for others to navigate your code. They are usually described in a language-specific style guide (e.g. PEP 8 for Python, the Tidyverse style guide for R). It is good to be aware of these style guides; the next section describes practical ways to adhere to them.

Idiomatic code also means making good use of the strengths of a programming language. Consider this example:

# Example 1 – very unpythonic
i = 0
my_data = []
while i < 100:
    my_data += [i * i / 356]
    i += 1

# Example 2 – better use of Python features
my_data = []
for i in range(100):
    my_data.append(i**2 / 356)

And the most Pythonic version:

# Example 3 – idiomatic Python using list comprehension
my_data = [i**2 / 356 for i in range(100)]

The final version is concise, expressive, and easy to read, at least for Python programmers who have experience with list comprehensions. In the end, readability depends on who is reading the code, so it is up to you which conventions work best.
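Two further idioms that are often described as Pythonic are `enumerate` and `zip`, which replace manual index bookkeeping. The sketch below uses made-up sample names and scores purely for illustration.

```python
names = ["control", "treated", "washout"]
scores = [4.2, 6.8, 5.1]

# Index-based loop: works, but the indexing adds noise
labels_by_index = []
for i in range(len(names)):
    labels_by_index.append(f"{i + 1}. {names[i]}: {scores[i]}")

# Idiomatic: enumerate supplies the counter, zip pairs the two lists
labels = []
for position, (name, score) in enumerate(zip(names, scores), start=1):
    labels.append(f"{position}. {name}: {score}")

print(labels[0])  # 1. control: 4.2
```

Both loops build the same list; the second one states directly which values it works with.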

Automate style checks

Following a style guide from the start of a project is good practice. However, ensuring ongoing compliance manually can be tedious. Automated tools can help by detecting style issues as you code or by reformatting your code automatically.

See Linters and formatters for details on automating these checks.

Language | Linters | Formatters
R | lintr | formatR, styler
Python | ruff, pylint | ruff, black
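To give a feel for what these tools act on, here is a hypothetical before-and-after sketch. The untidy version is kept in a string so that the example stays runnable. The exact messages depend on the tool and its configuration, so the comments describe typical findings rather than real tool output.

```python
# Before: valid Python, but a linter would typically complain about it.
UNTIDY_SOURCE = '''
import os
def mean( values ):
    if values == None: return None
    return sum(values)/len(values)
'''

# After: same behaviour, tidied up.
# - the unused "import os" is gone
# - "== None" became "is None"
# - spacing and line breaks follow PEP 8


def mean(values):
    """Return the arithmetic mean, or None when no values are given."""
    if values is None:
        return None
    return sum(values) / len(values)
```

A formatter usually takes care of the whitespace and line breaks, while issues such as the unused import or the comparison with `None` are the kind of thing a linter reports.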

Exercise

Linters

Run a linter on your code to identify style issues:

R: lintr
Python: ruff check

Edit your code to improve its style based on the linter’s feedback.

Formatters

Run an autoformatter to automatically fix formatting issues:

R: styler

Python: ruff format

If you encounter code that’s hard to read or variables that need better names, mark them for later improvement using # TODO or another consistent label.

Don’t repeat yourself

During early development, it’s common to create “quick and dirty” solutions with repeated code. Over time, duplication wastes effort and makes maintenance harder.

For example, imagine a script that repeats nearly identical logic three times. If you need to fix or modify that logic, you must update each copy — and missing one could introduce a bug.

Repeated code:

first_ten_numbers = list(range(1, 11))
second_ten_numbers = list(range(11, 21))
third_ten_numbers = list(range(21, 31))

odd_first = []
for number in first_ten_numbers:
    if number % 2 == 1:
        odd_first.append(number)

odd_second = []
for number in second_ten_numbers:
    if number % 2 == 1:
        odd_second.append(number)

odd_third = []
for number in third_ten_numbers:
    if number % 2 == 1:
        odd_third.append(number)

Refactored version:

def get_odd(numbers):
    """Return only the odd numbers."""
    odd_numbers = []
    for number in numbers:
        if number % 2 == 1:
            odd_numbers.append(number)
    return odd_numbers

first_ten_numbers = list(range(1, 11))
second_ten_numbers = list(range(11, 21))
third_ten_numbers = list(range(21, 31))

odd_first = get_odd(first_ten_numbers)
odd_second = get_odd(second_ten_numbers)
odd_third = get_odd(third_ten_numbers)

This refactored code is cleaner, shorter, and easier to maintain.
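Once the logic lives in a single function, it can also be tightened in one place. As one possible variation (not the only clean way to write it), the sketch below expresses the same `get_odd` function with a list comprehension.

```python
def get_odd(numbers):
    """Return only the odd numbers."""
    return [number for number in numbers if number % 2 == 1]


odd_first = get_odd(range(1, 11))
print(odd_first)  # [1, 3, 5, 7, 9]
```

Every caller benefits from the change without being edited, which is the practical payoff of not repeating yourself.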

You can learn more about self-contained components of code in the tutorial on modular code.
