Overview
| Questions | Objectives | Key Concepts / Tools |
|---|---|---|
| What is clean code? | Write code that is easy to read and maintain. | Readability, clean code |
| How do good names improve code readability? | Use clear, descriptive names for variables and functions. | Naming conventions |
| How do I write idiomatic code? | Apply language-specific best practices. | Pythonic style, idioms |
| How do I keep code style consistent? | Automate style checks and formatting. | Linters, formatters |
| Why avoid repeating code? | Reduce redundancy and improve maintainability. | DRY principle, refactoring |
Readability
When we write code, it’s important to remember that someone else — or even our future selves — will need to understand it later.
Code that is easy to read and understand is often called clean code. Clean code applies principles that keep a program organized, intuitive, and easy to work with over time, so that readers can quickly see what it is doing. Publishing clean code with your manuscript helps to inspire trust, and it will also save you a lot of time when you revisit the code, for example to run additional analyses or make changes.
Think about your own experience reusing code:
- What do you expect from well-written code?
- What helps you understand it quickly?
- What makes it frustrating or confusing?
In this section, we’ll introduce key principles and tools that will help you write code that is both clean and maintainable.
The main aspects of clean code
Readable
The code clearly communicates its purpose by using meaningful and descriptive variable and function names; calculateTotal() is better than doIt(). You should be able to understand what the code does without relying heavily on comments.
Simple and focused
Each function does one thing and does it well.
Avoid unnecessary complexity or “clever” shortcuts: clarity always wins over cleverness.
DRY (Don’t Repeat Yourself)
Avoid code duplication.
Big functions or modules are hard to reuse. Do you have scripts, modules, or functions that do (partly) similar things? Try to create reusable functions for the shared tasks!
Well-structured
Maintain consistent formatting and indentation.
Organize files, classes, and functions logically.
Use clear control flow and avoid “spaghetti code.”
Tested
Use automated tests (unit, integration, etc.) to verify correctness.
Testing makes it easier to refactor or extend code safely.
Self-documenting
Structure and naming should make your code understandable without excessive comments.
When comments are used, they should explain why, not what.
In this tutorial, we’ll explore several of these aspects in detail.
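To make a few of these aspects concrete, here is a minimal Python sketch. The function names (`parse_temperatures`, `mean`) and the sample readings are hypothetical and chosen only for illustration. Each function has one task, the comment explains why a choice was made rather than what the line does, and a small check verifies the result.

```python
def parse_temperatures(raw_values):
    """Convert raw text readings to floats, skipping blank entries."""
    # Blank entries stand for sensor dropouts in this made-up data set;
    # we skip them because counting them as 0.0 would bias the mean.
    return [float(value) for value in raw_values if value.strip()]


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


readings = parse_temperatures(["20.5", "", "21.5", " "])
average_temperature = mean(readings)

# A tiny automated check; a real project would typically use a test framework.
assert readings == [20.5, 21.5]
assert average_temperature == 21.0
```

Because each function is small and focused, it can be tested and reused on its own.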
Meaningful names
A key part of writing readable code is using meaningful names for variables, constants, functions, classes, and other objects.
Names should be informative yet concise.
You may have encountered code where variable names are meaningless or misleading:
import pandas as pd
x = "Alex"
y = 42
z = pd.DataFrame()
my_favourite_number = "ssh, I'm a string"
It’s nearly impossible to infer what these variables represent if you use them later on. To make code easier to understand, avoid cryptic or single-letter identifiers.
Single-letter names can be acceptable in specific contexts — for example, when implementing mathematical formulas where x, y, or n have clear, conventional meanings. Even then, ensure the mathematical reference is clearly explained or cited, and maintain consistency throughout your code.
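As an illustration of such an exception, the sketch below solves a quadratic equation. The function name `solve_quadratic` is hypothetical; the point is that `a`, `b`, and `c` mirror the standard form of the equation, and the docstring states which formula they refer to.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0.

    The names a, b and c follow the standard form of the quadratic
    equation, so single letters are the clearest choice here.
    Assumes a != 0 and a non-negative discriminant.
    """
    discriminant = b**2 - 4 * a * c
    sqrt_discriminant = math.sqrt(discriminant)
    return (
        (-b - sqrt_discriminant) / (2 * a),
        (-b + sqrt_discriminant) / (2 * a),
    )


# x**2 - 3x + 2 factors as (x - 1)(x - 2)
roots = solve_quadratic(1, -3, 2)
```

Note that the intermediate value still gets a descriptive name (`discriminant`), because it has no single-letter convention that every reader would recognize.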
In most other cases, using descriptive names composed of a few informative words greatly improves readability. Your programming language’s conventions (e.g. snake_case in Python or camelCase in Java) will guide how you format them.
import pandas as pd
# Defining variables
first_name = "Alex"
number_of_attendees = 42
empty_dataframe = pd.DataFrame()
# Using variables
print(f"Hi {first_name}")
number_of_attendees += 1
Ideally, variable names should make their purpose clear at a glance and make logical sense in the context of their use. This approach leads to self-documenting code — code that communicates intent without requiring additional comments.
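The same idea applies to constants: a bare number in the middle of a calculation says little, whereas a named constant states what it stands for. The names below are made up for illustration; writing constants in upper case with underscores follows the Python convention described in PEP 8.

```python
# Named constants document the meaning of otherwise anonymous numbers.
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60


def hours_to_seconds(hours):
    """Convert a duration in hours to seconds."""
    return hours * MINUTES_PER_HOUR * SECONDS_PER_MINUTE


recording_length_hours = 2
recording_length_seconds = hours_to_seconds(recording_length_hours)
```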
Function names
Functions should be named after the task they perform. The reader should be able to infer what a function does simply by reading its name.
Good examples:
process_text <- function(data) {
  ...
}
processed_text <- process_text("The following document was handled using...")
If a function returns a Boolean value, phrasing its name as a question can make its purpose even clearer:
are_missing_values_present <- function(data) {
  # %in% already returns TRUE or FALSE, so no if/else is needed
  NA %in% data
}
Which version is more readable?
Short version:
data_path <- "path/to/data"
report_data <- generate_report(model(clean(load(data_path))))
More explicit version:
data <- load(data_path)
clean_data <- clean(data)
model_results <- model(clean_data)
report_data <- generate_report(model_results)
The explicit version is longer, but it’s much easier to follow and debug.
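The examples in this section are written in R, but the naming ideas do not depend on the language. As a rough Python sketch (the function names and sample data are hypothetical), a Boolean function can be phrased as a question, and named intermediate results keep each step easy to inspect:

```python
import math


def has_missing_values(values):
    """Return True if any entry is None or NaN."""
    return any(
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in values
    )


def remove_missing_values(values):
    """Return a new list without None or NaN entries."""
    return [value for value in values if not has_missing_values([value])]


raw_measurements = [1.5, None, 2.5, float("nan")]

# Each intermediate result has a name, so it can be checked on its own.
data_is_incomplete = has_missing_values(raw_measurements)
clean_measurements = remove_missing_values(raw_measurements)
```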
Conventions
Idiomatic code follows language-specific best practices and conventions. By using conventions, you make it easier for others to navigate your code. Conventions are often described in a language-specific style guide (e.g. PEP 8 for Python, the Tidyverse style guide for R). It is good to be aware of these style guides; the next section describes a practical way to adhere to them.
Idiomatic code also means making good use of the strengths of a programming language. Consider this example:
# Example 1 – very unpythonic
i = 0
my_data = []
while i < 100:
    my_data += [i * i / 356]
    i += 1
# Example 2 – better use of Python features
my_data = []
for i in range(100):
    my_data.append(i**2 / 356)
And the most Pythonic version:
# Example 3 – idiomatic Python using list comprehension
my_data = [i**2 / 356 for i in range(100)]
The final version is concise, expressive, and easy to read, at least for Python programmers who have experience with list comprehensions. In the end, the readability of code depends on who is reading it, so which conventions work best is up to you.
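Another common case where Python's built-in features replace manual bookkeeping is looping over several lists at once. The sketch below uses made-up sample data; `zip` and `enumerate` are standard built-ins.

```python
sample_names = ["control", "treated", "recovered"]
sample_weights = [10.2, 11.8, 10.9]

# Index-based loop: works, but the index bookkeeping hides the intent.
labels_indexed = []
for i in range(len(sample_names)):
    labels_indexed.append(f"{i + 1}. {sample_names[i]}: {sample_weights[i]} g")

# Idiomatic: zip pairs the two lists, enumerate supplies the counter.
labels_idiomatic = []
for position, (name, weight) in enumerate(zip(sample_names, sample_weights), start=1):
    labels_idiomatic.append(f"{position}. {name}: {weight} g")

# Both approaches produce the same labels.
assert labels_indexed == labels_idiomatic
```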
Automate style checks
Following a style guide from the start of a project is good practice. However, ensuring ongoing compliance manually can be tedious. Automated tools can help by detecting style issues as you code or by reformatting your code automatically.
See Linters and formatters for details on automating these checks.
| Language | Linters | Formatters |
|---|---|---|
| R | lintr | formatR, styler |
| Python | ruff, pylint | ruff, black |
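To give a feel for what these tools report, the sketch below shows a hypothetical function before and after cleanup. The "before" version is kept in comments so the block still runs. The issues shown (an unused import, an unused variable, a comparison to None with `==`) are typical linter findings; which checks actually run, and how they are named, depends on the tool and its configuration.

```python
# Before (kept as comments so the block still runs):
#
#   import os                      # imported but never used
#   def describe(value):
#       unused = 42                # assigned but never used
#       if value == None:          # comparison to None should use "is"
#           return "missing"
#       return str(value)


# After addressing the linter's feedback:
def describe(value):
    """Return a text label for a value, or "missing" for None."""
    if value is None:
        return "missing"
    return str(value)


labels = [describe(3.5), describe(None)]
```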
Exercise
Linters
Run a linter on your code to identify style issues:
R: lintr
Python: ruff check
Edit your code to improve its style based on the linter’s feedback.
Formatters
Run an autoformatter to automatically fix formatting issues:
R: styler
Python: ruff format
If you encounter code that’s hard to read or variables that need better names, mark them for later improvement using # TODO or another consistent label.
Don’t repeat yourself
During early development, it’s common to create “quick and dirty” solutions with repeated code. Over time, duplication wastes effort and makes maintenance harder.
For example, imagine a script that repeats nearly identical logic three times. If you need to fix or modify that logic, you must update each copy — and missing one could introduce a bug.
Repeated code:
first_ten_numbers = list(range(1, 11))
second_ten_numbers = list(range(11, 21))
third_ten_numbers = list(range(21, 31))
odd_first = []
for number in first_ten_numbers:
    if number % 2 == 1:
        odd_first.append(number)
odd_second = []
for number in second_ten_numbers:
    if number % 2 == 1:
        odd_second.append(number)
odd_third = []
for number in third_ten_numbers:
    if number % 2 == 1:
        odd_third.append(number)
Refactored version:
def get_odd(numbers):
    """Return only the odd numbers."""
    odd_numbers = []
    for number in numbers:
        if number % 2 == 1:
            odd_numbers.append(number)
    return odd_numbers
first_ten_numbers = list(range(1, 11))
second_ten_numbers = list(range(11, 21))
third_ten_numbers = list(range(21, 31))
odd_first = get_odd(first_ten_numbers)
odd_second = get_odd(second_ten_numbers)
odd_third = get_odd(third_ten_numbers)
This refactored code is cleaner, shorter, and easier to maintain: the odd-number check now lives in one place, so a fix or change only has to be made once.
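The helper could be shortened further with a list comprehension, and the repeated calls can be driven from a single list of inputs. This is one possible variation rather than the only reasonable one; `number_blocks` and `odd_numbers_per_block` are hypothetical names.

```python
def get_odd(numbers):
    """Return only the odd numbers."""
    return [number for number in numbers if number % 2 == 1]


# Three blocks of ten consecutive numbers, starting at 1, 11 and 21.
number_blocks = [range(start, start + 10) for start in (1, 11, 21)]
odd_numbers_per_block = [get_odd(block) for block in number_blocks]
```

Adding a fourth block now only means adding one more starting value.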
You can learn more about self-contained components of code in the tutorial on modular code.