Dependency management

Congratulations!

You now have a project!

  • Your project structure is accessible
  • Your code is readable and invites re-use
  • The project is under version control
  • It has a landing page on GitHub, with information for users

Are you done? It depends…

Dependencies

Missing dependencies and mismatched versions can stop your users and readers from running your code. For example, this code written for Python 2.7:

print "Hello world!"

No longer works in Python 3!

print "Hello world!"
    print "Hello world!"
                       ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello world!")?

Instead, we write:

print("Hello world!")
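A script can also stop early with a clear message when it is run under an interpreter that is too old, instead of failing later on a syntax or API difference. A minimal sketch, assuming your project needs Python 3.8 or newer; the `MINIMUM_PYTHON` value and the `check_python_version` helper are hypothetical names to adapt. The check only helps if the file itself still parses under the old interpreter, so it avoids f-strings and type hints:

```python
import sys

# Hypothetical minimum version for this project; adjust to your own needs.
MINIMUM_PYTHON = (3, 8)


def check_python_version(minimum=MINIMUM_PYTHON, current=None):
    """Raise RuntimeError if the running (or given) Python version is too old.

    Written without f-strings or type hints so that an older interpreter
    can still parse this file and show the friendly message.
    """
    if current is None:
        current = sys.version_info[:2]  # e.g. (3, 11)
    current = tuple(current)
    if current < tuple(minimum):
        raise RuntimeError(
            "This project needs Python %d.%d or newer, but you are running %d.%d."
            % (minimum[0], minimum[1], current[0], current[1])
        )
    return current


if __name__ == "__main__":
    version = check_python_version()
    print("Python %d.%d is supported." % version)
```

Calling `check_python_version()` at the top of your main script turns a confusing crash into a message that tells the user exactly which version to install.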

The reproducibility trade-off

How far do you go towards reproducibility?


  • Due diligence starts with declaring your dependencies.

  • You can strengthen your declared dependencies with a package/environment manager such as uv (or conda).

  • You can package dependencies, pinning exact versions, with tools like renv (for R) or uv (for Python).

  • Containers are awesome, and container tools like Docker and Singularity probably sound more daunting than they actually are.

  • Online environments can be created for your work in a relatively user-friendly way.

Declaring dependencies

OK: describe in your README the setup in which your project works for you:

  • What language, what version?
  • What packages/libraries do you load?
  • What OS do you use? (Does it work on your collaborator’s system?)
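The answers to these questions can be collected automatically and pasted into your README. A minimal sketch for Python, assuming Python 3.8+ (for `importlib.metadata`); the `PACKAGES` list and the `dependency_report` helper are hypothetical placeholders for your own project:

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: replace with the packages your project actually imports.
PACKAGES = ["pandas", "numpy"]


def dependency_report(packages):
    """Return README-ready lines with the Python version, OS and package versions."""
    lines = [
        f"Python: {platform.python_version()}",
        f"OS: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {version(name)}")
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            lines.append(f"{name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(dependency_report(PACKAGES)))
```

Running it inside your working environment gives you the exact versions to copy into the README, rather than the versions you remember installing.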

Better: prepare a file for an environment manager:

  • what? A single file describing the necessary dependencies, which can be used to install all dependencies in one step
  • where? Store the file in the repository root (main folder)
  • which? This depends on the environment/package manager you want to use:
    • requirements.txt
    • environment.yml
    • pyproject.toml
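With a requirements.txt of exact pins (for example `pandas==1.5.3`), `pip install -r requirements.txt` installs everything in one step. A file like this can also be used to check whether the current environment still matches what you declared. A minimal sketch, assuming only simple `name==version` lines (no extras, markers, or `-r`/`-e` options); `parse_pins` and `check_pins` are hypothetical helpers, and versions are compared as plain strings:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse simple 'name==version' lines; comments and blank lines are skipped."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"only exact pins are handled in this sketch: {line!r}")
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return (name, pinned, installed) for every package that does not match."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # declared but not installed
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

For example, `check_pins(parse_pins(open("requirements.txt").read()))` returns an empty list when every pinned package is installed at the declared version.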

Dependency managers


uv (for Python)

  • An extremely fast Python package and project manager, written in Rust.
  • Create separate environments for your projects
  • Records dependencies in pyproject.toml and pins exact versions in a lock file (uv.lock)
  • Relatively new, increasingly popular, and our recommendation
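# install uv (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh
# create a virtual environment (.venv) with Python 3.11
uv venv --python 3.11
# create a pyproject.toml for the project in the current folder
uv init
# add a pinned dependency to pyproject.toml and uv.lock
uv add 'pandas==1.5.3'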

conda (for Python and R)

  • Create separate environments for your projects
  • Switch between environments
  • Store the environment description in environment.yml
  • Consider going through this quick intro to conda environments.
  • Or get the full story in the conda documentation.

Installation

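conda create -n myenv python=3.11
conda activate myenv
conda install pandas=1.5.3
# record the environment description in environment.yml
conda env export --from-history > environment.yml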
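An environment.yml is a short YAML file with a name, a list of channels, and a list of dependencies; `conda env create -f environment.yml` rebuilds the environment from it. To show that structure, here is a minimal sketch that builds such a file from Python values. The `make_environment_yml` helper is hypothetical, and the default `conda-forge` channel is an assumption; in practice you would usually write the file by hand or export it with conda:

```python
def make_environment_yml(name, python_version, packages, channels=("conda-forge",)):
    """Build the text of a minimal conda environment.yml file.

    packages maps a package name to a pinned version string.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines.append(f"  - python={python_version}")
    # Sort so the file is stable between runs (nicer diffs under git).
    lines += [f"  - {pkg}={ver}" for pkg, ver in sorted(packages.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_environment_yml("myenv", "3.11", {"pandas": "1.5.3"}), end="")
```

For the environment created above, this prints the name `myenv`, the `conda-forge` channel, and the pinned `python=3.11` and `pandas=1.5.3` dependencies.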

renv (for R)

Install with:

install.packages("renv")

Initialize it for your project (this also creates the lockfile renv.lock) with:

renv::init()

Record the current package versions in renv.lock with:

renv::snapshot()

Load the contents of a lockfile with:

renv::restore()

Read more here.
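The renv.lock file is plain JSON: an `R` entry with the R version and a `Packages` entry with one record per package. That makes it easy to summarize, for instance for your README. A minimal sketch in Python; the `summarize_renv_lock` helper is hypothetical, and the example lock content is trimmed and made up (real files contain more fields, such as repositories and hashes):

```python
import json


def summarize_renv_lock(lock_text):
    """Return the R version and a {package: version} dict from renv.lock JSON text."""
    lock = json.loads(lock_text)
    r_version = lock.get("R", {}).get("Version")
    packages = {
        name: record.get("Version")
        for name, record in lock.get("Packages", {}).items()
    }
    return r_version, packages


# Trimmed, illustrative renv.lock content; the version numbers are made up.
EXAMPLE_LOCK = """
{
  "R": {"Version": "4.3.1"},
  "Packages": {
    "renv": {"Package": "renv", "Version": "1.0.3", "Source": "Repository"}
  }
}
"""

if __name__ == "__main__":
    r_version, packages = summarize_renv_lock(EXAMPLE_LOCK)
    print(f"R {r_version}")
    for name, ver in sorted(packages.items()):
        print(f"{name} {ver}")
```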

Summary

  • Good: Declaring dependencies in a README
  • Better: Declaring dependencies in an environment file
  • Close to fully reproducible: lock files

Your turn!

  • Add basic dependency information to your README file:

    • What version of your language is required?
    • Which packages does a user need to load before running your project?
    • Can you provide installation instructions?
  • Are you working with Python?

    • Generate an environment for your project (for example with uv or conda), and store its description (pyproject.toml and uv.lock, or environment.yml) in the repository root.
  • Are you working with R?

    • Install renv and run renv::init(), which stores the lock file (renv.lock) in the project root.
  • Again, be sure to update your git repository:

    git add [your changed files]
    git commit -m "the change you made"
    git push