Dependency management

Congratulations!

You now have a project!

  • Your project structure is accessible
  • Your code is readable and invites re-use
  • The project is under version control
  • It has a landing page on GitHub, with information for users

Are you done? It depends…

Dependencies

Mismatched dependencies and versions can stop your users or readers from running your code. For example, this code, written for Python 2.7:

print "Hello world!"

No longer works in Python 3!

print "Hello world!"
  File "/var/folders/96/r1yycxlj28958p1cdynhbyzw0000gn/T/Rtmpa0OGSM/chunk-code-b08d2b78904b.txt", line 1
    print "Hello world!"
                       ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello world!")?

Instead, we write:

print("Hello world!")
Hello world!
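
One way to make a version requirement like this explicit is to check the interpreter version when a script starts. The sketch below is only an illustration: the minimum version (3, 8) and the function name check_python are assumptions, not part of the example above.

```python
import sys

# Hypothetical minimum version; pick the one your project actually needs.
MINIMUM_PYTHON = (3, 8)


def check_python(minimum=MINIMUM_PYTHON, current=None):
    """Raise a readable error if the interpreter is older than `minimum`."""
    if current is None:
        # (major, minor) of the interpreter running this script
        current = tuple(sys.version_info[:2])
    if current < minimum:
        raise RuntimeError(
            "This project needs Python %d.%d or newer, but found %d.%d."
            % (minimum + current)
        )
    return current
```

Calling check_python() at the top of a main script turns a confusing SyntaxError further down into a message that tells the user what to install. Note that the check only helps if the file itself still parses under the older interpreter.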

The reproducibility trade-off

How far do you go towards reproducibility?


[Figure: the reproducibility trade-off. Labels: No information, Integrated set-up, Difficult, Easy, Reproducibility, Environment. Stages shown: ad hoc exploration, declaring dependencies, packaging dependencies, container.]

  • Due diligence starts with declaring dependencies.

  • A package/environment manager such as conda can install your declared dependencies for you.

  • Packaging dependencies relies on tools like renv (for R) or pipenv (for Python).

  • Containers are awesome, and container tools like Docker and Singularity probably sound more daunting than they actually are.
  • Online environments can be created for your work in a relatively user-friendly way.

Our advice

Are you working with Python?

  • Use conda and declare your dependencies in an environment.yml file.

Are you working with R?

  • Use renv and package your dependencies in a lockfile.

In the next slides we will elaborate on some other options as well, but honestly, just do this.

Declaring dependencies

OK: describe (in your README) the set-up in which your project works for you.

  • What language, what version?
  • What packages/libraries do you load?
  • What OS do you use? (Does it work on your collaborator’s system?)
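
The answers to these questions can be collected with a few lines of Python rather than looked up by hand. This is a minimal sketch using only the standard library (importlib.metadata needs Python 3.8 or newer); the function name describe_environment and the package names in the usage example are placeholders, not a recommendation.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return README-ready lines: Python version, OS, and package versions."""
    lines = [
        "Python: " + platform.python_version(),
        "OS: " + platform.system() + " " + platform.release(),
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(name + ": " + version)
    return lines


if __name__ == "__main__":
    # Placeholder package names; list the ones your project imports.
    for line in describe_environment(["numpy", "pandas"]):
        print(line)
```

The printed lines can be pasted into the README as a record of the set-up in which the project is known to work.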

Better: prepare a file for a package manager:

  • what? A single file describing the necessary dependencies, which can be used to install all dependencies in one step
  • where? Store the file in the repository root (main folder)
  • which? This depends on the environment/package manager you want to use:
    • For conda (Python and R): generate an environment.yml file
    • For pip (Python only): generate a requirements.txt file

Declaring dependencies

environment.yml (for conda)
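
With conda, an environment.yml file can be exported from the active environment with conda env export > environment.yml and recreated elsewhere with conda env create -f environment.yml. To show roughly what such a file contains, the sketch below builds a minimal one as plain text. The project name, channel, and pinned versions are made-up examples, and render_environment_yml is a hypothetical helper, not part of conda.

```python
def render_environment_yml(name, channels, dependencies):
    """Build the text of a minimal conda environment.yml file."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dependency for dependency in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; replace them with your project's real needs.
    text = render_environment_yml(
        name="my-project",
        channels=["conda-forge"],
        dependencies=["python=3.11", "numpy"],
    )
    print(text)
```

A file exported by conda is usually much longer, because it also lists every indirect dependency with its exact version.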

requirements.txt (for pip)

  • Generate it with pip freeze > requirements.txt
  • Here is an example. Search for more on GitHub.
  • Install dependencies declared with pip install -r requirements.txt
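
A requirements.txt file produced by pip freeze consists mostly of lines of the form name==version. As a rough illustration of how those pins relate to what is installed, the sketch below reads such lines and reports packages whose installed version differs. It is an assumption-laden simplification: it handles only plain name==version lines, and the helper names parse_pins and find_mismatches are invented for this example.

```python
from importlib import metadata


def parse_pins(text):
    """Map package name -> pinned version for plain 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} where the two versions differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, find_mismatches(parse_pins(open("requirements.txt").read())) returns an empty dictionary when every pinned package is installed at the pinned version.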

Packaging dependencies

In R: renv

Install with:

install.packages("renv")

Activate with:

renv::init()

Update with:

renv::snapshot()

Load the contents of a lockfile with:

renv::restore()

Read more here.


In Python: pipenv

Follow this brief tutorial to set up pipenv for your project.

Your turn!

  • Add basic dependency information to your README file:

    • What version of your language is required?
    • Which packages does a user need to load before running your project?
    • Can you provide installation instructions?
  • Are you working with Python?

    • Generate an environment for your project, and store the environment.yml file in the repository root.
  • Are you working with R?

    • Install renv, and initialize it so that the lockfile is stored in the repository root.
  • Again, be sure to update your Git repository:

    git add [your changed files]
    git commit -m "the change you made"
    git push