Using Git for Data Transfer on SURF Research Cloud

Git is a free, open-source version control system commonly used for managing code and data repositories. This guide shows you how to transfer data from GitHub or GitLab repositories to your SURF Research Cloud workspace.

Utrecht University’s RDM support team offers workshops if you want to learn more about Git and version control.

When to Use Git

Git is ideal for:

  • Code repositories: Scripts, notebooks, and source code
  • Collaborative projects: Sharing and syncing work with team members

Git is not recommended for research data, although there is nothing wrong with tracking small, non-sensitive data files with Git.

Prerequisites

  • A GitHub or GitLab account with access to the desired repository
  • Experience with the command-line interface (CLI) is helpful. If you’re new to the command line, check out this introductory workshop by Utrecht University’s Digital Competence Center: Introduction to Bash.

Git commands are run from the command line. Follow the instructions below to open a terminal in your workspace.

Open a Terminal

  • Python Workbench/CLI workspaces: Open a terminal
  • Jupyter Notebook/VRE Lab: Click + in file browser → Select Terminal
  • Windows workspaces: Use PowerShell or Command Prompt
  • Desktop workspaces: Open the Terminal application

Quick Start: Clone a Repository

All workspaces come with git pre-installed, so you can start cloning repositories right away. You can check the version of git installed by running:

git --version

Step 1: Get the Repository URL

From GitHub:

  1. Go to your repository on GitHub
  2. Click the green Code button
  3. Copy the HTTPS URL (e.g., https://github.com/username/repository.git)

From GitLab:

  1. Go to your repository on GitLab
  2. Click the Clone button
  3. Copy the HTTPS URL

Step 2: Clone the Repository

In the terminal, first navigate to the folder where you want to put the repository (e.g. inside your storage volume) and run:

# Replace the URL with your actual repository URL.
git clone https://github.com/username/repository.git

This creates a folder with all your repository files.
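As a minimal sketch of the clone step, the example below uses a throwaway local repository as a stand-in for GitHub/GitLab so that it runs without network access or credentials. The temporary directory and the names remote-demo, data, and repository are hypothetical; with a real repository you would pass the HTTPS URL instead of the local path.

```shell
# Stand-in for GitHub/GitLab: a throwaway local repository (hypothetical paths).
workdir="$(mktemp -d)"
git init --quiet "$workdir/remote-demo"
cd "$workdir/remote-demo"
echo "# Demo" > README.md
git add README.md
git -c user.name="Demo User" -c user.email="demo@example.org" commit --quiet -m "Add README"

# Navigate to the folder where the copy should live, then clone.
mkdir -p "$workdir/data"
cd "$workdir/data"
git clone --quiet "$workdir/remote-demo" repository

# The clone is an ordinary folder containing the repository files.
ls repository   # lists README.md
```

The optional second argument to git clone (here repository) sets the name of the folder that is created. Without it, Git derives the folder name from the last part of the URL.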

Updating Your Data

If the repository is updated on GitHub/GitLab, you can bring those changes into your workspace with the git pull command. First make sure you are inside your repository folder, then run:

git pull
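The update cycle can be sketched end to end with a local stand-in for the remote, which keeps the example runnable without network access. All paths and file names here are hypothetical. The git rev-parse --show-toplevel call is one way to confirm that you are inside a repository before pulling.

```shell
# Stand-in remote with one commit (hypothetical paths and file names).
workdir="$(mktemp -d)"
git init --quiet "$workdir/remote-demo"
cd "$workdir/remote-demo"
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "first version" > data.txt
git add data.txt
git commit --quiet -m "Add data file"

# Your workspace copy.
git clone --quiet "$workdir/remote-demo" "$workdir/repository"

# Someone updates the remote after you cloned it.
echo "second version" > data.txt
git commit --quiet -am "Update data file"

# Back in the workspace copy: confirm the location, then pull.
cd "$workdir/repository"
git rev-parse --show-toplevel   # prints the repository root if you are inside one
git pull --quiet
cat data.txt                    # prints: second version
```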

Uploading Changes Back (Requires Personal Access Token)

If you’ve made changes and want to push them back:

git add .
git commit -m "Description of changes"
git push
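
The sequence above can be tried safely against a local bare repository before pushing to GitHub/GitLab. This is a sketch under assumptions: the paths, file names, and the name and e-mail address are placeholders, and no token is needed because the stand-in remote is a local folder. On a fresh workspace Git may refuse to commit until a user name and e-mail are configured, so the sketch sets them first.

```shell
# Stand-in remote: a bare repository seeded with one commit (hypothetical paths).
workdir="$(mktemp -d)"
git init --quiet "$workdir/seed"
cd "$workdir/seed"
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"
git clone --quiet --bare "$workdir/seed" "$workdir/remote-demo.git"

# Your workspace copy.
git clone --quiet "$workdir/remote-demo.git" "$workdir/repository"
cd "$workdir/repository"

# Set the identity recorded in your commits (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.org"

echo "v2" > notes.txt
git status --short              # review what changed before staging
git add .
git commit --quiet -m "Update notes"
git push --quiet

# The commit is now in the stand-in remote.
git --git-dir="$workdir/remote-demo.git" log --oneline
```

Running git status before git add . is a cheap way to avoid staging files you did not intend to upload.
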

Important: Authentication for Pushing

If you have two-factor authentication (2FA) enabled on GitHub/GitLab, you’ll need to use a Personal Access Token instead of your password when pushing.

When prompted for a password, enter your token instead.

Important: GitHub repositories within the UU GitHub organization

If your repository is part of the UU GitHub organization, you have to authorize your Personal Access Token for single sign-on (SSO).

Important

Don’t share your Personal Access Token. Treat it like your password.
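
If retyping the token on every push becomes tedious, one option is Git's built-in cache credential helper, which keeps credentials in memory for a limited time instead of writing them to disk. This is a sketch, not a SURF or UU recommendation: the one-hour timeout is an arbitrary choice, and the cache helper relies on Unix sockets, so it is assumed to be used on a Linux workspace rather than a Windows one.

```shell
# Keep credentials in memory for one hour (3600 seconds) after the first prompt.
git config --global credential.helper 'cache --timeout=3600'

# Show the active setting.
git config --global --get credential.helper

# Forget cached credentials immediately, e.g. before leaving a shared workspace.
git credential-cache exit
```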

Common Git Commands

Command                    Purpose
git clone <url>            Download a repository
git pull                   Get latest changes from remote
git status                 Check what files have changed
git add <file>             Stage specific file for commit
git add .                  Stage all changes for commit
git commit -m "message"    Save staged changes with a message
git push                   Upload commits to remote repository
git log                    View commit history
git branch                 List branches
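
To see what the inspection commands in the table report, the sketch below builds a small throwaway repository and runs them. The directory and file names are hypothetical, and the --short and --oneline options are used only to keep the output compact.

```shell
# Throwaway repository with one commit (hypothetical names).
workdir="$(mktemp -d)"
git init --quiet "$workdir/demo"
cd "$workdir/demo"
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

echo "y <- 2" >> analysis.R     # modify a tracked file
echo "scratch" > notes.txt      # create an untracked file

git status --short              # M = modified, ?? = untracked
git log --oneline               # one line per commit
git branch                      # the current branch is marked with *
```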

Using Git in JupyterLab Interface (GUI Alternative)

Workspaces that support JupyterLab (Jupyter Notebook, VRE Lab) include a built-in Git extension that provides a graphical interface for Git operations, making it easier for users who prefer not to use the command line.

Accessing the Git UI

  1. In JupyterLab, look for the Git icon in the left sidebar.
  2. Click it to open the Git panel.

If you don’t see it, you can install the extension with pip install jupyterlab-git.

Available Features

The Git UI allows you to:

  • View changes: See modified, added, and deleted files
  • Stage/unstage files: Click the + or - icons next to files
  • Commit changes: Enter a commit message and click “Commit”
  • Push/pull: Use the cloud icons to sync with the remote repository
  • View history: See commit history and branches
  • Switch branches: Create and switch between branches

This provides a user-friendly alternative to command-line Git operations, especially useful for beginners.

Note

Authentication: The same Personal Access Token requirements apply when pushing through the JupyterLab Git UI.

Troubleshooting

git: command not found

  • Git is not installed.
  • Try installing it with: sudo apt install git
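
A quick way to tell whether Git is missing or simply not on your PATH is to ask the shell where it finds the command. The helper name check_git below is hypothetical, and the install hint assumes a Debian/Ubuntu-based workspace, as the apt command above implies.

```shell
# Report whether the git command is available in this shell.
check_git() {
    if command -v git >/dev/null 2>&1; then
        echo "git found: $(git --version)"
    else
        echo "git not found"
        return 1
    fi
}

check_git || echo "Install it first, e.g. with: sudo apt install git"
```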

Authentication failed

  • Check that your Personal Access Token is correct and has not expired
  • Ensure you have access to the repository
  • If you have 2FA enabled, use a Personal Access Token instead of your password
  • If your repository is under the UU GitHub organization, ensure your Personal Access Token is authorized for single sign-on (SSO)

Repository not found

  • Verify the URL is correct (e.g. HTTPS and not SSH)
  • If the repository is private, ensure you have permission to access it
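
For a repository that is already cloned, you can inspect the configured remote URL and switch it to HTTPS without cloning again. The sketch below uses a throwaway repository and the placeholder URL from this guide; substitute your own repository URL.

```shell
# Throwaway repository to demonstrate on (hypothetical path).
workdir="$(mktemp -d)"
git init --quiet "$workdir/demo"
cd "$workdir/demo"

# Suppose the remote was added with an SSH-style URL (placeholder from this guide).
git remote add origin git@github.com:username/repository.git
git remote -v                   # lists the fetch and push URLs

# Switch the existing remote to the HTTPS URL.
git remote set-url origin https://github.com/username/repository.git
git remote get-url origin
```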

Large files cause errors

Tips

GitHub user documentation can be found here: GitHub Docs

Git documentation can be found here: Git Documentation