researchcloud-items

Installation scripts for SURF ResearchCloud catalog components


Playbook openrefine


Summary

This role installs [OpenRefine](https://openrefine.org/), an open-source web application for data cleanup and transformation to other formats, an activity commonly known as data wrangling.

This component uses JupyterHub to spawn a separate instance of OpenRefine for each user who logs in via the browser. This way, users do not have access to each other’s data. The playbook can be added to a workspace that already contains a working JupyterHub installation, or it can install JupyterHub itself (see the openrefine_jupyter_force_install parameter).

Security note

JupyterHub spawns servers (JupyterLab) running under the uid of the user who logged in via the browser. Authentication in the browser is handled by SRAM: after the user logs in, SRAM sets the username in the REMOTE_USER header, which is passed along to JupyterHub.

At the moment, OpenRefine listens on a TCP port. This means users with shell access (and JupyterLab provides a shell!) can easily bypass authentication by connecting directly to the localhost address and port on which another user’s OpenRefine server is listening. Do not provide shell access to untrusted users.
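
The reason a local shell is enough to bypass authentication is that a TCP port bound to localhost carries no per-user access control: any process on the machine can connect to it, whatever uid it runs under. The following is a minimal Python sketch of that behaviour, not part of the playbook. The listener is only a stand-in for another user's OpenRefine server, and the helper names (can_connect, demo) and the OS-chosen port are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def demo() -> bool:
    # Hypothetical stand-in for another user's OpenRefine server:
    # a listener bound to the loopback interface on a free port
    # chosen by the operating system.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]
        # Nothing about the connecting process (uid, login, headers)
        # is checked at this level; the connection simply succeeds.
        return can_connect("127.0.0.1", port)


if __name__ == "__main__":
    print("local connection accepted:", demo())
```

Any authentication performed in the browser happens in front of this port, so a process that connects to it directly never passes through that check.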

Variables

Note: the FilesExtension extension (which lets users import data from the server’s local filesystem) is installed by default.

See also

History

2025 Written by Dawa Ometto (Utrecht University)
