Secure Analysis Environment (SANE)

Working with sensitive data on SURF ResearchCloud.

Before we start…

Make sure that…

  • you’ve accepted the SRAM invite
  • you have a Remote Desktop (RDP) client installed

Program

  • What is ResearchCloud?
  • What is SANE?
  • Practice: creating SANE analysis workspaces
  • Practice: data management and analysis
  • Bonus: working with data on Yoda using iBridges
  • Bonus: creating a SANE setup

What is ResearchCloud?

On-demand cloud resources (compute & storage) in a user-friendly platform.

  • Easy collaboration in and across institutions.
  • Special applications, customizable environments.
  • Deploy on different cloud infrastructures.

ResearchCloud

Run VMs (workspaces) preconfigured for many different research needs:

  • Programming and analysis (e.g. RStudio, Jupyter, MATLAB).
  • Analysis tools (e.g. ASReview, 4CAT).
  • Field-specific software (e.g. GIS).
  • Webservices (e.g. Galaxy).
  • …or simply your own desktop or command-line machine in the cloud.

SANE

SANE is a ResearchCloud environment tailored for working with sensitive data.

  • Data Provider vs Researcher
    • Data sharing agreement should be in place.
  • Customizability of workspaces has been curtailed. Researchers’ workspaces do not have internet access once they have been created.
  • SANE environments have been specifically tested for vulnerabilities (penetration tested).

SANE

SANE is a severely restricted ResearchCloud environment tailored for working with sensitive data.

  • Tinker SANE
    • Data cannot leave researcher’s workspace
  • Blind SANE
    • Data does not even enter researchers’ workspace

Tinker SANE

Workflow:

  1. Data provider uploads data.
  2. Researcher starts analysis workspace (data is accessible!).
  3. Researcher performs analyses.
  4. Researcher saves results.
  5. Data provider verifies the results and ensures they contain no sensitive data.
  6. Data provider exports the results and shares them with the researcher.

ResearchCloud Terminology

  • Workspace
  • Catalog items
  • Storage unit
  • Collaboration
    • Groups

Collaborations

SRAM:

  • You control who you invite
  • Admins can assign users to different groups:
    • src_co_wallet
    • src_co_admin
    • src_co_developer

For this tutorial, some of you are assigned to the src_co_admin group. Log in to SRAM and note your assigned groups.

ResearchCloud Portal

Everyone in the same Collaboration can see each other’s workspaces. Note the workspaces you see:

  • SANE Data Provider Portal
  • SANE Data Server
  • SANE Tinker UU Windows

To log in, set up your ResearchCloud time-based password.

ResearchCloud Portal

Try logging in to each of the workspaces you see.

Creating SANE analysis workspace

  • Click ‘Create New’ in the portal
  • Select the SANE CO
  • Select the SANE Tinker UU Windows Catalog Item
  • Select the private network
  • Continue

NOTE: creating this workspace will take more than 40 minutes.

Data Provider: upload data

Access the SANE Data Provider Portal and upload a file to the directory This PC\sane-data\source.

You can use the example input files from this repository.

Upload the file data/input_data/osm_roads.shp from that repository.

Researcher: analysis

Files made available by the data provider will be accessible in This PC\sane-data\source.

NOTE: all researchers can see all files in This PC\sane-data\!

  • Check that you can open the files.
  • If you want, you can run a simple ‘analysis’ on them using Jupyter Notebooks
    • see next slide
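
A minimal sketch for checking from a notebook that the shared files are visible. The folder location is an assumption: This PC\sane-data\source is how the file browser displays it, so substitute the actual path on your workspace. The helper name list_shared_files is hypothetical.

```python
from pathlib import Path


def list_shared_files(source_dir):
    """Return the sorted names of all regular files directly inside source_dir."""
    folder = Path(source_dir)
    return sorted(p.name for p in folder.iterdir() if p.is_file())


# Hypothetical usage; replace the placeholder with the real location of sane-data\source:
# print(list_shared_files(r"<path to sane-data>\source"))
```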

Researcher: analysis

  • Open the Jupyter notebook in the sane-data/scripts folder and have a look at the code.
  • Run the code.
  • Copy the results file from your home directory to the sane-data/results directory.
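
The copy step can also be done from the notebook itself. This is a sketch under assumptions: the name of the results file and the real path of the sane-data/results directory are not given here, so both are passed in as arguments, and copy_result is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_result(result_file, results_dir):
    """Copy one results file into results_dir and return the path of the copy."""
    src = Path(result_file)
    dest_dir = Path(results_dir)
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"results directory not found: {dest_dir}")
    # copy2 also preserves timestamps where the platform allows it
    return Path(shutil.copy2(src, dest_dir / src.name))


# Hypothetical usage; "results.csv" and the target path are placeholders:
# copy_result(Path.home() / "results.csv", r"<path to sane-data>\results")
```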

Data provider: download results

Once the researcher is finished, you should be able to find and download the results.

Researcher: analysis

What if we try a more ‘real’ example? Add the following code to your notebook and run it:

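import geopandas as gpd

# Path to the uploaded shapefile (placeholder; point it at the file in sane-data\source)
fp = "osm_roads.shp"

# Read the file into a GeoDataFrame
data = gpd.read_file(fp)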

…what happens?

Researcher: analysis

Oops! The geopandas package is not available. You can see the available packages here.
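
You can also check from within the notebook whether a package is importable before relying on it. A minimal sketch using only the standard library; is_installed is a hypothetical helper name, and json is included only as a module that is always present.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "geopandas"):
    print(name, "available" if is_installed(name) else "missing")
```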

What if we try to install the needed package?

Run the following command in your notebook and observe:

pip install geopandas
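
To see why the installation behaves the way it does, you can test whether the workspace can open an outbound connection at all. This is a sketch, not a definitive diagnosis: the target pypi.org on port 443 is an assumption (the default package index used by pip), and can_connect is a hypothetical helper. On a workspace without internet access it would be expected to return False.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups
        return False


# Assumed target: the default package index used by pip.
# print(can_connect("pypi.org", 443))
```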

Bonus: data transfer from Yoda

What if we don’t want to transfer data between the data provider’s laptop and the Data Provider Portal?

We can fetch the data directly from Yoda!

  1. Use SSH to log in to the SANE Data Provider Portal (howto).
  2. sudo apt install pipx (press Y, then Enter, to confirm)
  3. pipx ensurepath
  4. pipx install ibridges
  5. Verify: ibridges --help
  6. For usage details, see the iBridges manual.
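
If the verification step reports that the command is not found, the PATH change made by pipx ensurepath may not be active yet in the current shell; opening a new terminal usually resolves this. A small sketch for checking whether the executable is visible on PATH; find_tool is a hypothetical helper name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


location = find_tool("ibridges")
print(location if location else "ibridges not found on PATH; try opening a new shell")
```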

Bonus: tinker SANE setup

Let’s delete all existing SANE workspaces and re-create the setup!