Secure Analysis Environment (SANE)
Working with sensitive data on SURF ResearchCloud.
Before we start…
Make sure that…
- you’ve accepted the SRAM invite
- you have a Remote Desktop (RDP) client installed
Program
- What is ResearchCloud?
- What is SANE?
- Practice: creating SANE analysis workspaces
- Practice: data management and analysis
- Bonus: working with data on Yoda using iBridges
- Bonus: creating SANE setup
What is ResearchCloud?
On-demand cloud resources (compute & storage) in a user-friendly platform.
- Easy collaboration in and across institutions.
- Special applications, customizable environments.
- Deploy on different cloud infrastructures.
ResearchCloud
Run VMs (workspaces) preconfigured for many different research needs:
- Programming and analysis (e.g. R-Studio, Jupyter, Matlab).
- Analysis tools (e.g. ASReview, 4CAT).
- Field-specific software (e.g. GIS).
- Webservices (e.g. Galaxy).
- …or simply your own desktop or commandline machine in the cloud.
SANE
SANE is a ResearchCloud environment tailored for working with sensitive data.
- Data Provider vs Researcher
- Data sharing agreement should be in place.
- Customizability of workspaces has been curtailed. Researchers’ workspaces do not have internet access once they have been created.
- SANE environments have been especially tested for vulnerabilities (pen tested).
SANE
SANE is a severely restricted ResearchCloud environment tailored for working with sensitive data.
- Tinker SANE
- Data cannot leave researcher’s workspace
- Blind SANE
- Data does not even enter researchers’ workspace
Tinker SANE
Workflow:
- Dataprovider uploads data.
- Researcher starts analysis workspace (data is accessible!)
- Researcher performs analyses.
- Researcher saves results.
- Dataprovider verifies results, ensures they contain no sensitive data.
- Dataprovider exports the results, shares with researcher.
Tinker SANE
ResearchCloud Terminology
- Workspace
- Catalog items
- Storage unit
- Collaboration
Collaborations
SRAM:
- You control who you invite
- Admins can assign users to different groups:
- src_co_wallet
- src_co_admin
- src_co_developer
For this tutorial, some of you are assigned to the src_co_admin
group. Log in to SRAM and note your assigned groups.
ResearchCloud Portal
Everyone in the same Collaboration can see each other’s workspaces. Note the workspaces you see:
- SANE Data Provider Portal
- SANE Data Server
- SANE Tinker UU Windows
In order to login, setup your ResearchCloud time-based password.
ResearchCloud Portal
Try logging in to each of the workspaces you see:
- SANE Tinker UU Windows
- SANE Data Provider Portal
- Login by clicking the yellow ‘Access’ button
- SANE File Server
Creating SANE analysis workspace
- Click ‘Create New’ in the portal
- Select the SANE CO
- Select the SANE Tinker UU Windows Catalog Item
- Select the private network
- Continue
NOTE: creating this workspaces will take > 40 mins
Data Provider: upload data
Access the SANE Data Provider Portal and upload a file to the directory This PC\sane-data\source
.
You can use the example input files from this repository.
Upload the file data/input_data/osm_roads.shp
from that repository.
Researcher: analysis
Files made available by the dataprovider will be accessible in This PC\sane-data\source
.
NOTE: all researchers can see all files in This PC\sane-data\
!
- Check that you can open the files.
- If you want, you can run a simple ‘analysis’ on them using Jupyter Notebooks
Researcher: analysis
- Open the Jupyter notebook in the
sane-data/scripts
folder, have a look at the code.
- Run the code
- Copy the results file from your home directory to the
sane-data/results
directory
Data provider: download results
Once the researcher is finished, you should be able to find and download the results.
Researcher: analysis
What if we try a more ‘real’ example? Add and run the following code to your notebook:
import geopandas as gpd
# Read file using gpd.read_file()
data = gpd.read_file(fp)
…what happens?
Researcher: analysis
Oops! The geopandas
package is not available. You can see the available packages here.
What if we try to install the needed package?
Run the following command in your notebook and observe:
pip install geopandas
Bonus: data transfer from Yoda
What if we don’t want to upload/download data to/from the Data Provider Portal to/from the dataprovider’s laptop?
We can fetch the data directly from Yoda!
- Use
ssh
to login to the SANE Data Provider Portal (howto).
sudo apt install pipx
(press Y
and enter)
pipx ensurepath
pipx install ibridges
- Verify:
ibridges --help
- iBridges manual
Bonus: tinker SANE setup
Let’s delete all existing SANE workspaces and re-create the setup!