Research Engineering team

Connecting Research and IT at Utrecht University


Projects

Here you find an overview of our current and completed projects. In line with Open Science principles, most of our software is publicly accessible on GitHub and can be found via the title link.

Current projects

Kickstarter
Time frame: 06/2023 - present
Research domain: Faculty of Geosciences
Technologies: Generative AI
Research Engineers: Modhurita Mitra, Shiva Nadi, Parisa Zahedi
We use generative AI (OpenAI’s ChatGPT API) to assign industry codes (NAICS) to Kickstarter projects. Kickstarter is a crowdfunding website for raising money for creative projects. We are working with dr. Nicola Cortinovis who is using this information to investigate whether Kickstarter projects contribute to local economic growth at the county level in the United States, for various industry sectors.
HTA
Time frame: 11/2023 - present
Research domain: Faculty of Science
Technologies: Python, NLP
Research Engineers: Maarten Schermer, Shiva Nadi, Modhurita Mitra
In this project, we aim to create a centralized database by aggregating data points extracted from Health Technology Assessment documents. Our goal is to streamline and enhance the document labeling process through automation.
Seedlists
Time frame: 10/2023 - present
Research domain: Botanical Gardens
Technologies: Python, OCR, Plant taxonomy
Research Engineers: Maarten Schermer, Christine Staiger
The Botanical Gardens has a collection of historical seed lists dating back to 1837, in the form of PDFs and scanned documents. The project aims to help unlock the information in these seed lists and make it available to researchers, enabling them to study collection policies over the centuries and detect possible effects of climate change on the collection. Ideally, the project results in a pipeline that can be used by other botanical institutes.
PSE
Time frame: 07/2023 - present
Research domain: Faculty of Science
Technologies: Python, Jax, packaging
Research Engineers: Raoul Schram, Modhurita Mitra
We are collaborating with William Torre to revive a molecular dynamics simulation plugin. We will move the functionality of the plugin into its own standalone package, which will be much easier to install than the project in its current state.
Excalibur (Example sentences Calibrated for Use in Research)
Time frame: 03/2023 - present
Research domain: Institute for Language Sciences
Technologies: Python, machine translation, automated POS-tagging
Research Engineers: Maarten Schermer
We are collaborating with Digital Humanities IT on the creation of a database with example sentences, their translation (Dutch-English), and interlinear glosses. The project includes a pipeline for extracting, correcting and annotating glosses from publications. It also aims to automatically generate translations and glosses for new, user-supplied example sentences.
Babble
Time frame: 01/2023 - present
Research domain: Faculty of Humanities
Technologies: Python, Speech classification, Machine learning
Research Engineers: Parisa Zahedi, Jelle Treep
Collaboration with dr. Sita ter Haar and dr. Heysem Kaya on automated processing and classification of audio recordings from the YOUth project.
Cognitive Mapping
Time frame: 03/2023 - present
Research domain: Faculty of Law, Economics and Governance
Technologies: R, CRAN, Packaging, CI, testing
Research Engineers: Jelle Treep
In this project, in collaboration with the University Library, we give small-scale support for publishing an R package on CRAN. The package contains tools for cognitive mapping analysis developed by Prof. Femke van Esch.
Soothreat
Time frame: 12/2022 - present
Research domain: Clinical Psychology
Technologies: Python, Machine Learning, Natural Language Processing, Topic Modeling, LDA
Research Engineers: Shiva Nadi, Raoul Schram, Ken Krige
In this project we are developing methods to help researchers better understand threats and soothers for people with a central sensitivity syndrome (i.e., irritable bowel syndrome, fibromyalgia, or chronic fatigue syndrome). Threats create experiences of danger, harm, damage, or unsafety and may worsen patients’ physical symptoms. Soothers, on the other hand, create feelings of calmness, well-being, safety, or social connectedness and may alleviate patients’ physical symptoms. We are dealing with multiple topics and short texts in this project.
Better life index
Time frame: 08/2022 - present
Research domain: Faculty of Humanities
Technologies: Python, Django, Flask
Research Engineers: Roel Brouwer, Jelle Treep
Collaboration with Dr. R. Philips and the Research Software Lab on a web application for the yearly publication of the Better life index (“Brede Welvaart Index”). In this project we create a backend for the web application.
Semantics of Sustainability
Time frame: 07/2022 - present
Research domain: Faculty of Humanities, History and Art History
Technologies: Python, NLP, BERT, Deep learning, Huggingface Transformers
Research Engineers: Parisa Zahedi
The aim of this project is to investigate the conceptual history of a certain topic in a collection of texts. To this end we work on Dutch language models for historical research. The currently available language models for Dutch (at time of writing) fall short for historical research, because they are trained only on recent data. The performance of models declines steadily when they are applied to data that lies outside the distribution of the training corpora.
metasyn
Time frame: 04/2022 - present
Research domain: Department of Methodology & Statistics
Technologies: Python, Machine Learning, Synthetic data, Privacy
Research Engineers: Raoul Schram
Privacy and proper disclosure control are hot topics at the moment. This project aims to create a standard for sharing statistical information and generating synthetic data. Metasyn fits a distribution to each of the variables and generates a synthetic dataset from this information. In between, there is a generative metadata file that contains the condensed information as a human-readable JSON file.
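The fit, export and generate steps described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the metasyn API: the function names (`fit_variable`, `fit_table`, `synthesize`), the choice of a normal distribution for numeric columns and the example table are all assumptions made for illustration.

```python
import json
import random
import statistics
from collections import Counter


def fit_variable(values):
    """Summarise one column as a small, JSON-serialisable description."""
    if all(isinstance(v, (int, float)) for v in values):
        # Assumption: numeric columns are described by a normal distribution.
        return {
            "type": "normal",
            "mean": statistics.fmean(values),
            "stdev": statistics.stdev(values),
        }
    # Everything else is treated as categorical with its observed frequencies.
    counts = Counter(values)
    total = len(values)
    return {
        "type": "categorical",
        "probabilities": {str(k): n / total for k, n in counts.items()},
    }


def fit_table(table):
    """Fit every column of a {column name: list of values} table."""
    return {name: fit_variable(column) for name, column in table.items()}


def synthesize(metadata, n_rows, seed=0):
    """Draw a synthetic table using only the fitted metadata."""
    rng = random.Random(seed)
    synthetic = {}
    for name, spec in metadata.items():
        if spec["type"] == "normal":
            synthetic[name] = [
                rng.gauss(spec["mean"], spec["stdev"]) for _ in range(n_rows)
            ]
        else:
            labels = list(spec["probabilities"])
            weights = list(spec["probabilities"].values())
            synthetic[name] = rng.choices(labels, weights=weights, k=n_rows)
    return synthetic


# Hypothetical input table; the values are made up for this example.
table = {
    "age": [23, 35, 41, 29, 52],
    "city": ["Utrecht", "Utrecht", "Delft", "Utrecht", "Delft"],
}

# The intermediate, human-readable metadata file.
metadata_json = json.dumps(fit_table(table), indent=2)

# Synthetic data is generated from the metadata alone, not from the original table.
synthetic = synthesize(json.loads(metadata_json), n_rows=3)
```

The point of the sketch is the separation of concerns: only the JSON metadata needs to be shared, and the synthetic rows are drawn without access to the original data.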
Breakthrough patents
Time frame: 06/2022 - present
Research domain: Faculty of Law, Economics and Governance
Technologies: Python, Machine Learning, Natural Language Processing, BERT, TfIdf
Research Engineers: Jelle Treep, Raoul Schram, Shiva Nadi, Maarten Schermer
We use multiple Natural Language Processing ML methods to predict whether a particular patent is a breakthrough innovation or not. This project includes work to simplify the whole pipeline, from reading the patents through preprocessing and prediction to analysis.
tweet_collector
Time frame: 2021 - present
Research domain: Faculty of Humanities, Media and Culture Studies
Technologies: Python, searchtweets, elasticsearch/kibana
Research Engineers: Parisa Zahedi, Roos Voorvaart
Twitter forms a rich source of information for researchers interested in studying ‘the public conversation’. The Academic Research product track is designed to serve the needs of the academic research community. It provides researchers with special levels of access to public Twitter data at no cost. This project aims to help researchers use the Academic Research product track to collect tweets of interest and analyze them.

Completed projects

Precision Nudging
Time frame: 04/2021 - 11/2023
Research domain: Faculty of Law, Economics and Governance, Public Governance and Management
Technologies: Python, machine learning, regression analysis, synthetic data
Research Engineers: Haili Hu, Raoul Schram
Changing behavior is necessary to tackle societal problems, such as obesity and financial problems. One way to change behavior is by nudging people. A nudge is a way to change behavior without prohibiting options or changing their costs. However, nudges are often one-size-fits-all techniques: everyone is offered the same nudge. The scientific aim of this project is to use open data to develop predictive models with Machine Learning, in order to determine the most effective nudge for a person, given the nudging goal and the individual's personal circumstances. To test our models, we created realistic synthetic data.
Hyde
Time frame: 11/2022 - 09/2023
Research domain: Faculty of Geosciences
Technologies: Python, packaging, code quality
Research Engineers: Jelle Treep, Raoul Schram, Maarten Schermer
The HYDE project concerns worldwide historical land use. It was already a well-established software project, with more than 16K lines of Python code. We helped the project by introducing modern software techniques and workflows.
Streetview segmentation
Time frame: 06/2022 - 02/2023
Research domain: Faculty of Veterinary Medicine
Technologies: Python, Machine Learning, Computer Vision, Visual Transformers, Docker
Research Engineers: Raoul Schram, Maarten Schermer
We are using recent computer vision models based on Visual Transformers for semantic segmentation, assigning semantic classes such as “street”, “vegetation” and “water” to the pixels of photos taken in the urban environment. This will be used for further research into the effect of the presence of water bodies and vegetation on human well-being. Follow-up to the Streetview greenery project.
SummerFAIR
Time frame: 02/2021 - 02/2023
Research domain: Faculty of Veterinary Medicine, Veterinary Epidemiology
Technologies: Semantic Web, Ontologies, RDF, Docker, Python, R
Research Engineers: Martine de Vos, Christine Staiger
The summerFAIR project aims to integrate existing data sets on transmission experiments to enable reanalysis and meta-analysis. We have developed a pipeline to map data to a shared vocabulary, convert them to linked data triples and perform integrated analyses.
Crunchbase
Time frame: 05/2020 - 01/2023
Research domain: Faculty of Geosciences, Dynamics of Innovation Systems
Technologies: Webscraping, Internet Archive, Pipeline, Internet, Python, Kinesis Firehose, AWS, Terraform
Research Engineers: Maarten Schermer, Casper Kaandorp, Martine de Vos
The Crunchbase project assesses the sustainability of European startup companies by analyzing their websites. As the researcher is interested in the current as well as the pre-Corona situation, we scrape webpages from the Internet Archive. Together with SURF we have developed a pipeline for collecting and analyzing these webpages, using AWS as infrastructure.
Data donation - WhatsApp
Time frame: 02/2022 - 11/2022
Research domain: ODISSEI Social Data Science Team (SoDa)
Technologies: Python, Privacy, Pyodide
Research Engineers: Parisa Zahedi, Shiva Nadi
As an extension to the Data Donation project, we developed scripts to extract information from WhatsApp data download packages. In this study respondents can voluntarily donate their group chats and/or their account information files through an online platform (PORT).
TORS
Time frame: 10/2021 - 06/2022
Research domain: Faculty of Science, Information and Computing Sciences
Technologies: Python, Django, Docker, C++
Research Engineers: Roel Brouwer, Haili Hu
In this project, we define the requirements to set up a scientific challenge in the domain of Train Unit Shunting and Servicing, using the TORS simulator developed by Utrecht University and TU Delft. In between transportation services, trains are parked and serviced at shunting yards. The objective of the challenge is to test the robustness and flexibility of scheduling algorithms that position all the trains at the right place and get the maintenance tasks done in limited run time.
Coast Snap
Time frame: 10/2021 - 06/2022
Research domain: Faculty of Geosciences
Technologies: Python, Elixir, Databases, Web development
Research Engineers: Casper Kaandorp
For this citizen science project, we created a web application for collecting and processing pictures of the shoreline taken by citizens at different locations in the Netherlands.
Dynamiek in beeld
Time frame: 2021 - 05/2022
Research domain: ODISSEI Social Data Science Team (SoDa)
Technologies: Shiny, R
Research Engineers: Parisa Zahedi, Shiva Nadi
An application that can be used in a clinical setting to score dynamics in empathy. The clinician asks a set of questions, after which the results are visualized. The visualization should help the clinician to ask the right follow-up questions immediately.
Deviance in Art
Time frame: 02/2022 - 04/2022
Research domain: Faculty of Social and Behavioural Sciences
Technologies: Python, GoogleArts, WikiArts, APIs
Research Engineers: Raoul Schram
Create a webscraper that can retrieve metadata and artworks from the GoogleArts and WikiArts websites. The deliverable is a generic Python package. This package will be used for multiple research questions involving machine learning.
Ocean Parcels Numba
Time frame: 07/2021 - 02/2022
Research domain: Faculty of Science, Physical Oceanography
Technologies: Python, Numba
Research Engineers: Roel Brouwer, Raoul Schram
Investigating the feasibility of speeding up existing Python code for Parcels using Numba. The aim is to speed up the simulation enough to eliminate the need for a separate (partial) JIT/C path in the code. This should lead to a more flexible and maintainable code base.
Data Donation - proof of concept
Time frame: 04/2021 - 10/2021
Research domain: Faculty of Social Sciences, Human Data Science group
Technologies: Python, WebAssembly, Pyodide, synthetic data
Research Engineers: Haili Hu, Roos Voorvaart
In this project, we collaborated with the Human Data Science group and Eyra to make data from social media platforms easily accessible to researchers, while preserving privacy. Respondents can voluntarily donate their data download packages through an online platform (PORT), and researchers can provide custom data extraction scripts, which will be run locally on the respondents’ devices. A proof-of-concept PORT has been developed by Eyra, while data extraction scripts and synthetic data packages were provided by the Research Engineering team.
Ocean Parcels particle-particle interaction
Time frame: 07/2020 - 07/2021
Research domain: Faculty of Science, Physical Oceanography
Technologies: Python, data structures, simulation
Research Engineers: Roel Brouwer, Raoul Schram
Providing a working implementation of particle-particle interaction for Parcels. The aim was to allow simulated particles to interact and influence each other’s states. This project involved reviewing and partially restructuring the data structures that Parcels uses for storing particle data, and implementing particle-particle interaction under certain conditions.
Large scale network experiments
Time frame: 2020 - 2021
Research domain: Faculty of Social Sciences
Technologies: Python, Elixir, Databases, MTurk
Research Engineers: Casper Kaandorp
For this sociology project, we recruited people via Amazon Mechanical Turk and had them play a networking game.
Network Entropy
Time frame: 2020 - 2021
Research domain: Faculty of Science, Information and Computing Sciences
Technologies: Temporal networks, Python, Numba, simulation
Research Engineers: Raoul Schram
To improve the theoretical analysis and comparison of different temporal networks, we have invented a new metric to study them. The measure is called network entropy, and is applicable to any temporal network. We showed with simulations that processes on a network behave very differently, depending on the network entropy.
hist-aware
Time frame: 2020 - 2021
Research domain: Faculty of Humanities, History and Art History
Technologies: Python, NLP, Deep learning, Huggingface Transformers
Research Engineers: Leonardo Vida, Parisa Zahedi
This project makes use of the Delpher archive (delpher.nl/kranten), which is the largest public collection of digitized pages from Dutch historical newspapers. The research team is mining the sentiment of articles, as expressed by their authors. To this end, the team extracts all relevant Delpher articles on specific topics (e.g. energy) and is currently training natural language processing (NLP) models called Transformers to extract a sufficiently accurate representation of the sentiment of each article. Currently, the team is working with the period 1960-1995, which comprises around 250,000 articles on the chosen topics.
Protosc
Time frame: 2020 - 2021
Research domain: Faculty of Social Sciences
Technologies: Feature selection, Python, image classification, wrapper, filter, genetic algorithm
Research Engineers: Raoul Schram, Roos Voorvaart
Protosc is a Python library that aims to determine which features are relevant to a given classification problem. It does so by using wrapper/filter/genetic algorithms, after which automatic statistical analysis is used to determine which features are significant. The package also includes a few different options for an image classification pipeline.
Animal Sounds
Time frame: 2019 - 2021
Research domain: Faculty of Science, Ecology and Biodiversity Group
Technologies: Python, bioacoustics, audio, machine learning, deep learning, feature engineering
Research Engineers: Jelle Treep, Parisa Zahedi, Casper Kaandorp
We developed algorithms and a data processing workflow to detect vocalizations of chimpanzees in a large body of audio data from the African tropical rainforest. The workflow consists of: 1) a filtering step where irrelevant audio data is removed to speed up manual annotation, 2) a feature engineering and feature selection step, and 3) classification using support vector machines and convolutional neural networks.
Porpoise Reproduction
Time frame: 2020
Research domain: Faculty of Veterinary Medicine, Pathology
Technologies: R, sf, raster, rgdal
Research Engineers: Jelle Treep, Roos Voorvaart
Porpoise Reproduction studies how reproduction rates of harbour porpoises are affected by various factors. In this RSE project, marine regions were enriched with Cumulative Human Impact model data.
Streetview greenery
Time frame: 2019 - 2021
Research domain: Faculty of Geosciences
Technologies: Machine learning, Python, image segmentation, deeplab, kriging, geolocation, CityScapes
Research Engineers: Raoul Schram
For the streetview project we have used the (formerly) open street view data from the municipality of Amsterdam to create a map of the greenness. This is done by taking the images and segmenting each image into different classes. The number of pixels in each image belonging to the “greenery” class is used to create the Amsterdam greenery map.
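The pixel-counting step described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `greenery_fraction`, the class labels and the tiny example mask are assumptions, and normalising the count to a fraction of the image is a choice made here so that images of different sizes are comparable.

```python
def greenery_fraction(label_mask, greenery_classes):
    """Return the share of pixels whose class label counts as greenery.

    label_mask is a 2D list with one class label per pixel, as produced by
    a segmentation model (hypothetical format for this sketch).
    """
    total = 0
    green = 0
    for row in label_mask:
        for label in row:
            total += 1
            if label in greenery_classes:
                green += 1
    # An empty mask has no greenery by definition here.
    return green / total if total else 0.0


# Made-up 2x4 segmentation result for a single street view image.
mask = [
    ["sky", "sky", "vegetation", "building"],
    ["road", "vegetation", "vegetation", "building"],
]
fraction = greenery_fraction(mask, {"vegetation"})
print(f"{fraction:.3f}")
```

In a full pipeline, one such value per geolocated image would be the input for spatial interpolation into a map.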
ASReview
Time frame: 2018 - 2021
Research domain: Faculty of Social Sciences
Technologies: Machine learning, active learning, Python, Flask, hyperparameter optimization, simulation
Research Engineers: Raoul Schram, Parisa Zahedi, Jonathan de Bruin
ASReview is a machine learning tool to aid researchers in performing systematic reviews. It uses active learning to present users with the papers most likely to be relevant. It has been written in Python 3.7+, and hyperparameters have been optimized using the hyperopt package. We have also contributed to the initial back-end for the user interface using Flask.
Agri-activism
Time frame: 2019 - 2020
Research domain: Faculty of Social Sciences
Technologies: Python, NLP, topic-modeling, network-analysis
Research Engineers: Parisa Zahedi, Martine de Vos
This project aims to explore the potential of Twitter data in making sense of online debates. More specifically, it focuses on the online manifestation of the anti-Monsanto movement and assesses tweets to investigate (1) activists’ behaviors and opinions, (2) the shape of their networks, (3) their organization and leadership, and (4) information diffusion patterns. Monsanto is one of the world’s largest producers of both agrochemicals and genetically modified crop seeds.
Global Goals
Time frame: 10/2019 - 08/2020
Research domain: Faculty of Geosciences, Global Sustainability Governance
Technologies: Webscraping, Python, AWS, Terraform
Research Engineers: Jelle Treep, Martine de Vos
The Global Goals project investigates the effect of the United Nations’ Sustainable Development Goals (SDGs) on the global network of intergovernmental organizations. This network is represented by the hyperlinks on the organizations’ websites. We have retrieved the historical hyperlinks, from 2012 up to 2019, for a given set of international organizations via the Internet Archive.
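How hyperlinks on a page can be turned into network edges is sketched below with the Python standard library only. This is an illustration under stated assumptions rather than the project's pipeline: the class `LinkCollector`, the function `outgoing_domains`, the example HTML and the `.example` domains are all hypothetical, and retrieving archived pages from the Internet Archive is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute target URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def outgoing_domains(html, base_url):
    """Return the set of external domains a page links to."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    domains = {urlparse(link).netloc for link in collector.links}
    domains.discard(urlparse(base_url).netloc)  # drop links within the same site
    domains.discard("")                         # drop mailto: and similar links
    return domains


# Made-up page content for a hypothetical organization website.
page = """
<a href="/about">About us</a>
<a href="https://org-b.example/health">Partner B</a>
<a href="https://org-c.example/">Partner C</a>
<a href="mailto:info@org-a.example">Contact</a>
"""
source = "https://org-a.example/en/"
edges = sorted((urlparse(source).netloc, target)
               for target in outgoing_domains(page, source))
```

Each `(source, target)` pair is one directed edge; collecting such pairs per year would give one network snapshot per year.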
iBridges
Time frame: Paused
Research domain: Research Data management
Technologies: iRODS, Yoda
Research Engineers: Christine Staiger
Goals: 1) Provide a graphical user interface for non-tech-savvy researchers to coordinate their data on iRODS and Yoda. 2) Provide an easy-to-use Python library for scientific programmers to program against iRODS and Yoda, enabling them to integrate data management into computational workflows.
webmice
Time frame: 07/2023 - 11/2023
Research domain: Statistics
Technologies: R, Docker, HTTP API framework for R RestRserve
Research Engineers: Christine Staiger, Roel Brouwer
Building an HTTP API around the statistical R package mice to increase the interoperability of the R package with other programming languages.
AnonymoUUs / AnonymizeYouth
Time frame: Paused
Research domain: Faculty of Social Science
Technologies: Python, de-identification
Research Engineers: Maarten Schermer, Casper Kaandorp, Martine de Vos
Researchers often use personal data in their research. According to the GDPR these data need to be de-identified. We have developed tools to de-identify textual data. For the specific use case of the YOUth cohort studies, additional tools have been developed to prepare the YOUth data for correct de-identification.
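As a rough illustration of what de-identifying text can involve, the sketch below replaces e-mail addresses and phone numbers with placeholders. It is not the tooling developed in this project: the function `deidentify`, the two regular expressions and the Dutch-style phone number format are assumptions, and real de-identification also has to handle names, addresses and other identifiers that simple patterns miss.

```python
import re

# Assumption: simple patterns that cover common cases only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumption: Dutch-style numbers with 10 digits (0...) or +31 followed by 9 digits,
# optionally separated by spaces or hyphens.
PHONE = re.compile(r"(?:\+31|0)[\s-]?\d(?:[\s-]?\d){8}")


def deidentify(text):
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


# Made-up example sentence.
print(deidentify("Mail jan@example.org or call 030-1234567."))
```

Replacing matches with typed placeholders rather than deleting them keeps the text readable for later analysis.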