Projects
Here you will find an overview of our current and completed projects. In line with Open Science principles, most of our software is publicly accessible on GitHub and can be found via the title link.
Current projects
Cognitive Mapping |
Time frame: 03/2023 - present |
Research domain: Faculty of Law, Economics and Governance |
Technologies: R, CRAN, Packaging, CI, testing |
Research Engineers: Jelle Treep |
In this project, a collaboration with the University Library, we provide small-scale support for publishing an R package on CRAN. The package contains tools for cognitive mapping analysis developed by Prof. Femke van Esch. |
Hyde |
Time frame: 11/2022 - present |
Research domain: Faculty of Geosciences |
Technologies: Python, packaging, code quality |
Research Engineers: Jelle Treep, Raoul Schram, Maarten Schermer |
The HYDE project concerns worldwide historical land use. It was already a well-established software project, with more than 16K lines of Python code. We are helping the project by introducing modern software techniques and workflows. |
Soothreat |
Time frame: 12/2022 - present |
Research domain: Clinical Psychology |
Technologies: Python, Machine Learning, Natural Language Processing, Topic Modeling, LDA |
Research Engineers: Shiva Nadi, Raoul Schram, Ken Krige |
In this project we are developing methods to help researchers better understand threats and soothers for people with a central sensitivity syndrome (i.e., irritable bowel syndrome, fibromyalgia, or chronic fatigue syndrome). Threats create experiences of danger, harm, damage, or unsafety and may worsen patients’ physical symptoms. Soothers, on the other hand, create feelings of calmness, well-being, safety, or social connectedness and may alleviate patients’ physical symptoms. The texts we work with are short and cover multiple topics. |
Better life index |
Time frame: 08/2022 - present |
Research domain: Faculty of Humanities |
Technologies: Python, Django, Flask |
Research Engineers: Roel Brouwer, Jelle Treep |
We create a backend for a web application for yearly publication of the Better life index (“Brede welvaart index”) in the Netherlands. |
Semantics of Sustainability |
Time frame: 07/2022 - present |
Research domain: Faculty of Humanities, History and Art History |
Technologies: Python, NLP, BERT, Deep learning, Huggingface Transformers |
Research Engineers: Parisa Zahedi |
The aim of this project is to investigate the conceptual history of a certain topic in a collection of texts. To this end we work on Dutch language models for historical research. The currently available language models for Dutch (at the time of writing) fall short for historical research, because they are trained only on recent data. The performance of a model declines steadily when it is applied to data that lies outside the distribution of its training corpora. |
MetaSynth |
Time frame: 04/2022 - present |
Research domain: Department of Methodology & Statistics |
Technologies: Python, Machine Learning, Synthetic data, Privacy |
Research Engineers: Raoul Schram |
Privacy and proper disclosure control are hot topics at the moment. This project aims to create a standard for sharing statistical information and generating synthetic data. MetaSynth fits a distribution to each of the variables and then generates a synthetic dataset from this information. In between, there is a generative metadata file that contains the condensed information as a human-readable JSON file. |
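The fit, store, and generate steps described for MetaSynth can be illustrated with a minimal sketch. This is not the MetaSynth API: the function names `fit_metadata` and `generate_synthetic`, the toy data, and the choice of an independent normal distribution for every column are all assumptions made for illustration.

```python
import json
import random
import statistics


def fit_metadata(rows):
    """Summarize each numeric column by a fitted normal distribution (assumed choice)."""
    columns = rows[0].keys()
    return {
        name: {
            "distribution": "normal",
            "mean": statistics.mean(row[name] for row in rows),
            "stdev": statistics.stdev(row[name] for row in rows),
        }
        for name in columns
    }


def generate_synthetic(metadata, n_rows, seed=0):
    """Draw every column independently from the distribution stored in the metadata."""
    rng = random.Random(seed)
    return [
        {name: rng.gauss(spec["mean"], spec["stdev"]) for name, spec in metadata.items()}
        for _ in range(n_rows)
    ]


# Hypothetical toy data; a real project would read a research dataset instead.
real_rows = [
    {"age": 31.0, "income": 2800.0},
    {"age": 45.0, "income": 3900.0},
    {"age": 52.0, "income": 4100.0},
    {"age": 28.0, "income": 2500.0},
]

# The intermediate, human-readable JSON contains only summary statistics.
metadata_json = json.dumps(fit_metadata(real_rows), indent=2)

# Synthetic rows are generated from the JSON alone, without access to the real data.
synthetic_rows = generate_synthetic(json.loads(metadata_json), n_rows=5)
```

Only the JSON needs to be shared in this sketch; because the columns are sampled independently, any correlation between variables in the original data is deliberately not reproduced.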
Breakthrough patents |
Time frame: 06/2022 - present |
Research domain: Faculty of Law, Economics and Governance |
Technologies: Python, Machine Learning, Natural Language Processing, BERT, TfIdf |
Research Engineers: Jelle Treep, Raoul Schram, Shiva Nadi, Maarten Schermer |
We use multiple Natural Language Processing ML methods to predict whether a particular patent is a breakthrough innovation or not. This project includes work to simplify the whole pipeline, from reading the patents through preprocessing and prediction to analysis. |
TORS |
Time frame: 10/2021 - present |
Research domain: Faculty of Science, Information and Computing Sciences |
Technologies: Python, Django, Docker, C++ |
Research Engineers: Roel Brouwer, Haili Hu |
In this project, we define the requirements to set up a scientific challenge in the domain of Train Unit Shunting and Servicing, using the TORS simulator developed by Utrecht University and TU Delft. In between transportation services, trains are parked and serviced at shunting yards. The objective of the challenge is to test the robustness and flexibility of scheduling algorithms that place all the trains at the right position and get the maintenance tasks done within a limited run time. |
Precision Nudging |
Time frame: 04/2021 - present |
Research domain: Faculty of Law, Economics and Governance, Public Governance and Management |
Technologies: Python, machine learning, regression analysis, synthetic data |
Research Engineers: Haili Hu, Raoul Schram |
Changing behavior is necessary to tackle societal problems, such as obesity and financial problems. One way to change behavior is by nudging people. A nudge is a way to change behavior without prohibiting options or changing their costs. However, nudges are often one-size-fits-all techniques: everyone is offered the same nudge. The scientific aim of this project is to use open data to develop predictive models with Machine Learning, in order to determine the most effective nudge for each person, given the nudging goal and the person’s individual circumstances. To test our models, we created realistic synthetic data. |
AnonymoUUs / AnonymizeYouth |
Time frame: 10/2021 - present |
Research domain: Faculty of Social Sciences |
Technologies: Python, de-identification |
Research Engineers: Maarten Schermer, Casper Kaandorp, Martine de Vos |
Researchers often use personal data in their research. According to the GDPR these data need to be de-identified. We have developed tools to de-identify textual data. For the specific use case of the YOUth cohort studies, additional tools have been developed to prepare the YOUth data for correct de-identification. |
tweet_collector |
Time frame: 2021 - present |
Research domain: Faculty of Humanities, Media and Culture Studies |
Technologies: Python, searchtweets, elasticsearch/kibana |
Research Engineers: Parisa Zahedi, Roos Voorvaart |
Twitter forms a rich source of information for researchers interested in studying ‘the public conversation’. The Academic Research product track is designed to serve the needs of the academic research community. It provides researchers with special levels of access to public Twitter data at no cost. This project aims to help researchers use the Academic Research product track to collect the tweets they are interested in and analyze them. |
Completed projects
Streetview segmentation |
Time frame: 06/2022 - 02/2023 |
Research domain: Faculty of Veterinary Medicine |
Technologies: Python, Machine Learning, Computer Vision, Visual Transformers, Docker |
Research Engineers: Raoul Schram, Maarten Schermer |
We are using recent computer vision models based on Visual Transformers for semantic segmentation, assigning semantic classes such as “street”, “vegetation” and “water” to the pixels of photos taken in the urban environment. This will be used for further research into the effect of the presence of water bodies and vegetation on human well-being. Follow-up to the Streetview greenery project. |
SummerFAIR |
Time frame: 02/2021 - 02/2023 |
Research domain: Faculty of Veterinary Medicine, Veterinary Epidemiology |
Technologies: Semantic Web, Ontologies, RDF, Docker, Python, R |
Research Engineers: Martine de Vos, Christine Staiger |
The summerFAIR project aims to integrate existing data sets on transmission experiments to enable reanalysis and meta-analysis. We have developed a pipeline to map data to a shared vocabulary, convert them to linked data triples and perform integrated analyses. |
Crunchbase |
Time frame: 05/2020 - 01/2023 |
Research domain: Faculty of Geosciences, Dynamics of Innovation Systems |
Technologies: Webscraping, Internet Archive, Pipeline, Internet, Python, Kinesis Firehose, AWS, Terraform |
Research Engineers: Maarten Schermer, Casper Kaandorp, Martine de Vos |
The Crunchbase project assesses the sustainability of European startup companies by analyzing their websites. As the researcher is interested in the current as well as the pre-Corona situation, we scrape webpages from the Internet Archive. Together with SURF we have developed a pipeline for collecting and analyzing these webpages, using AWS as infrastructure. |
Data donation - WhatsApp |
Time frame: 02/2022 - 11/2022 |
Research domain: ODISSEI Social Data Science Team (SoDa) |
Technologies: Python, Privacy, Pyodide |
Research Engineers: Parisa Zahedi, Shiva Nadi |
As an extension to the Data Donation project, we developed scripts to extract information from WhatsApp data download packages. In this study respondents can voluntarily donate their group chats and/or their account information files through an online platform (PORT). |
Coast Snap |
Time frame: 10/2021 - 06/2022 |
Research domain: Faculty of Geosciences |
Technologies: Python, Elixir, Databases, Web development |
Research Engineers: Casper Kaandorp |
For this citizen science project, we created a web application for collecting and processing pictures of the shoreline taken by citizens at different locations in the Netherlands.
Dynamiek in beeld |
Time frame: 2021 - 05/2022 |
Research domain: ODISSEI Social Data Science Team (SoDa) |
Technologies: Shiny, R |
Research Engineers: Parisa Zahedi, Shiva Nadi |
An application that can be used in a clinical setting to score dynamics in empathy. The clinician asks a set of questions, after which the results are visualized. The visualization should help the clinician to ask the right follow-up questions immediately. |
Deviance in Art |
Time frame: 02/2022 - 04/2022 |
Research domain: Faculty of Social and Behavioural Sciences |
Technologies: Python, GoogleArts, WikiArts, APIs |
Research Engineers: Raoul Schram |
Create a webscraper that can retrieve metadata and artworks from the GoogleArts and WikiArts websites. The deliverable is a generic Python package. This package will be used for multiple research questions involving machine learning. |
Ocean Parcels Numba |
Time frame: 07/2021 - 02/2022 |
Research domain: Faculty of Science, Physical Oceanography |
Technologies: Python, Numba |
Research Engineers: Roel Brouwer, Raoul Schram |
Investigating the feasibility of speeding up existing Python code for Parcels using Numba. The aim is to speed up the simulation enough to eliminate the need for a separate (partial) JIT/C path in the code. This should lead to a more flexible and maintainable code base. |
Data Donation - proof of concept |
Time frame: 04/2021 - 10/2021 |
Research domain: Faculty of Social Sciences, Human Data Science group |
Technologies: Python, WebAssembly, Pyodide, synthetic data |
Research Engineers: Haili Hu, Roos Voorvaart |
In this project, we collaborated with the Human Data Science group and Eyra to make data from social media platforms easily accessible to researchers, while preserving privacy. Respondents can voluntarily donate their data download packages through an online platform (PORT), and researchers can provide custom data extraction scripts, which are run locally on the respondents’ devices. A proof-of-concept PORT has been developed by Eyra, while data extraction scripts and synthetic data packages were provided by the Research Engineering team. |
Ocean Parcels particle-particle interaction |
Time frame: 07/2020 - 07/2021 |
Research domain: Faculty of Science, Physical Oceanography |
Technologies: Python, data structures, simulation |
Research Engineers: Roel Brouwer, Raoul Schram |
Providing a working implementation of particle-particle interaction for Parcels. The aim was to allow simulated particles to interact and influence each other’s states. This project involved reviewing and partially restructuring the data structures that Parcels uses for storing particle data, and implementing particle-particle interaction under certain conditions. |
Large scale network experiments |
Time frame: 2020 - 2021 |
Research domain: Faculty of Social Sciences |
Technologies: Python, Elixir, Databases, MTurk |
Research Engineers: Casper Kaandorp |
For this sociology project, we recruited people via Amazon Mechanical Turk and had them play a networking game.
Network Entropy |
Time frame: 2020 - 2021 |
Research domain: Faculty of Science, Information and Computing Sciences |
Technologies: Temporal networks, Python, Numba, simulation |
Research Engineers: Raoul Schram |
To improve the theoretical analysis and comparison of different temporal networks, we have invented a new metric to study them. The measure is called network entropy, and is applicable to any temporal network. We showed with simulations that processes on a network behave very differently, depending on the network entropy. |
hist-aware |
Time frame: 2020 - 2021 |
Research domain: Faculty of Humanities, History and Art History |
Technologies: Python, NLP, Deep learning, Huggingface Transformers |
Research Engineers: Leonardo Vida, Parisa Zahedi |
This project makes use of the Delpher archive (delpher.nl/kranten), which is the largest public collection of digitized pages from Dutch historical newspapers. The research team mines the sentiment of articles, as expressed by their authors. To do so, it extracts all relevant Delpher articles around specific topics (e.g. energy) and trains natural language processing (NLP) models called Transformers to obtain a sufficiently accurate representation of the sentiment of each article. Currently, the team is using the period 1960-1995, which comprises around 250,000 articles on the chosen topics. |
Protosc |
Time frame: 2020 - 2021 |
Research domain: Faculty of Social Sciences |
Technologies: Feature selection, Python, image classification, wrapper, filter, genetic algorithm |
Research Engineers: Raoul Schram, Roos Voorvaart |
Protosc is a Python library that aims to determine which features are relevant to a given classification problem. It does so by using wrapper/filter/genetic algorithms, after which automatic statistical analysis is used to determine which features are significant. The package also includes a few different options for an image classification pipeline. |
Animal Sounds |
Time frame: 2019 - 2021 |
Research domain: Faculty of Science, Ecology and Biodiversity Group |
Technologies: Python, bioacoustics, audio, machine learning, deep learning, feature engineering |
Research Engineers: Jelle Treep, Parisa Zahedi, Casper Kaandorp |
We developed algorithms and a data processing workflow to detect vocalizations of Chimpanzees in a large body of audio data from the African tropical rainforest. The workflow consists of: 1) a filtering step where irrelevant audio data is removed to speed up manual annotation, 2) a feature engineering and feature selection step, and 3) classification using support vector machines and convolutional neural networks. |
Porpoise Reproduction |
Time frame: 2020 |
Research domain: Faculty of Veterinary Medicine, Pathology |
Technologies: R, sf, raster, rgdal |
Research Engineers: Jelle Treep, Roos Voorvaart |
Porpoise Reproduction studies how reproduction rates of harbour porpoises are affected by various factors. In this RSE project, marine regions were enriched with Cumulative Human Impact model data.
Streetview greenery |
Time frame: 2019 - 2021 |
Research domain: Faculty of Geosciences |
Technologies: Machine learning, Python, image segmentation, deeplab, kriging, geolocation, CityScapes |
Research Engineers: Raoul Schram |
For the streetview project we have used the (formerly) open street view data from the municipality of Amsterdam to create a map of the greenness. This is done by taking the images and segmenting each image into different classes. The number of pixels in each image belonging to the “greenery” class is used to create the Amsterdam greenery map. |
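The pixel-counting step can be sketched as follows, assuming the segmentation model has already produced one class label per pixel. The function name `greenery_fraction`, the toy label map, and the normalization by image size are illustrative assumptions, not the project's actual code.

```python
def greenery_fraction(label_map, greenery_label="greenery"):
    """Return the share of pixels in a segmented image that carry the greenery label."""
    total_pixels = sum(len(row) for row in label_map)
    green_pixels = sum(row.count(greenery_label) for row in label_map)
    return green_pixels / total_pixels if total_pixels else 0.0


# Hypothetical 3x4 segmentation result: one class name per pixel.
segmented_image = [
    ["sky", "sky", "greenery", "greenery"],
    ["building", "greenery", "greenery", "road"],
    ["road", "road", "road", "road"],
]

fraction = greenery_fraction(segmented_image)  # 4 of the 12 pixels are greenery
```

Computing such a value for every geolocated image gives the point measurements that a map can then be interpolated from, for example with the kriging listed under Technologies.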
ASReview |
Time frame: 2018 - 2021 |
Research domain: Faculty of Social Sciences |
Technologies: Machine learning, active learning, Python, Flask, hyperparameter optimization, simulation |
Research Engineers: Raoul Schram, Parisa Zahedi, Jonathan de Bruin |
ASReview is a machine learning tool to aid researchers in performing systematic reviews. It uses active learning to present users with the papers that are most likely to be relevant. It is written for Python 3.7+, and hyperparameters have been optimized using the hyperopt package. We have also contributed to the initial back-end for the user interface using Flask. |
Agri-activism |
Time frame: 2019 - 2020 |
Research domain: Faculty of Social Sciences |
Technologies: Python, NLP, topic-modeling, network-analysis |
Research Engineers: Parisa Zahedi, Martine de Vos |
This project aims to explore the potential of Twitter data in making sense of online debates. More specifically, it focuses on the online manifestation of the anti-Monsanto movement and assesses tweets to investigate (1) activists’ behaviors and opinions, (2) the shape of their networks, (3) their organization and leadership, and (4) information diffusion patterns. Monsanto is one of the world’s largest producers of both agrochemicals and genetically modified crop seeds. |
Global Goals |
Time frame: 10/2019 - 08/2020 |
Research domain: Faculty of Geosciences, Global Sustainability Governance |
Technologies: Webscraping, Python, AWS, Terraform |
Research Engineers: Jelle Treep, Martine de Vos |
The Global Goals project investigates the effect of the United Nations’ Sustainable Development Goals (SDGs) on the global network of intergovernmental organizations. This network is represented by the hyperlinks on the organizations’ websites. We have retrieved the historical hyperlinks (from 2012 up to 2019) for a given set of international organizations via the Internet Archive. |
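How hyperlinks on a website can be turned into network edges is sketched below with the Python standard library. The domains, the sample HTML, and the helper names `LinkCollector` and `outgoing_edges` are hypothetical; retrieving archived pages from the Internet Archive is left out, and this is not the project's actual pipeline.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def outgoing_edges(source_domain, html, known_domains):
    """Return sorted (source, target) pairs for links that point to other known organizations."""
    collector = LinkCollector()
    collector.feed(html)
    targets = {urlparse(link).netloc for link in collector.links}
    return sorted(
        (source_domain, target)
        for target in targets
        if target in known_domains and target != source_domain
    )


# Hypothetical organizations and page content, for illustration only.
known_domains = {"org-a.example", "org-b.example"}
sample_html = (
    '<a href="https://org-b.example/sdg">Partner</a> '
    '<a href="/about">About us</a> '
    '<a href="https://unrelated.example/">Other</a>'
)

edges = outgoing_edges("org-a.example", sample_html, known_domains)
```

Relative links and links to domains outside the given set of organizations are ignored, so only links between organizations in the set become edges.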