Open Data Science Europe Workshop 2021

Machine Learning for spatiotemporal data using mlr3
2021-09-10, 11:20–12:00, HUGOTech

The R package {mlr3} and its associated ecosystem of extension packages implement a powerful, object-oriented, and extensible framework for machine learning (ML) in R.
It provides a unified interface to many learning algorithms available on CRAN and augments them with the model-agnostic, general-purpose functionality that virtually every ML project needs, such as train/test evaluation, resampling, preprocessing, hyperparameter tuning, nested resampling, and visualization of results from ML experiments.
The package is a complete reimplementation of the {mlr} package (Bischl et al., 2016). It draws on many years of experience and best practices learned along the way to provide a state-of-the-art system that is powerful, flexible, extensible, and maintainable.
We target both practitioners who want to quickly apply ML algorithms to their problems and researchers who want to implement, benchmark, and compare their new methods in a structured environment.
{mlr3} is suitable for short scripts that test an idea, for complex multi-stage experiments that draw on a broad range of ML functionality, as a foundation for implementing new ML (meta-)algorithms (for example, AutoML systems), and for everything in between.
Functional correctness is ensured through extensive unit and integration tests.

This tutorial showcases how to use {mlr3} with spatiotemporal data for performance estimation and prediction, making use of the extension packages {mlr3spatiotempcv} and {mlr3raster}.
To enhance reproducibility, we will also show how to organize the analysis with the workflow package {targets} (the successor of {drake}).


Other authors and affiliations: n/a

R consultant at cynkra in Zurich, CH.

I have advanced knowledge of applied machine learning, specifically in environmental modeling. I am also familiar with DevOps (Docker, Terraform, Ansible) and Linux administration. In addition, I am pursuing a PhD in environmental modeling with the GIScience group at the Department of Geography, University of Jena.

In my spare time, I like to develop programmatic solutions that simplify today's data science challenges. I hold myself to working reproducibly, a practice I also actively promote in my daily work. Occasionally, I blog about IT-related topics that interest me.