Open Data Science Europe Workshop 2021

Martijn Witjes


Sessions

09-09
09:40
20min
Land cover time-series data stack for Europe 2000--2019 based on LUCAS, GLAD Landsat and Spatiotemporal Ensemble Machine Learning
Martijn Witjes

We classified 33 land use / land cover (LULC) classes between 2000 and 2019 using a single spatiotemporal ensemble machine learning model in a fully automated, free and open source workflow. This workflow includes harmonization and preprocessing of several high-resolution publicly available covariate datasets and over five million training samples, spatial K-fold cross-validation, hyperparameter optimization, and multiple methods for LULC change analysis. We show how the per-class probability predictions (1) facilitate useful prediction uncertainty metrics, (2) inform post-processing strategies tailored to specific use cases, and (3) enable a novel way to quantify LULC change dynamics without relying on hard-class predictions. We show that for this purpose, spatial models trained on data from a single year are consistently outperformed by a single spatiotemporal model trained on data from all years, especially when generalizing to input data from years that are not included in the training dataset. We present a final land cover dataset with per-class probability and uncertainty metrics, as well as hard-class classifications with 62\% cross-validation (CV) accuracy for 33 Corine Land Cover (CLC) level 3 classes, 70\% accuracy for 14 level 2 CLC classes, and 87\% accuracy for the 5 level 1 classes. Our results suggest that our method enables land cover classification for subsequent years without waiting for new training data, while facilitating improved training data collection through analysis of variable importance, per-class performance, and uncertainty metrics.
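To make the role of per-class probabilities more concrete, the following is a minimal sketch of how uncertainty and change could be derived from them without hard-class predictions. It is not the workflow described in the talk: the function names, the choice of normalized entropy and top-two margin as uncertainty metrics, and the least-squares trend as a change measure are all assumptions made for illustration, and the input values are synthetic.

```python
import numpy as np


def entropy_uncertainty(probs):
    """Normalized Shannon entropy per pixel (hypothetical metric choice).

    probs: array of shape (n_pixels, n_classes), rows summing to 1.
    Returns values near 0 (one class certain) up to 1 (all classes equally likely).
    """
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    h = -(p * np.log(p)).sum(axis=-1)
    return h / np.log(probs.shape[-1])


def top_two_margin(probs):
    """Difference between the two highest class probabilities per pixel."""
    top2 = np.sort(probs, axis=-1)[..., -2:]
    return top2[..., 1] - top2[..., 0]


def probability_trend(prob_series, years):
    """Least-squares slope of one class's probability over time, per pixel.

    prob_series: array of shape (n_years, n_pixels) for a single class.
    years: array of shape (n_years,).
    Returns the change in probability per year for each pixel.
    """
    t = years - years.mean()
    centered = prob_series - prob_series.mean(axis=0)
    return (t[:, None] * centered).sum(axis=0) / (t ** 2).sum()


# Synthetic example: two pixels, three classes.
probs = np.array([[1.0, 0.0, 0.0],
                  [1 / 3, 1 / 3, 1 / 3]])
uncertainty = entropy_uncertainty(probs)
margin = top_two_margin(probs)

# Synthetic example: one pixel whose probability for a class rises over three years.
years = np.array([2000, 2001, 2002])
series = np.array([[0.1], [0.2], [0.3]])
slope = probability_trend(series, years)
```

In this sketch a pixel with a positive slope for, say, an urban class would indicate gradual change toward that class even if its most probable class never switches, which is the kind of signal a hard-class comparison between years would miss.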

We propose that the future of land use / land cover mapping and change detection will likely be driven by developments in the following fields: (1) multisource data harmonization, such as combining Sentinel and Landsat data, (2) leveraging the spatial context of remote sensing data by applying pattern recognition and object-based image analysis to spectral features, and (3) combining spatiotemporal ML with process-based techniques such as urban sprawl and vegetation growth modeling.

General
HUGOTech