Open Data Science Europe Workshop 2021

Leandro Parente


Sessions

09-10
10:00
20min
Spatiotemporal modeling of environmental dynamics at global scale: building open multiscale data cubes
Tom Hengl, Leandro Parente

A global compilation of monthly and annual time-series of images (a data cube) for the periods 1982-2018 and 2000-2020 is described. The time-series prepared for 1982-2018 (global, at 5-km resolution) comprise: TerraClimate (Abatzoglou et al., 2018); monthly vegetation NDVI 90% percentiles for 1982-2018, merged from the AVHRR daily and MODIS NDVI products; the Vegetation Continuous Fields (VCF5KYR) Version 1 dataset (Song et al., 2018); and the HYDE v3.2 annual land use time-series (Klein Goldewijk et al., 2017). For the period 2000-2020 (global, at 1-km resolution), MODIS land products (NDVI, LST, snow cover) are used in combination with MODIS atmospheric products (water vapour, cloud fraction), global relief (MERIT DEM) and climate layers (CHELSA). All layers have been resampled and gap-filled so that they can be imported as an analysis-ready spatiotemporal array. For each pixel we also provide geometric temperatures (derived from latitude, day of the year and elevation) and, for many layers, uncertainty measures. These data stacks are available for research and development via our OpenLandMap.org data portal and a Cloud-Optimized GeoTIFF S3 file service. Overlaying Earth System Science point datasets (https://gitlab.com/openlandmap/compiled-ess-point-data-sets), such as the global compilation of soil organic carbon, demonstrates that the global data cubes can be used to build complex spatiotemporal models (2D+T and 3D+T) and to produce predictions of important variables representing our dynamic environment. Two important advantages of running machine learning on spatiotemporal data are: (1) the possibility to explain complex causal relationships between the dynamics of plants, ecosystem communities and soil variables on the one hand and dynamic climate and human influence on the other; (2) the possibility to predict states beyond the time span covered by the training data, e.g. to predict future states (as in scenario testing) and past states for which there are no training points.

General
HUGOTech
09-07
13:30
90min
Working with Cloud-Optimized GeoTIFFs in Python
Leandro Parente, Martin Landa, Tomaš Bouček

Software requirements: opengeohub/py-geo docker image (gdal, rasterio, eumap)
Content:
Why Cloud Optimized GeoTIFF?
Generating COG files using GDAL
Providing COG files through S3 protocol
Accessing remote COG files in Python
QGIS Eumap Plugin
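
As a rough illustration of the first topic (why COG?): a Cloud-Optimized GeoTIFF is stored in internal tiles, so a client reading a small window over HTTP only needs the tiles that the window touches. The sketch below is plain Python and does not use GDAL or rasterio. The function names and the 512-pixel tile size are assumptions made for this example, not part of the workshop material.

```python
from math import ceil


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the internal tiles a pixel window touches."""
    if width <= 0 or height <= 0:
        return []
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def fraction_fetched(raster_width, raster_height, window, tile_size=512):
    """Share of all internal tiles that must be fetched to read the window."""
    tiles_total = ceil(raster_width / tile_size) * ceil(raster_height / tile_size)
    return len(tiles_for_window(*window, tile_size=tile_size)) / tiles_total


# A 100 x 100 pixel window starting at column 500, row 500 (hypothetical values)
print(tiles_for_window(500, 500, 100, 100))  # -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

For a hypothetical 10240 x 10240 raster, the same window touches 4 of 400 tiles, which is why partial reads of remote COG files can be cheap.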

workshop
Succes Avenue (former Kleine veer zaal)
09-06
15:30
90min
Spatiotemporal machine learning in Python (Part 2)
Chris van Diemen, Leandro Parente

Software requirements: opengeohub/py-geo docker image (gdal, rasterio, eumap, scikit-learn)
Content:
Theoretical background for Ensemble ML and Python implementations
General concepts and main advantages of spatiotemporal machine learning
Why use LandMapper?
Spacetime overlay to prepare the training samples
Spacetime cross-validation to evaluate the EML model performance
Hyperparameter optimization to tune the EML model
Fitting the final EML model
Generating spatial predictions using the fitted model
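
The spacetime overlay step listed above can be pictured as follows: each training point carries a date, and it is matched to the raster layer of the corresponding year before the pixel value is read. This is a minimal plain-Python sketch under stated assumptions, not the eumap implementation. The function name, the one-layer-per-year dictionary and the top-left origin convention are hypothetical choices made for the example.

```python
from datetime import date


def spacetime_overlay(points, layers, origin_x, origin_y, pixel_size):
    """Attach to each point the pixel value from the layer of the matching year.

    points: list of (x, y, date) tuples
    layers: dict mapping year -> 2D list of values (rows ordered top to bottom)
    origin_x, origin_y: coordinates of the top-left corner of the grid
    """
    samples = []
    for x, y, when in points:
        grid = layers.get(when.year)
        if grid is None:
            continue  # no layer available for that year
        col = int((x - origin_x) // pixel_size)
        row = int((origin_y - y) // pixel_size)
        # Keep only points that fall inside the grid
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
            samples.append((x, y, when.year, grid[row][col]))
    return samples


# Two hypothetical annual layers on a 2 x 2 grid with 10-unit pixels
layers = {2000: [[1, 2], [3, 4]], 2001: [[5, 6], [7, 8]]}
points = [(5, 15, date(2000, 6, 1)), (15, 5, date(2001, 3, 9))]
print(spacetime_overlay(points, layers, origin_x=0, origin_y=20, pixel_size=10))
```

The same point location can therefore yield different covariate values depending on its observation date, which is what distinguishes a spacetime overlay from a purely spatial one.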

workshop
Succes Avenue (former Kleine veer zaal)
09-07
11:00
90min
High performance computing in Python
Leandro Parente

Software requirements: opengeohub/py-geo docker image (gdal, rasterio, eumap, scikit-learn)
Content:
What are the possibilities to improve the performance of computation in Python?
Performing NumPy operations using multicore processing
Accelerating Python functions using Numba
Fast numerical expressions using NumExpr
Using TilingProcessing to distribute raster operations across multiple cores
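
The idea behind tile-based processing can be sketched with the standard library alone: split the raster into windows, apply a function to each window through a worker pool, and collect the results per window. This is not the eumap TilingProcessing API. All names below are hypothetical, and a thread pool is used only so that the sketch runs anywhere. For CPU-bound pure-Python work one would likely pass `ProcessPoolExecutor` instead, called from under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(n_rows, n_cols, tile_size):
    """Yield (row_start, row_stop, col_start, col_stop) windows covering the grid."""
    for r in range(0, n_rows, tile_size):
        for c in range(0, n_cols, tile_size):
            yield r, min(r + tile_size, n_rows), c, min(c + tile_size, n_cols)


def tile_sum(args):
    """Example per-tile operation: sum of all values inside the window."""
    grid, (r0, r1, c0, c1) = args
    return sum(sum(row[c0:c1]) for row in grid[r0:r1])


def process_tiles(grid, tile_size, func, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Apply func to every tile of a 2D list and return {window: result}."""
    windows = list(split_into_tiles(len(grid), len(grid[0]), tile_size))
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(func, [(grid, w) for w in windows]))
    return dict(zip(windows, results))


# A 5 x 5 grid of ones split into tiles of at most 2 x 2 pixels
grid = [[1] * 5 for _ in range(5)]
per_tile = process_tiles(grid, tile_size=2, func=tile_sum)
print(len(per_tile), sum(per_tile.values()))  # -> 9 25
```

Edge tiles are clipped to the grid extent, so the windows cover every pixel exactly once.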

workshop
HUGOTech
09-06
13:30
90min
Spatiotemporal machine learning in Python (Part 1)
Chris van Diemen, Leandro Parente

Software requirements: opengeohub/py-geo docker image (gdal, rasterio, eumap, scikit-learn)
Content:
Theoretical background for machine learning and Python implementations
Integrating raster data with scikit-learn models
Why use pyeumap.LandMapper?
Spatial overlay to prepare the training samples
Spatial cross-validation to evaluate the ML model performance
Hyperparameter optimization to tune the ML model
Fitting the final ML model
Generating spatial predictions using the fitted model
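
To illustrate the spatial cross-validation item above: nearby samples tend to be similar, so a common approach is to assign whole spatial blocks, rather than individual points, to folds. The sketch below shows that blocking idea in plain Python. It is an assumption-laden illustration, not the LandMapper implementation, and the function names and block size are hypothetical. scikit-learn's `GroupKFold` provides a comparable grouping mechanism.

```python
def block_id(x, y, block_size):
    """Identify the square block that contains the point (x, y)."""
    return (int(x // block_size), int(y // block_size))


def spatial_kfold(coords, block_size, n_folds):
    """Yield (train_idx, test_idx) so that all points in a block share one fold."""
    blocks = sorted({block_id(x, y, block_size) for x, y in coords})
    # Distribute blocks over folds in round-robin order
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    folds = [fold_of_block[block_id(x, y, block_size)] for x, y in coords]
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


# Six hypothetical sample locations falling into three 10 x 10 blocks
coords = [(1, 1), (2, 2), (11, 1), (12, 3), (1, 12), (3, 14)]
for train, test in spatial_kfold(coords, block_size=10, n_folds=3):
    print(train, test)
```

Because points from the same block never appear on both sides of a split, the resulting error estimate is less affected by spatial autocorrelation than a random split would be.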

workshop
Succes Avenue (former Kleine veer zaal)
09-06
09:00
90min
Introduction to ODSE datasets in Python
Leandro Parente

The first 30 minutes will be dedicated to software and library preparation and user support
Software requirements: Python, Jupyter, QGIS, GRASS GIS, R
Content:
General concepts and main advantages of docker containers
What is a Docker image and where to find it?
Starting with the docker image opengeohub/py-geo
Which tag/version should I use?
Installing new OS and Python packages inside the container
Sharing files between the host machine and the container
OSGeoLive ready to use in VirtualBox
Support time for help with software and library preparation

The next 60 minutes will be dedicated to the introduction to spatial and spatiotemporal data in Python
Software requirements: opengeohub/py-geo docker image (gdal, rasterio, geopandas, eumap)
Content:
Introduction to JupyterLab and Python libraries
Introduction to spatiotemporal datasets
Theoretical background for spatial and spatiotemporal machine learning
Eumap library (gapfilling and mapper modules)
Eumap spatiotemporal datasets example: landcover 2000-2020 training dataset (Witjes et al., 2021)
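
As a small illustration of what temporal gap-filling means for a single pixel: missing observations in a time-series are replaced with values estimated from the valid observations around them. The sketch below uses simple linear interpolation in plain Python. It is an assumed, simplified stand-in and does not reproduce the methods of the eumap gapfilling module. The function name is hypothetical.

```python
def fill_gaps_linear(series):
    """Fill None gaps by linear interpolation between the nearest valid neighbours.

    Gaps at the start or end of the series take the nearest valid value.
    """
    valid = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    if not valid:
        return filled  # nothing to interpolate from
    for i, value in enumerate(series):
        if value is not None:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            filled[i] = series[nxt]
        elif nxt is None:
            filled[i] = series[prev]
        else:
            weight = (i - prev) / (nxt - prev)
            filled[i] = series[prev] + weight * (series[nxt] - series[prev])
    return filled


# One pixel observed at six time steps, with three missing values
print(fill_gaps_linear([None, 2.0, None, 5.0, 8.0, None]))
```

Applied pixel by pixel, this kind of filling produces a complete array, at the cost of smoothing over any change that happened during the gap.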

workshop
Succes Avenue (former Kleine veer zaal)